Hash Functions
From charlesreid1
Hash functions have two principal applications: cryptographic hash functions, and non-cryptographic hash functions.
Cryptographic hash functions must have a much lower probability of collision, since a collision could result in identity theft or forged documents.
Non-cryptographic hash functions are mainly used for mapping keys to values, so a collision is an inconvenience rather than a problem.
Non-Crypto Hash Functions
Trivial Hash Functions
Trivial hash functions use the data itself as the hash value. This works if the data are small in value or limited in length. (Example: hashing integers below 100. Can just use the integers themselves as their own index in the hash collection.)
Checksums
For longer sequences of data, be they strings or bytes, the approach is to break the sequence into small parts, and to make the output dependent on each piece, and each piece dependent on the others. This can be done with an iterative algorithm that maintains a "state" (integer or set of integers) modified by each chunk of the stream.
For example:
# Start by initializing the state
S = S0
# For each piece of data,
for k in range(1,m):
# fold the current chunk of data into the state
S = modifyHash(S, b[k])
# Extract the hash value from the state
return returnHash(S, n)
Perfect Hashing
Perfect hash functions, as indicated by the term "perfect," fulfill everything that hash functions must do. Specifically, perfect hash functions are injective and map each valid input to a different valid output.
(If the map were a function, we would say the function is a one-to-one function.)
Minimal Perfect Hashing
A minimal perfect hash maps every valid input to a different valid output, but do so in a way that requires minimal space, so that all of the outputs are consecutive.
A minimal perfect hash is very challenging to find.
Avoiding Collisions
If you're using a simpler hash function, it is possible to determine the effect that similar inputs will have on the output. For example, if hashing phone numbers, most of the leading digits will be similar. Any method that depends heavily on the leading digits is likely to lead to higher collisions.
Therefore, the distribution of data can have an affect on the hash function and make different hash functions better or worse choices. Ultimately it is important to test your hash function of interest on your data set of interest.
Flags
| Maps and Dictionaries Part of Computer Science Notes
Series on Data Structures
Maps/Dictionaries Maps · Maps/ADT · Maps in Java · Maps/OOP · Maps/Operations and Performance Map implementations: Maps/AbstractMap · Maps/UnsortedArrayMap · Maps/SortedArrayMap Dictionary implementations: Dictionaries/LinkedDict · Dictionaries/ArrayDict
Hashes Hash Maps/OOP · Hash Maps/Operations and Performance Hash Maps/Dynamic Resizing · Hash Maps/Collision Handling with Chaining Hash functions: Hash Functions · Hash Functions/Cyclic Permutation Hash map implementations: Hash Maps/AbstractHashMap · Hash Maps/ChainedHashMap
Skip Lists · Java/ConcurrentSkipList · Java implementations: SkipList
Sets Sets · Sets/ADT · Sets in Java · Sets/OOP · Multisets
|