Hash Tables Study Guide: Difference between revisions
From charlesreid1
(Fix multiple accuracy errors: dictionary problem definition, open addressing load factor, double hashing formula, table header/labels, spelling, and worst-case complexity (via update-page on MediaWiki MCP Server)) |
|||
| (11 intermediate revisions by the same user not shown) | |||
| Line 5: | Line 5: | ||
'''hash table''' - data structure that uses key or value based lookup system | '''hash table''' - data structure that uses key or value based lookup system | ||
'''dictionary problem''' - | '''dictionary problem''' - the classic problem of designing efficient data structures that implement associative arrays (supporting insert, delete, and search operations) | ||
'''hash function''' - a mapping from input objects to a hash table entry | '''hash function''' - a mapping from input objects to a hash table entry | ||
| Line 47: | Line 47: | ||
'''load factor''' - alpha - ratio of number of items N to number of buckets M, cost is proportional to load factor | '''load factor''' - alpha - ratio of number of items N to number of buckets M, cost is proportional to load factor | ||
'''open addressing''' - requires load factor | '''open addressing''' - requires load factor < 1 (the table must not be completely full, or probing may fail to terminate); stores elements directly instead of in buckets | ||
'''linear probing''' - used with open addressing, collision handling technique in which a full element leads to a neighbor search (deletion becomes tricky, mark removed | '''linear probing''' - used with open addressing, collision handling technique in which a full element leads to a neighbor search (deletion becomes tricky, mark removed slots) | ||
'''quadratic probing''' - alternative that provides more desirable properties than | '''quadratic probing''' - alternative that provides more desirable properties than linear probing | ||
'''double hashing''' - if we hash a key h(k) and the position is occupied, we can use a second hash function to find a new key | '''double hashing''' - if we hash a key h(k) and the position is occupied, we can use a second hash function to find a new key/position | ||
<math> | <math> | ||
h(k) + i \cdot h^{\prime}(k) | (h(k) + i \cdot h^{\prime}(k)) \mod N | ||
</math> | </math> | ||
| Line 64: | Line 64: | ||
Hash Tables provide the following methods: | Hash Tables provide the following methods: | ||
* hashFunction (turns key into index 0 to N-1) | |||
* get | * get | ||
* set | * set | ||
* | * del | ||
* getBucket (deletes an item from a particular bucket) | |||
* setBucket (sets the value of an item in a particular bucket) | |||
* delBucket (removes an item from a particular bucket) | |||
* size | * size | ||
* empty | * empty | ||
| Line 75: | Line 79: | ||
=Implementations= | =Implementations= | ||
Hash Table | Hash Table fields/members: | ||
* table for data | * table for data | ||
* number of entries | * number of entries | ||
* | * hash function coefficients | ||
** prime p | ** prime p | ||
** scale a | ** scale a | ||
** shift b | ** shift b | ||
Hash Table methods: | |||
* hash function (universal/MAD function) | * hash function (universal/MAD function) | ||
* get/set/ | * get/set/del methods | ||
* getBucket/setBucket/delBucket methods | |||
* resize method | * resize method | ||
Chaining Hash Table methods: | |||
* overrides getBucket/setBucket/delBucket to use "chained" linked lists for buckets | |||
Probing Hash Table methods: | |||
* overrides getBucket/setBucket/delBucket to use linear/quadratic probing to find open slots | |||
* implements non-empty AVAILABLE sentinel | |||
=Algorithms for Operations= | =Algorithms for Operations= | ||
| Line 91: | Line 105: | ||
get(k) method: | get(k) method: | ||
* j = | * j = hashFunction(k) | ||
* | * return getBucket(j,k) | ||
set(k,v) method: | set(k,v) method: | ||
* j = | * j = hashFunction(k) | ||
* | * setBucket(j,k,v) | ||
* update size | * update size | ||
delete(k) method: | delete(k) method: | ||
* j = | * j = hashFunction(k) | ||
* | * delBucket(j,k) | ||
* update size | * update size | ||
get, set, and delete methods are all public | |||
hashFunction method varies depending on hash function chosen | |||
getBucket, setBucket, delBucket are private and depend on the concrete implementation | |||
==Chained Hash Map== | ==Chained Hash Map== | ||
get bucket(j,k) method: | get bucket(j,k) method: | ||
* | * bucket_j = buckets[j] | ||
* return | * return bucket_j[k] | ||
set bucket(j,k,v) | set bucket(j,k,v) | ||
* | * bucket_j = buckets[j] | ||
* | * bucket_j[k] = v | ||
remove bucket(j,k) method: | remove bucket(j,k) method: | ||
* | * bucket_j = buckets[j] | ||
* | * del bucket_j[k] | ||
==Linear Probing Hash Table== | ==Linear Probing Hash Table== | ||
findBucketSlot(j,k) method: | |||
* Find key k in bucket j | * Find key k in bucket j | ||
* if true, found match, returning location | * if true, found match, returning location | ||
* if false, not match, returning 1st available slot | * if false, not match, returning 1st available slot | ||
getBucket(j,k) method: | |||
* | * findBucketSlot(j,k) | ||
* if found: | * if found: | ||
** return table value | ** return table value | ||
setBucket(j,k,v) method: | |||
* | * findBucketSlot(j,k) | ||
* if found: | * if found: | ||
** | ** bucket_j[k] = v | ||
* else: | * else: | ||
** new item(k,v) | ** new item(k,v) | ||
** | ** bucket_j insert new item | ||
** update size | ** update size | ||
removeBucket(j,k) method: | |||
* | * findBucketSlot(j,k) | ||
* if found: | * if found: | ||
** mark as vacated using AVAIL constant | ** mark as vacated using AVAIL constant | ||
| Line 155: | Line 166: | ||
=Complexity and Cost= | =Complexity and Cost= | ||
Note: alpha is the loading factor, N/M, where N is the | Note: alpha is the loading factor, N/M, where N is the number of elements | ||
in the hash table, and M is the number of buckets. | in the hash table, and M is the number of buckets. | ||
{| class="wikitable" cellpadding="100" width="66%" | {| class="wikitable" cellpadding="100" width="66%" | ||
!colspan="4"|Big-O Complexity of | !colspan="4"|Big-O Complexity of Hash Tables | ||
|- | |- | ||
| | | | ||
|<br /><br /> | |<br /><br /> | ||
Hash Table (expected) | |||
|<br /><br /> | |<br /><br /> | ||
Hash Table (worst) | |||
|<br /><br /> | |<br /><br /> | ||
Chained Hash Map | Chained Hash Map | ||
| Line 190: | Line 201: | ||
|size | |size | ||
|O(1) | |O(1) | ||
|O( | |O(1) | ||
|O(1) | |O(1) | ||
| Line 196: | Line 207: | ||
|empty | |empty | ||
|O(1) | |O(1) | ||
|O( | |O(1) | ||
|O(1) | |O(1) | ||
Latest revision as of 03:56, 8 June 2026
Definitions and Variations
Definitions
hash table - data structure that uses key or value based lookup system
dictionary problem - the classic problem of designing efficient data structures that implement associative arrays (supporting insert, delete, and search operations)
hash function - a mapping from input objects to a hash table entry
bucket array - array of data used to store hash table entries, indexed by hash codes (key space)
collision - when two or more elements that are distinct/different have the same hash code
compression function - maps the output of the hash code to an integer in the range 0 to N-1
hash code - an integer that is uniquely determined by the input key
polynomial hash code - accounts for positions (not just presence) of various bits, each bit is the coefficient of a polynomial (use Horner's Rule to evaluate):
$ x_0 a^{n-1} + x_1 a^{n-2} + \dots + x_{n-2} a + x_{n-1} $
cyclic shift hash code - shifts the bits of the 32-bit key representation by X bits (left or right)
immutable - unable to be changed after construction/instantiation; keys must be immutable, or their hash value will change!
division method (compression) - turns the hash code into an integer between 0 and N-1 via the rather unimaginative
$ i \mod N $
MAD (multiply and divide) method (a.k.a. universal hashing) (compression) - use the following compression function:
$ (ai + b \mod p ) \mod N $
where p > N and p is prime
collision resolution - rule used to determine what happens when a collision is found
separate chaining - using linked lists as buckets, instead of storing elements in an array directly
load factor - alpha - ratio of number of items N to number of buckets M, cost is proportional to load factor
open addressing - requires load factor < 1 (the table must not be completely full, or probing may fail to terminate); stores elements directly instead of in buckets
linear probing - used with open addressing, collision handling technique in which a full element leads to a neighbor search (deletion becomes tricky, mark removed slots)
quadratic probing - alternative that provides more desirable properties than linear probing
double hashing - if we hash a key h(k) and the position is occupied, we can use a second hash function to find a new key/position
$ (h(k) + i \cdot h^{\prime}(k)) \mod N $
ADTs and Interfaces
Hash Table ADT
Hash Tables provide the following methods:
- hashFunction (turns key into index 0 to N-1)
- get
- set
- del
- getBucket (deletes an item from a particular bucket)
- setBucket (sets the value of an item in a particular bucket)
- delBucket (removes an item from a particular bucket)
- size
- empty
- clear
- clone
- iterators, etc.
Implementations
Hash Table fields/members:
- table for data
- number of entries
- hash function coefficients
- prime p
- scale a
- shift b
Hash Table methods:
- hash function (universal/MAD function)
- get/set/del methods
- getBucket/setBucket/delBucket methods
- resize method
Chaining Hash Table methods:
- overrides getBucket/setBucket/delBucket to use "chained" linked lists for buckets
Probing Hash Table methods:
- overrides getBucket/setBucket/delBucket to use linear/quadratic probing to find open slots
- implements non-empty AVAILABLE sentinel
Algorithms for Operations
Hash Map base class
get(k) method:
- j = hashFunction(k)
- return getBucket(j,k)
set(k,v) method:
- j = hashFunction(k)
- setBucket(j,k,v)
- update size
delete(k) method:
- j = hashFunction(k)
- delBucket(j,k)
- update size
get, set, and delete methods are all public
hashFunction method varies depending on hash function chosen
getBucket, setBucket, delBucket are private and depend on the concrete implementation
Chained Hash Map
get bucket(j,k) method:
- bucket_j = buckets[j]
- return bucket_j[k]
set bucket(j,k,v)
- bucket_j = buckets[j]
- bucket_j[k] = v
remove bucket(j,k) method:
- bucket_j = buckets[j]
- del bucket_j[k]
Linear Probing Hash Table
findBucketSlot(j,k) method:
- Find key k in bucket j
- if true, found match, returning location
- if false, not match, returning 1st available slot
getBucket(j,k) method:
- findBucketSlot(j,k)
- if found:
- return table value
setBucket(j,k,v) method:
- findBucketSlot(j,k)
- if found:
- bucket_j[k] = v
- else:
- new item(k,v)
- bucket_j insert new item
- update size
removeBucket(j,k) method:
- findBucketSlot(j,k)
- if found:
- mark as vacated using AVAIL constant
Complexity and Cost
Note: alpha is the loading factor, N/M, where N is the number of elements in the hash table, and M is the number of buckets.
| Big-O Complexity of Hash Tables | |||
|---|---|---|---|
Hash Table (expected) |
Hash Table (worst) |
Chained Hash Map | |
| get | O(1) | O(n) | O(alpha) |
| set | O(1) | O(n) | O(alpha) |
| remove | O(1) | O(n) | O(alpha) |
| size | O(1) | O(1) | O(1) |
| empty | O(1) | O(1) | O(1) |
| iteration over structure | O(n) | O(n) | O(n) |
OOP Principles
Object hash codes
- Java and Python both provide methods for modifying/directly implementing a custom object hash code method
Private utility methods:
- Pack your functionality into general, protected utility methods, as with chained probing hash table implementations
Flags
| Maps and Dictionaries Part of Computer Science Notes
Series on Data Structures
Maps/Dictionaries Maps · Maps/ADT · Maps in Java · Maps/OOP · Maps/Operations and Performance Map implementations: Maps/AbstractMap · Maps/UnsortedArrayMap · Maps/SortedArrayMap Dictionary implementations: Dictionaries/LinkedDict · Dictionaries/ArrayDict
Hashes Hash Maps/OOP · Hash Maps/Operations and Performance Hash Maps/Dynamic Resizing · Hash Maps/Collision Handling with Chaining Hash functions: Hash Functions · Hash Functions/Cyclic Permutation Hash map implementations: Hash Maps/AbstractHashMap · Hash Maps/ChainedHashMap
Skip Lists · Java/ConcurrentSkipList · Java implementations: SkipList
Sets Sets · Sets/ADT · Sets in Java · Sets/OOP · Multisets
|