Latest revision as of 03:56, 8 June 2026

Definitions and Variations

Definitions

hash table - data structure that uses key or value based lookup system

dictionary problem - the classic problem of designing efficient data structures that implement associative arrays (supporting insert, delete, and search operations)

hash function - a mapping from input objects to a hash table entry

bucket array - array of data used to store hash table entries, indexed by hash codes (key space)

collision - when two or more elements that are distinct/different have the same hash code

compression function - maps the output of the hash code to an integer in the range 0 to N-1

hash code - an integer that is uniquely determined by the input key

polynomial hash code - accounts for positions (not just presence) of various bits, each bit is the coefficient of a polynomial (use Horner's Rule to evaluate):

$x_0 a^{n-1} + x_1 a^{n-2} + \dots + x_{n-2} a + x_{n-1}$

cyclic shift hash code - shifts the bits of the 32-bit key representation by X bits (left or right)

immutable - unable to be changed after construction/instantiation; keys must be immutable, or their hash value will change!

division method (compression) - turns the hash code into an integer between 0 and N-1 via the rather unimaginative

$i \mod N$

MAD (multiply and divide) method (a.k.a. universal hashing) (compression) - use the following compression function:

$(ai + b \mod p ) \mod N$

where p > N and p is prime

collision resolution - rule used to determine what happens when a collision is found

separate chaining - using linked lists as buckets, instead of storing elements in an array directly

load factor - alpha - ratio of number of items N to number of buckets M, cost is proportional to load factor

open addressing - requires load factor < 1 (the table must not be completely full, or probing may fail to terminate); stores elements directly instead of in buckets

linear probing - used with open addressing, collision handling technique in which a full element leads to a neighbor search (deletion becomes tricky, mark removed slots)

quadratic probing - alternative that provides more desirable properties than linear probing

double hashing - if we hash a key h(k) and the position is occupied, we can use a second hash function to find a new key/position

$(h(k) + i \cdot h^{\prime}(k)) \mod N$

ADTs and Interfaces

Hash Table ADT

Hash Tables provide the following methods:

hashFunction (turns key into index 0 to N-1)
get
set
del
getBucket (deletes an item from a particular bucket)
setBucket (sets the value of an item in a particular bucket)
delBucket (removes an item from a particular bucket)
size
empty
clear
clone
iterators, etc.

Implementations

Hash Table fields/members:

table for data
number of entries
hash function coefficients
- prime p
- scale a
- shift b

Hash Table methods:

hash function (universal/MAD function)
get/set/del methods
getBucket/setBucket/delBucket methods
resize method

Chaining Hash Table methods:

overrides getBucket/setBucket/delBucket to use "chained" linked lists for buckets

Probing Hash Table methods:

overrides getBucket/setBucket/delBucket to use linear/quadratic probing to find open slots
implements non-empty AVAILABLE sentinel

Algorithms for Operations

Hash Map base class

get(k) method:

j = hashFunction(k)
return getBucket(j,k)

set(k,v) method:

j = hashFunction(k)
setBucket(j,k,v)
update size

delete(k) method:

j = hashFunction(k)
delBucket(j,k)
update size

get, set, and delete methods are all public

hashFunction method varies depending on hash function chosen

getBucket, setBucket, delBucket are private and depend on the concrete implementation

Chained Hash Map

get bucket(j,k) method:

bucket_j = buckets[j]
return bucket_j[k]

set bucket(j,k,v)

bucket_j = buckets[j]
bucket_j[k] = v

remove bucket(j,k) method:

bucket_j = buckets[j]
del bucket_j[k]

Linear Probing Hash Table

findBucketSlot(j,k) method:

Find key k in bucket j
if true, found match, returning location
if false, not match, returning 1st available slot

getBucket(j,k) method:

findBucketSlot(j,k)
if found:
- return table value

setBucket(j,k,v) method:

findBucketSlot(j,k)
if found:
- bucket_j[k] = v
else:
- new item(k,v)
- bucket_j insert new item
- update size

removeBucket(j,k) method:

findBucketSlot(j,k)
if found:
- mark as vacated using AVAIL constant

Complexity and Cost

Note: alpha is the loading factor, N/M, where N is the number of elements in the hash table, and M is the number of buckets.

Big-O Complexity of Hash Tables
	Hash Table (expected)	Hash Table (worst)	Chained Hash Map
get	O(1)	O(n)	O(alpha)
set	O(1)	O(n)	O(alpha)
remove	O(1)	O(n)	O(alpha)
size	O(1)	O(1)	O(1)
empty	O(1)	O(1)	O(1)
iteration over structure	O(n)	O(n)	O(n)

OOP Principles

Object hash codes

Java and Python both provide methods for modifying/directly implementing a custom object hash code method

Private utility methods:

Pack your functionality into general, protected utility methods, as with chained probing hash table implementations

Flags

@@ Line 5: / Line 5: @@
 '''hash table''' - data structure that uses key or value based lookup system
-'''dictionary problem''' - searching for an item in a structure; if item is not found, you learn nothing new
+'''dictionary problem''' - the classic problem of designing efficient data structures that implement associative arrays (supporting insert, delete, and search operations)
 '''hash function''' - a mapping from input objects to a hash table entry
@@ Line 47: / Line 47: @@
 '''load factor''' - alpha - ratio of number of items N to number of buckets M, cost is proportional to load factor
-'''open addressing''' - requires load factor is 1, stores elements directly instead of in buckets
+'''open addressing''' - requires load factor &lt; 1 (the table must not be completely full, or probing may fail to terminate); stores elements directly instead of in buckets
-'''linear probing''' - used with open addressing, collision handling technique in which a full element leads to a neighbor search (deletion becomes tricky, mark removed nodes)
+'''linear probing''' - used with open addressing, collision handling technique in which a full element leads to a neighbor search (deletion becomes tricky, mark removed slots)
 '''quadratic probing''' - alternative that provides more desirable properties than linear probing
@@ Line 56: / Line 56: @@
 <math>
-h(k) + i \cdot h^{\prime}(k)
+(h(k) + i \cdot h^{\prime}(k)) \mod N
 </math>
@@ Line 98: / Line 98: @@
 Probing Hash Table methods:
 * overrides getBucket/setBucket/delBucket to use linear/quadratic probing to find open slots
-* implements AVAILABLE sentinel
+* implements non-empty AVAILABLE sentinel
 =Algorithms for Operations=
@@ Line 105: / Line 105: @@
 get(k) method:
-* j = find bucket(k)
+* j = hashFunction(k)
-* get bucket(j,k)
+* return getBucket(j,k)
-* return item
 set(k,v) method:
-* j = find bucket
+* j = hashFunction(k)
-* set bucket(j,k,v)
+* setBucket(j,k,v)
-* update item
 * update size
 delete(k) method:
-* j = find bucket(k)
+* j = hashFunction(k)
-* delete bucket(j,k)
+* delBucket(j,k)
 * update size
-find bucket(k):
+get, set, and delete methods are all public
-* return hash code(k)
-get, set, and delete methods are all public
+hashFunction method varies depending on hash function chosen
-find bucket method, get bucket method, set bucket method are private and depend on the concrete implementation
+getBucket, setBucket, delBucket are private and depend on the concrete implementation
 ==Chained Hash Map==
 get bucket(j,k) method:
+* bucket_j = buckets[j]
 * return bucket_j[k]
 set bucket(j,k,v)
+* bucket_j = buckets[j]
 * bucket_j[k] = v
 remove bucket(j,k) method:
+* bucket_j = buckets[j]
 * del bucket_j[k]
 ==Linear Probing Hash Table==
-find bucket slot(j,k) method:
+findBucketSlot(j,k) method:
 * Find key k in bucket j
 * if true, found match, returning location
 * if false, not match, returning 1st available slot
-get bucket(j,k) method:
+getBucket(j,k) method:
-* find slot(j,k)
+* findBucketSlot(j,k)
 * if found:
 ** return table value
-set bucket(j,k,v) method:
+setBucket(j,k,v) method:
-* find slot(j,k)
+* findBucketSlot(j,k)
 * if found:
-** bucketj[k] = v
+** bucket_j[k] = v
 * else:
 ** new item(k,v)
-** bucketj insert new item
+** bucket_j insert new item
 ** update size
-remove bucket(j,k) method:
+removeBucket(j,k) method:
-* find slot(j,k)
+* findBucketSlot(j,k)
 * if found:
 ** mark as vacated using AVAIL constant
@@ Line 166: / Line 166: @@
 =Complexity and Cost=
-Note: alpha is the loading factor, N/M, where N is the nubmer of elements
+Note: alpha is the loading factor, N/M, where N is the number of elements
 in the hash table, and M is the number of buckets.
 {| class="wikitable" cellpadding="100" width="66%"
-!colspan="4"|Big-O Complexity of Lists
+!colspan="4"|Big-O Complexity of Hash Tables
 |-
 |
 |<br /><br />
-Unsorted List Hash Map (expected)
+Hash Table (expected)
 |<br /><br />
-Unsorted List Hash Map (worst)
+Hash Table (worst)
 |<br /><br />
 Chained Hash Map
@@ Line 201: / Line 201: @@
 |size
 |O(1)
-|O(n)
+|O(1)
 |O(1)
@@ Line 207: / Line 207: @@
 |empty
 |O(1)
-|O(n)
+|O(1)
 |O(1)

Hash Tables Study Guide: Difference between revisions

From charlesreid1