Maps and Sets Study Guide: Difference between revisions
From charlesreid1
(Fix typos, improve definition accuracy, fill empty OOP Principles section, fix geometric series formula (via update-page on MediaWiki MCP Server)) |
|||
| (One intermediate revision by the same user not shown) | |||
| Line 3: | Line 3: | ||
==Definitions== | ==Definitions== | ||
'''dictionary''' - data structure | '''dictionary''' - data structure that stores key-value pairs, mapping each unique key to an associated value | ||
'''dictionary problem''' - a find/insert/delete lookup problem in which you learn nothing if your search fails | '''dictionary problem''' - a find/insert/delete lookup problem in which you learn nothing if your search fails | ||
| Line 11: | Line 11: | ||
'''sorted map''' - data structure storing keys in a sorted order to provide data in order | '''sorted map''' - data structure storing keys in a sorted order to provide data in order | ||
'''hash table''' - a fast, unordered lookup table that works by turning | '''hash table''' - a fast, unordered lookup table that works by turning a key into a slot in a bucket array via a hash function | ||
'''hash code''' - maps keys to integers | '''hash code''' - maps keys to integers | ||
| Line 21: | Line 21: | ||
'''hash function''' - a function h that maps keys k to an integer between 0 and N-1, denoted h(k) | '''hash function''' - a function h that maps keys k to an integer between 0 and N-1, denoted h(k) | ||
'''immutability''' - a property required of keys | '''immutability''' - a property required of keys in hash-based data structures; an immutable object's state cannot be modified after creation, ensuring its hash code remains constant | ||
'''collisions''' - when two unique/distinct objects have the same output of the | '''collisions''' - when two unique/distinct objects have the same output of the hash function <math>h(k_i) = h(k_j), i \neq j</math> | ||
'''chaining''' - technique to deal with collisions | '''chaining''' - technique to deal with collisions by storing colliding entries in a secondary data structure (e.g., linked list) at each bucket | ||
'''exact search''' - lookups tell you nothing about nearby data | '''exact search''' - lookups tell you nothing about nearby data | ||
| Line 31: | Line 31: | ||
'''inexact search''' - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc. | '''inexact search''' - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc. | ||
'''maxima sets''' - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a | '''maxima sets''' - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs. | ||
'''skip lists''' - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees | '''skip lists''' - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees | ||
| Line 52: | Line 52: | ||
'''multimap''' - associates keys to values, but allows one key mapping to multiple values | '''multimap''' - associates keys to values, but allows one key mapping to multiple values | ||
'''template method pattern''' - | '''template method pattern''' - a behavioral design pattern that defines the skeleton of an algorithm in a parent class method, deferring specific steps to subclasses through virtual/abstract methods | ||
==Variations== | ==Variations== | ||
| Line 210: | Line 210: | ||
** while next node smaller than us: | ** while next node smaller than us: | ||
*** walk fwd 1 step | *** walk fwd 1 step | ||
** if current level | ** if current level <= insertion level: | ||
*** insert node in this level | *** insert node in this level | ||
*** walker = walker.next | *** walker = walker.next | ||
| Line 218: | Line 218: | ||
* deal with empty case | * deal with empty case | ||
* check for remove head (head value = remove value) | * check for remove head (head value = remove value) | ||
* step 1 | * step 1: get predecessor (use protected getPredecessor method to find predecessor of node to be removed; predecessor.getNext using level from prior step gives result node) | ||
* step 2 | * step 2: find result next node (deal with removing head case: prior is null, swap out result with result prior, result next = result.next) | ||
* step 3 | * step 3: starting from top occurring level, remove each node in the tower (result = result.getNext; result prev next = result next; if not at last level of tower, move down and set result's previous at new level) | ||
* update size | * update size | ||
getPredecessor method: | getPredecessor method: | ||
* handle head case | * handle head case | ||
* look for the node (and level) where next | * look for the node (and level) where next node has target value | ||
* if we reach end of row, move down | * if we reach end of row, move down | ||
* if we find a larger node, stop moving right and move down (move down, then walker = walker.next) | * if we find a larger node, stop moving right and move down (move down, then walker = walker.next) | ||
| Line 266: | Line 245: | ||
|- | |- | ||
| | | | ||
| | |<br /><br /> | ||
Skip List | Skip List | ||
| Line 314: | Line 293: | ||
* Bounded height with high probability | * Bounded height with high probability | ||
The expected number of nodes at level i is given by: | |||
<math> | <math> | ||
\mathbb{E}(N_i) \leq \left( \dfrac{1}{2} \right)^i n = \dfrac{n}{2^i} | |||
</math> | </math> | ||
At a given level i, the probability of a node being promoted to that level is 1/2^i, from repeated coin flips. These combine to give the expression above. | |||
Given a constant c | Given a constant c > 1, the height of a skip list is larger than c log n with probability of '''at most''' | ||
<math> | <math> | ||
| Line 338: | Line 315: | ||
===Space Usage=== | ===Space Usage=== | ||
We can analyze the space usage of a probabilistic data | We can analyze the space usage of a probabilistic data structure by analyzing the expected sum of number of nodes. This is given by the sum of the expected number of nodes at level i: | ||
<math> | <math> | ||
| Line 344: | Line 321: | ||
</math> | </math> | ||
The last term can be simplified using the | The last term can be simplified using the geometric series formula: | ||
<math> | <math> | ||
\sum_{i= | \sum_{i=0}^{n} a^i = \dfrac{a^{n+1} - 1}{a-1} | ||
</math> | </math> | ||
| Line 359: | Line 336: | ||
<math> | <math> | ||
\mathbb{E}(N) | \mathbb{E}(N) = \sum_{i=0}^{h} \mathbb{E}(N_i) \leq 2n | ||
</math> | </math> | ||
| Line 369: | Line 346: | ||
=OOP Principles= | =OOP Principles= | ||
Key object-oriented design principles employed in maps and sets: | |||
* '''Template Method Pattern''' - Define algorithm skeleton in base class, defer implementation details to subclasses. Used extensively in map ADTs where traversal logic is shared but comparison operations are type-specific. | |||
* '''Iterator Pattern''' - Provide a uniform way to traverse elements without exposing the underlying representation (array, tree, skip list, or hash table). | |||
* '''Composition''' - Data structures compose lower-level building blocks (e.g., a hash map composes a bucket array; a skip list composes linked lists at multiple levels). | |||
* '''Encapsulation''' - Internal representation (array, tree nodes, hash function parameters) is hidden behind the public ADT interface. | |||
* '''Abstract Base Classes''' - Map, Set, and Sorted Map ADTs are defined as abstract base classes with virtual/abstract methods that concrete implementations must override. | |||
=Flags= | =Flags= | ||
Latest revision as of 03:59, 8 June 2026
Definitions and Variations
Definitions
dictionary - data structure that stores key-value pairs, mapping each unique key to an associated value
dictionary problem - a find/insert/delete lookup problem in which you learn nothing if your search fails
map - mathematical analog term for dictionary, like a function (one input gives one output)
sorted map - data structure storing keys in a sorted order to provide data in order
hash table - a fast, unordered lookup table that works by turning a key into a slot in a bucket array via a hash function
hash code - maps keys to integers
compression function - maps integers to a set of slots 0..N-1
bucket array - array storing the data of the hash table
hash function - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)
immutability - a property required of keys in hash-based data structures; an immutable object's state cannot be modified after creation, ensuring its hash code remains constant
collisions - when two unique/distinct objects have the same output of the hash function $ h(k_i) = h(k_j), i \neq j $
chaining - technique to deal with collisions by storing colliding entries in a secondary data structure (e.g., linked list) at each bucket
exact search - lookups tell you nothing about nearby data
inexact search - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.
maxima sets - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs.
skip lists - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees
- motivated by inefficiency of sorted lists
- efficient search/update compromise
- randomized data structure gives O(log N) with high expectation, randomizing DS so keys don't have to be randomized
height of skip list - number of linked lists to maintain
pseudo-random number generators - generate random-like numbers
start position of skip list - left-most node of top-most sparsest list
frozen set - an immutable set (Python type)
set - collection of related entities that does not allow duplicates, allows membership tests
multiset - set-like container allowing duplicates (counting map)
multimap - associates keys to values, but allows one key mapping to multiple values
template method pattern - a behavioral design pattern that defines the skeleton of an algorithm in a parent class method, deferring specific steps to subclasses through virtual/abstract methods
Variations
Maps:
- Sorted Map
- Tree-based
- Array-based
- Skip list
- Unsorted Map
- Hash table based
- Array based
Sets:
- Sorted sets
- Tree implementation
- Skip list implementation
- Array implementation
- Unsorted sets
- Hash table implementation
- Array implementation
(NOTE: While the array implementations are inefficient for larger maps, for a small map like in a B-Tree, O(N) insertion is not a big deal if in return you get O(log N) or O(1) lookup and/or if you have a map with a max of 4 items.)
Skip lists
Hash tables
ADTs and Interfaces
Map ADT
Map ADT:
- get
- set
- delete
- contains
- size
- empty
- iterator
Utilities:
- key iterator/key set
- value iterator/value set
- clear
- equality checks
Sorted Map ADT:
- find min
- find max
- find less than/lte
- find greater than/gte
- find range
- iterator
- reversed
Hash Table ADT
Hash table ADT:
- get
- set
- delete
- resize
- hash function: ak+b mod p mod N
Skip Lists
Skip list interface:
- Same as map
- get
- set
- delete
- contains
Set ADT
Set ADT has 5 essential methods:
- add
- remove
- contains
- size
- empty
From these 5, others can be extended:
- pop
- equals/not equals
- lt/lte
- gt/gte
- union of sets
- intersection of sets
- symmetric difference
- not operator
Implementations
Map:
- Item class:
- Key/value
- equal/not equal/lt/gt
Sorted Map:
- private find index (k, lo, hi)
- public get/set/delete
- iterator/reverse iterator
- find max/min
- find gt/gte/lt/lte
Table Map:
- array of data: table
Hash Map:
- Hash function
- private get/set/remove from buckets
- table data
- prime integer
- n entries integer
- hash parameters a, b integer
- resize
Set:
- follow map implementation, but values are their own keys
Skip List:
- Skip Node class:
- level of skip node integer
- data
- Array or list of next Skip Node, indexed by level
- Comparison operators
- random number generator
- max height integer
- size integer
- head node pointer
Algorithms for Operations
Table Maps:
- Obvious, but impractical for large maps. Not detailed here.
Hash Maps:
- All operations are covered by Hash Tables Study Guide operations section
Tree Maps:
- All operations here are covered in the search tree study guide
Skip Lists
add(e) method:
- deal with empty case (new head node)
- get random insertion level (flipping coin)
- create new node
- walker = head
- handle case where node goes before head
- for each level,
- walker = walker.next
- while next node smaller than us:
- walk fwd 1 step
- if current level <= insertion level:
- insert node in this level
- walker = walker.next
- increment size
remove method:
- deal with empty case
- check for remove head (head value = remove value)
- step 1: get predecessor (use protected getPredecessor method to find predecessor of node to be removed; predecessor.getNext using level from prior step gives result node)
- step 2: find result next node (deal with removing head case: prior is null, swap out result with result prior, result next = result.next)
- step 3: starting from top occurring level, remove each node in the tower (result = result.getNext; result prev next = result next; if not at last level of tower, move down and set result's previous at new level)
- update size
getPredecessor method:
- handle head case
- look for the node (and level) where next node has target value
- if we reach end of row, move down
- if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
- if we are bigger than the next node, keep moving right
- if we are equal, return this node.
Complexity and Cost
Big O Complexity Table
Complexity depends on implementation. See Hash Tables Study Guide and Search Trees Study Guide for big O tables.
Analysis of Skip Lists
Big O Complexity
| Big-O Complexity of Skip Lists | |||
|---|---|---|---|
| <br /><br />
Skip List | |||
| get | O(log N) | ||
| set | O(log N) | ||
| contains | O(log N) | ||
| remove | O(log N) | ||
| find min | O(1) | ||
| find max | O(1) | ||
| find lt/find lte/find gt/find gte | O(log N) | ||
| find range | O(log N + S), S items returned | ||
| iterate over all items | O(N) | ||
Probabilistic Analysis of Performance
Performance analysis:
- Base it on expectation
- High likelihood of good performance, given by Chernoff bounds
- Bounded height with high probability
The expected number of nodes at level i is given by:
$ \mathbb{E}(N_i) \leq \left( \dfrac{1}{2} \right)^i n = \dfrac{n}{2^i} $
At a given level i, the probability of a node being promoted to that level is 1/2^i, from repeated coin flips. These combine to give the expression above.
Given a constant c > 1, the height of a skip list is larger than c log n with probability of at most
$ \dfrac{1}{n^{c-1}} $
The probability that the height is smaller than c log n is
$ \left( 1 - \dfrac{1}{n^{c-1}} \right) $
Space Usage
We can analyze the space usage of a probabilistic data structure by analyzing the expected sum of number of nodes. This is given by the sum of the expected number of nodes at level i:
$ \sum_{i=0}^{h} n \mathbb{P}_i = n \left( \sum_{i=0}^{h} \dfrac{1}{2^i} \right) $
The last term can be simplified using the geometric series formula:
$ \sum_{i=0}^{n} a^i = \dfrac{a^{n+1} - 1}{a-1} $
to say that
$ \sum_{i=0}^{h} \dfrac{1}{2^i} = 2 \left( 1 - \dfrac{1}{2^{h+1}} \right) $
This number is bounded by 2, and therefore
$ \mathbb{E}(N) = \sum_{i=0}^{h} \mathbb{E}(N_i) \leq 2n $
and therefore the expected number of nodes in the skip list is O(n):
$ \mathbb{E}(N) \sim O(n) $
OOP Principles
Key object-oriented design principles employed in maps and sets:
- Template Method Pattern - Define algorithm skeleton in base class, defer implementation details to subclasses. Used extensively in map ADTs where traversal logic is shared but comparison operations are type-specific.
- Iterator Pattern - Provide a uniform way to traverse elements without exposing the underlying representation (array, tree, skip list, or hash table).
- Composition - Data structures compose lower-level building blocks (e.g., a hash map composes a bucket array; a skip list composes linked lists at multiple levels).
- Encapsulation - Internal representation (array, tree nodes, hash function parameters) is hidden behind the public ADT interface.
- Abstract Base Classes - Map, Set, and Sorted Map ADTs are defined as abstract base classes with virtual/abstract methods that concrete implementations must override.
Flags
| Maps and Dictionaries Part of Computer Science Notes
Series on Data Structures
Maps/Dictionaries Maps · Maps/ADT · Maps in Java · Maps/OOP · Maps/Operations and Performance Map implementations: Maps/AbstractMap · Maps/UnsortedArrayMap · Maps/SortedArrayMap Dictionary implementations: Dictionaries/LinkedDict · Dictionaries/ArrayDict
Hashes Hash Maps/OOP · Hash Maps/Operations and Performance Hash Maps/Dynamic Resizing · Hash Maps/Collision Handling with Chaining Hash functions: Hash Functions · Hash Functions/Cyclic Permutation Hash map implementations: Hash Maps/AbstractHashMap · Hash Maps/ChainedHashMap
Skip Lists · Java/ConcurrentSkipList · Java implementations: SkipList
Sets Sets · Sets/ADT · Sets in Java · Sets/OOP · Multisets
|