Maps and Sets Study Guide
From charlesreid1
Contents
Definitions and Variations
Definitions
dictionary - data structure in which values are mapped to keys, or act as their own keys.
dictionary problem - a find/insert/delete lookup problem in which you learn nothing if your search fails
map - mathematical analog term for dictionary, like a function (one input gives one output)
sorted map - data structure storing keys in a sorted order to provide data in order
hash table - a fast, unordered lookup table that works by turning an object (key or value) into a unique slot in an array via a hashing function
hash code - maps keys to integers
compression function - maps integers to a set of slots 0..N-1
bucket array - array storing the data of the hash table
hash function - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)
immutability - a property required of keys
collisions - when two unique/distinct objects have the same output of the has function Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle h(k_i) = h(k_j), i \neq j}
chaining - technique to deal with collisions
exact search - lookups tell you nothing about nearby data
inexact search - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.
maxima sets - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs.
skip lists - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees
- motivated by inefficiency of sorted lists
- efficient search/update compromise
- randomized data structure gives O(log N) with high expectation, randomizing DS so keys don't have to be randomized
height of skip list - number of linked lists to maintain
pseudo-random number generators - generate random-like numbers
start position of skip list - left-most node of top-most sparsest list
frozen set - an immutable set (Python type)
set - collection of related entities that does not allow duplicates, allows membership tests
multiset - set-like container allowing duplicates (counting map)
multimap - associates keys to values, but allows one key mapping to multiple values
template method pattern - virtual or unimplemented methods in parent class that are extended by children or by algorithms/functionality
Variations
Maps:
- Sorted Map
- Tree-based
- Array-based
- Skip list
- Unsorted Map
- Hash table based
- Array based
Sets:
- Sorted sets
- Tree implementation
- Skip list implementation
- Array implementation
- Unsorted sets
- Hash table implementation
- Array implementation
(NOTE: While the array implementations are inefficient for larger maps, for a small map like in a B-Tree, O(N) insertion is not a big deal if in return you get O(log N) or O(1) lookup and/or if you have a map with a max of 4 items.)
Skip lists
Hash tables
ADTs and Interfaces
Map ADT
Map ADT:
- get
- set
- delete
- contains
- size
- empty
- iterator
Utilities:
- key iterator/key set
- value iterator/value set
- clear
- equality checks
Sorted Map ADT:
- find min
- find max
- find less than/lte
- find greater than/gte
- find range
- iterator
- reversed
Hash Table ADT
Hash table ADT:
- get
- set
- delete
- resize
- hash function: ak+b mod p mod N
Skip Lists
Skip list interface:
- Same as map
- get
- set
- delete
- contains
Set ADT
Set ADT has 5 essential methods:
- add
- remove
- contains
- size
- empty
From these 5, others can be extended:
- pop
- equals/not equals
- lt/lte
- gt/gte
- union of sets
- intersection of sets
- symmetric difference
- not operator
Implementations
Map:
- Item class:
- Key/value
- equal/not equal/lt/gt
Sorted Map:
- private find index (k, lo, hi)
- public get/set/delete
- iterator/reverse iterator
- find max/min
- find gt/gte/lt/lte
Table Map:
- array of data: table
Hash Map:
- Hash function
- private get/set/remove from buckets
- table data
- prime integer
- n entries integer
- hash parameters a, b integer
- resize
Set:
- follow map implementation, but values are their own keys
Skip List:
- Skip Node class:
- level of skip node integer
- data
- Array or list of next Skip Node, indexed by level
- Comparison operators
- random number generator
- max height integer
- size integer
- head node pointer
Algorithms for Operations
Table Maps:
- Obvious, but impractical for large maps. Not detailed here.
Hash Maps:
- All operations are covered by Hash Tables Study Guide operations section
Tree Maps:
- All operations here are covered in the search tree study guide
Skip Lists
add(e) method:
- deal with empty case (new head node)
- get random insertion level (flipping coin)
- create new node
- walker = head
- handle case where node goes before head
- for each level,
- walker = walker.next
- while next node smaller than us:
- walk fwd 1 step
- if current level <= insertion level:
- insert node in this level
- walker = walker.next
- increment size
remove method:
- deal with empty case
- check for remove head (head value = remove value)
- step 1
- step 2
- step 3
- update size
remove step 1: get predecessor
- use protected getPredecessor method
- find predecessor of node to be removed
- predecessor.getNext(using level from prior step), gives result node
remove step 2:
- to find result next node,
- deal with removing head case:
- prior is null
- swap out result with result prior
- result next = result.next
remove step 3:
- starting from top occurring level, remove each node in the tower
- result = result.getNext
- result prev next = result next
- if we aren't at last level of tower,
- move down
- set results previous at new level
getPredecessor method:
- handle head case
- look for the node (and level) where next nod has target value
- if we reach end of row, move down
- if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
- if we are bigger than the next node, keep moving right
- if we are equal, return this node.
Complexity and Cost
Big O Complexity Table
Complexity depends on implementation. See Hash Tables Study Guide and Search Trees Study Guide for big O tables.
Analysis of Skip Lists
Big O Complexity
| Big-O Complexity of Skip Lists | |||
|---|---|---|---|
Skip List | |||
| get | O(log N) | ||
| set | O(log N) | ||
| contains | O(log N) | ||
| remove | O(log N) | ||
| find min | O(1) | ||
| find max | O(1) | ||
| find lt/find lte/find gt/find gte | O(log N) | ||
| find range | O(log N + S), S items returned | ||
| iterate over all items | O(N) | ||
Probabilistic Analysis of Performance
Performance analysis:
- Base it on expectation
- High likelihood of good performance, given by Chernoff bounds
- Bounded height with high probability
Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle P_i \leq \left( \dfrac{1}{2} \right)^i n }
Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle P_i \leq \dfrac{n}{2^i} }
at a given level i, what is the probability of having an entry? 1/2, from coin flip. These combine to give the expression above.
Given a constant c > 1, the height of a tree is larger than c log n with probability of at most
Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \dfrac{1}{n^{c-1}} }
The probability that the height is smaller than c log n is
Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \left( 1 - \dfrac{1}{n^{c-1}} \right) }
Space Usage
We can analyze the space usage of a probabilistic data set by analyzing the expected sum of number of nodes. This is given by the sum of number of positions at level i:
Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \sum_{i=0}^{h} n \mathbb{P}_i = n \left( \sum_{i=0}^{h} \dfrac{1}{2^i} \right) }
The last term can be simplified using the fact that
Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \sum_{i=1}^{n} a^i = \dfrac{a^{i+1} - 1}{a-1} }
to say that
Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \sum_{i=0}^{h} \dfrac{1}{2^i} = 2 \left( 1 - \dfrac{1}{2^{h+1}} \right) }
This number is bounded by 2, and therefore
Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \mathbb{E}(N) = n \mathbb{P} = \sum_{i=0}^{h} n \mathbb{P}_i \leq 2n }
and therefore the expected number of nodes in the skip list is O(n):
Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \mathbb{E}(N) \sim O(n) }
OOP Principles
Flags
| Maps and Dictionaries Part of Computer Science Notes
Series on Data Structures
Maps/Dictionaries Maps · Maps/ADT · Maps in Java · Maps/OOP · Maps/Operations and Performance Map implementations: Maps/AbstractMap · Maps/UnsortedArrayMap · Maps/SortedArrayMap Dictionary implementations: Dictionaries/LinkedDict · Dictionaries/ArrayDict
Hashes Hash Maps/OOP · Hash Maps/Operations and Performance Hash Maps/Dynamic Resizing · Hash Maps/Collision Handling with Chaining Hash functions: Hash Functions · Hash Functions/Cyclic Permutation Hash map implementations: Hash Maps/AbstractHashMap · Hash Maps/ChainedHashMap
Skip Lists · Java/ConcurrentSkipList · Java implementations: SkipList
Sets Sets · Sets/ADT · Sets in Java · Sets/OOP · Multisets
|