Definitions and Variations

Definitions

dictionary - data structure in which values are mapped to keys, or act as their own keys.

dictionary problem - a find/insert/delete lookup problem in which you learn nothing if your search fails

map - mathematical analog term for dictionary, like a function (one input gives one output)

sorted map - data structure storing keys in a sorted order to provide data in order

hash table - a fast, unordered lookup table that works by turning an object (key or value) into a unique slot in an array via a hashing function

hash code - maps keys to integers

compression function - maps integers to a set of slots 0..N-1

bucket array - array storing the data of the hash table

hash function - a function h that maps keys k to an integer between 0 and N-1, denoted h(k)

immutability - a property required of keys

collisions - when two unique/distinct objects have the same output of the has function $h(k_{i})=h(k_{j}),i\neq j$

chaining - technique to deal with collisions

exact search - lookups tell you nothing about nearby data

inexact search - (timestamps example) can use sorting to get context info, less than, greater than, subset, etc.

maxima sets - storing (x,y) points used to model a cost/performance tradeoff. (a,b) dominates (c,d) if a <= c and b >= d. (a,b) is maximum pair if not dominated by any other pairs.

skip lists - data structure for storing data in sorted order, high probability of O(log N) access, simpler to implement relative to balanced search trees

motivated by inefficiency of sorted lists
efficient search/update compromise
randomized data structure gives O(log N) with high expectation, randomizing DS so keys don't have to be randomized

height of skip list - number of linked lists to maintain

pseudo-random number generators - generate random-like numbers

start position of skip list - left-most node of top-most sparsest list

frozen set - an immutable set (Python type)

set - collection of related entities that does not allow duplicates, allows membership tests

multiset - set-like container allowing duplicates (counting map)

multimap - associates keys to values, but allows one key mapping to multiple values

template method pattern - virtual or unimplemented methods in parent class that are extended by children or by algorithms/functionality

Variations

Maps:

Sorted Map
- Tree-based
- Array-based
- Skip list
Unsorted Map
- Hash table based
- Array based

Sets:

Sorted sets
- Tree implementation
- Skip list implementation
- Array implementation
Unsorted sets
- Hash table implementation
- Array implementation

(NOTE: While the array implementations are inefficient for larger maps, for a small map like in a B-Tree, O(N) insertion is not a big deal if in return you get O(log N) or O(1) lookup and/or if you have a map with a max of 4 items.)

Skip lists

Hash tables

See Hash Tables Study Guide

ADTs and Interfaces

Map ADT

Map ADT:

get
set
delete
contains
size
empty
iterator

Utilities:

key iterator/key set
value iterator/value set
clear
equality checks

Sorted Map ADT:

find min
find max
find less than/lte
find greater than/gte
find range
iterator
reversed

Hash Table ADT

Hash table ADT:

get
set
delete
resize
hash function: ak+b mod p mod N

Skip Lists

Skip list interface:

Same as map
get
set
delete
contains

Set ADT

Set ADT has 5 essential methods:

add
remove
contains
size
empty

From these 5, others can be extended:

pop
equals/not equals
lt/lte
gt/gte
union of sets
intersection of sets
symmetric difference
not operator

Implementations

Map:

Item class:
- Key/value
- equal/not equal/lt/gt

Sorted Map:

private find index (k, lo, hi)
public get/set/delete
iterator/reverse iterator
find max/min
find gt/gte/lt/lte

Table Map:

array of data: table

Hash Map:

Hash function
private get/set/remove from buckets
table data
prime integer
n entries integer
hash parameters a, b integer
resize

Set:

follow map implementation, but values are their own keys

Skip List:

Skip Node class:
- level of skip node integer
- data
- Array or list of next Skip Node, indexed by level
- Comparison operators
random number generator
max height integer
size integer
head node pointer

Algorithms for Operations

Table Maps:

Obvious, but impractical for large maps. Not detailed here.

Hash Maps:

All operations are covered by Hash Tables Study Guide operations section

Tree Maps:

All operations here are covered in the search tree study guide

Skip Lists

add(e) method:

deal with empty case (new head node)
get random insertion level (flipping coin)
create new node
walker = head
handle case where node goes before head
for each level,
- walker = walker.next
- while next node smaller than us:
  - walk fwd 1 step
- if current level <= insertion level:
  - insert node in this level
  - walker = walker.next
increment size

remove method:

deal with empty case
check for remove head (head value = remove value)
step 1
step 2
step 3
update size

remove step 1: get predecessor

use protected getPredecessor method
find predecessor of node to be removed
predecessor.getNext(using level from prior step), gives result node

remove step 2:

to find result next node,
deal with removing head case:
- prior is null
- swap out result with result prior
- result next = result.next

remove step 3:

starting from top occurring level, remove each node in the tower
- result = result.getNext
- result prev next = result next
- if we aren't at last level of tower,
  - move down
  - set results previous at new level

getPredecessor method:

handle head case
look for the node (and level) where next nod has target value
if we reach end of row, move down
if we find a larger node, stop moving right and move down (move down, then walker = walker.next)
if we are bigger than the next node, keep moving right
if we are equal, return this node.

Complexity and Cost

Big O Complexity Table

Complexity depends on implementation. See Hash Tables Study Guide and Search Trees Study Guide for big O tables.

Analysis of Skip Lists

Big O Complexity

Big-O Complexity of Skip Lists
	Skip List
get	O(log N)
set	O(log N)
contains	O(log N)
remove	O(log N)
find min	O(1)
find max	O(1)
find lt/find lte/find gt/find gte	O(log N)
find range	O(log N + S), S items returned
iterate over all items	O(N)

Probabilistic Analysis of Performance

Performance analysis:

Base it on expectation
High likelihood of good performance, given by Chernoff bounds
Bounded height with high probability

$P_{i}\leq \left({\dfrac {1}{2}}\right)^{i}n$

$P_{i}\leq {\dfrac {n}{2^{i}}}$

at a given level i, what is the probability of having an entry? 1/2, from coin flip. These combine to give the expression above.

Given a constant c > 1, the height of a tree is larger than c log n with probability of at most

${\dfrac {1}{n^{c-1}}}$

The probability that the height is smaller than c log n is

$\left(1-{\dfrac {1}{n^{c-1}}}\right)$

Space Usage

We can analyze the space usage of a probabilistic data set by analyzing the expected sum of number of nodes. This is given by the sum of number of positions at level i:

$\sum _{i=0}^{h}n\mathbb {P} _{i}=n\left(\sum _{i=0}^{h}{\dfrac {1}{2^{i}}}\right)$

The last term can be simplified using the fact that

$\sum _{i=1}^{n}a^{i}={\dfrac {a^{i+1}-1}{a-1}}$

to say that

$\sum _{i=0}^{h}{\dfrac {1}{2^{i}}}=2\left(1-{\dfrac {1}{2^{h+1}}}\right)$

This number is bounded by 2, and therefore

$\mathbb {E} (N)=n\mathbb {P} =\sum _{i=0}^{h}n\mathbb {P} _{i}\leq 2n$

and therefore the expected number of nodes in the skip list is O(n):

$\mathbb {E} (N)\sim O(n)$

OOP Principles

Flags

Maps and Sets Study Guide

From charlesreid1

Contents

Definitions and Variations

Definitions

Variations

ADTs and Interfaces

Map ADT

Hash Table ADT

Skip Lists

Set ADT

Implementations

Algorithms for Operations

Skip Lists

Complexity and Cost

Big O Complexity Table

Analysis of Skip Lists

Big O Complexity

Probabilistic Analysis of Performance

Space Usage

OOP Principles

Flags