Revision as of 01:36, 16 July 2017

Notes

Goodrich Chapter 12

Chapter 12 - sorting and selection

Merge sort

Merge sort as an example of divide-and-conquer
Running time of merge sort
Merge sort and recurrence relations

Sorting through an algorithmic lens

lower bound
linear time sorting
comparing sort functions

Skiena Chapter 4

Lots of good information here.

Sorting

Applications of sorting:

Searching
Closest pairs
Element uniqueness
Frequency distribution
- To find how often an arbitrary element k occurs, look up k using a binary search, and walk to the left of that point until first the element is not k, then do the same to the right side
Selection
Convex hulls
- Once you have the points sorted by x-coordinate, the points can be inserted from left to right into the hull. Total time is linear after the sort is done.

Sorting lies at the heart of many algorithms. Sorting data is one of the first things any algorithm designer should try in the quest for efficiency.

Finding intersection of two sets of size m and n (m << n):

Sorting the big set first costs O(n log n) time, and doing a binary search for each of the m elements in n costs O(m log n), for a total cost of O((n+m) log n)
Sorting the small set first costs O(m log m) time, and doing a binary search for each of the n elements to find it in m costs O(n log m), for a total cost of O((m+n) log m)
Sorting both sets, and comparing each element of the two sets starting with the smallest. This costs O(n log n + m log m + n + m)

Overall the small-set sorting is the fastest.

Expected linear time can be achieved by hashing - building a hash table with elements from both sets - and this is probably a better solution.

Pragmatics

Several practical considerations when sorting:

Increasing (ascending) or decreasing (descending) order?
Sorting just the key, or the entire record?
What do you do when you get a tie?
- Certain efficient sort algorithms (like quicksort) run into quadratic performance trouble unless engineered to deal with ties
Non-numerical data

The general approach should be to use an external comparison function (a comparator) that takes pointers to records and returns the result of the comparison. In general, compare(a,b) will return negative if a comes before b, positive if b comes before a, and 0 if they are equal.

Why focus so much on search?

Knuth estimated 25% of compute time is spent sorting
Learning sorting is not just about learning sorting - the design techniques you learn in sort are important for other algorithmic problems

Heap Sort

NOTE: Heap sort is only for sorting - not for searching!

Heap sort: the practical implementation is with a priority queue

O(1) time to remove a particular item from an unsorted array once it is located, but O(n) time to locate it. Priority queue improves this to O(log n) time to find items, instead of O(n). Selection sort is also sped up from O(n^2) to O(n log n).

Heap sort is nothing but an implementation of selection sort using the right data structure.

Heaps maintain weaker-than-sorted but stronger-than-random order

Hierarchical organization - minimum items are at the top, heaps guarantee that any node (level i) is larger than its parent (level i-1)

Implementation:

Heaps can be implemented using a linked structure, but because they are filled from left to right, level by level, they are easy to implement as an array as well
We store the $$ 2^k $$ keys of a complete binary tree from left-to-right; level k keys are stored in positions $2^{k-1}$ to $$ 2^k-1 $$ .
Assume array index starts at 1, to simplify matters
can store any binary tree in an array without pointers, usually easiest option for heaps
Why do we use this method for storing the heap? Space efficiency. Filling in from the bottom ensures no holes in the tree.

Since all but the last level is always filled, the height h of an n-element heap is logarithmic:

Each level is guaranteed to have $$ 2^k $$ nodes, so:

$\sum_{k=0}^{h} 2^k = 2^{h+1} - 1 \leq n$

and therefore

$n = floor( \log{(n)} )$

Flags

@@ Line 1: / Line 1: @@
 =Notes=
-==Goodrich==
+==Goodrich Chapter 12==
 Chapter 12 - sorting and selection
@@ Line 14: / Line 14: @@
 * linear time sorting
 * comparing sort functions
+==Skiena Chapter 4==
+Lots of good information here.
+===Sorting===
+Applications of sorting:
+* Searching
+* Closest pairs
+* Element uniqueness
+* Frequency distribution
+** To find how often an arbitrary element k occurs, look up k using a binary search, and walk to the left of that point until first the element is not k, then do the same to the right side
+* Selection
+* Convex hulls
+** Once you have the points sorted by x-coordinate, the points can be inserted from left to right into the hull. Total time is linear after the sort is done.
+'''Sorting lies at the heart of many algorithms. Sorting data is one of the first things any algorithm designer should try in the quest for efficiency.'''
+Finding intersection of two sets of size m and n (m << n):
+* Sorting the big set first costs O(n log n) time, and doing a binary search for each of the m elements in n costs O(m log n), for a total cost of O((n+m) log n)
+* Sorting the small set first costs O(m log m) time, and doing a binary search for each of the n elements to find it in m costs O(n log m), for a total cost of O((m+n) log m)
+* Sorting both sets, and comparing each element of the two sets starting with the smallest. This costs O(n log n + m log m + n + m)
+Overall the small-set sorting is the fastest.
+Expected linear time can be achieved by hashing - building a hash table with elements from both sets - and this is probably a better solution.
+===Pragmatics===
+Several practical considerations when sorting:
+* Increasing (ascending) or decreasing (descending) order?
+* Sorting just the key, or the entire record?
+* What do you do when you get a tie?
+** Certain efficient sort algorithms (like quicksort) run into quadratic performance trouble unless engineered to deal with ties
+* Non-numerical data
+The general approach should be to use an external comparison function (a comparator) that takes pointers to records and returns the result of the comparison. In general, compare(a,b) will return negative if a comes before b, positive if b comes before a, and 0 if they are equal.
+Why focus so much on search?
+* Knuth estimated 25% of compute time is spent sorting
+* Learning sorting is not just about learning sorting - the design techniques you learn in sort are important for other algorithmic problems
+===Heap Sort===
+NOTE: Heap sort is only for '''sorting''' - not for searching!
+Heap sort: the practical implementation is with a priority queue
+O(1) time to remove a particular item from an unsorted array once it is located, but O(n) time to locate it. Priority queue improves this to O(log n) time to find items, instead of O(n). Selection sort is also sped up from O(n^2) to O(n log n).
+'''Heap sort is nothing but an implementation of selection sort using the right data structure.'''
+Heaps maintain weaker-than-sorted but stronger-than-random order
+Hierarchical organization - minimum items are at the top, heaps guarantee that any node (level i) is larger than its parent (level i-1)
+Implementation:
+* Heaps can be implemented using a linked structure, but because they are filled from left to right, level by level, they are easy to implement as an array as well
+* We store the <math>2^k</math> keys of a complete binary tree from left-to-right; level k keys are stored in positions <math>2^{k-1}</math> to <math>2^k-1</math>.
+* Assume array index starts at 1, to simplify matters
+* can store any binary tree in an array without pointers, usually easiest option for heaps
+* Why do we use this method for storing the heap? Space efficiency. Filling in from the bottom ensures no holes in the tree.
+Since all but the last level is always filled, the height h of an n-element heap is logarithmic:
+Each level is guaranteed to have <math>2^k</math> nodes, so:
+<math>
+\sum_{k=0}^{h} 2^k = 2^{h+1} - 1 \leq n
+</math>
+and therefore
+<math>
+n = floor( \log{(n)} )
+</math>
+=Flags=
 {{AlgorithmsFlag}}
+[[Category:Sort]]

Algorithms/Sort: Difference between revisions

From charlesreid1