Algorithms/Sort: Difference between revisions
From charlesreid1
No edit summary |
No edit summary |
||
| Line 1: | Line 1: | ||
=Notes= | =Notes= | ||
==Goodrich== | ==Goodrich Chapter 12== | ||
Chapter 12 - sorting and selection | Chapter 12 - sorting and selection | ||
| Line 14: | Line 14: | ||
* linear time sorting | * linear time sorting | ||
* comparing sort functions | * comparing sort functions | ||
==Skiena Chapter 4== | |||
Lots of good information here. | |||
===Sorting=== | |||
Applications of sorting: | |||
* Searching | |||
* Closest pairs | |||
* Element uniqueness | |||
* Frequency distribution | |||
** To find how often an arbitrary element k occurs, look up k using a binary search, and walk to the left of that point until first the element is not k, then do the same to the right side | |||
* Selection | |||
* Convex hulls | |||
** Once you have the points sorted by x-coordinate, the points can be inserted from left to right into the hull. Total time is linear after the sort is done. | |||
'''Sorting lies at the heart of many algorithms. Sorting data is one of the first things any algorithm designer should try in the quest for efficiency.''' | |||
Finding intersection of two sets of size m and n (m << n): | |||
* Sorting the big set first costs O(n log n) time, and doing a binary search for each of the m elements in n costs O(m log n), for a total cost of O((n+m) log n) | |||
* Sorting the small set first costs O(m log m) time, and doing a binary search for each of the n elements to find it in m costs O(n log m), for a total cost of O((m+n) log m) | |||
* Sorting both sets, and comparing each element of the two sets starting with the smallest. This costs O(n log n + m log m + n + m) | |||
Overall the small-set sorting is the fastest. | |||
Expected linear time can be achieved by hashing - building a hash table with elements from both sets - and this is probably a better solution. | |||
===Pragmatics=== | |||
Several practical considerations when sorting: | |||
* Increasing (ascending) or decreasing (descending) order? | |||
* Sorting just the key, or the entire record? | |||
* What do you do when you get a tie? | |||
** Certain efficient sort algorithms (like quicksort) run into quadratic performance trouble unless engineered to deal with ties | |||
* Non-numerical data | |||
The general approach should be to use an external comparison function (a comparator) that takes pointers to records and returns the result of the comparison. In general, compare(a,b) will return negative if a comes before b, positive if b comes before a, and 0 if they are equal. | |||
Why focus so much on search? | |||
* Knuth estimated 25% of compute time is spent sorting | |||
* Learning sorting is not just about learning sorting - the design techniques you learn in sort are important for other algorithmic problems | |||
===Heap Sort=== | |||
NOTE: Heap sort is only for '''sorting''' - not for searching! | |||
Heap sort: the practical implementation is with a priority queue | |||
O(1) time to remove a particular item from an unsorted array once it is located, but O(n) time to locate it. Priority queue improves this to O(log n) time to find items, instead of O(n). Selection sort is also sped up from O(n^2) to O(n log n). | |||
'''Heap sort is nothing but an implementation of selection sort using the right data structure.''' | |||
Heaps maintain weaker-than-sorted but stronger-than-random order | |||
Hierarchical organization - minimum items are at the top, heaps guarantee that any node (level i) is larger than its parent (level i-1) | |||
Implementation: | |||
* Heaps can be implemented using a linked structure, but because they are filled from left to right, level by level, they are easy to implement as an array as well | |||
* We store the <math>2^k</math> keys of a complete binary tree from left-to-right; level k keys are stored in positions <math>2^{k-1}</math> to <math>2^k-1</math>. | |||
* Assume array index starts at 1, to simplify matters | |||
* can store any binary tree in an array without pointers, usually easiest option for heaps | |||
* Why do we use this method for storing the heap? Space efficiency. Filling in from the bottom ensures no holes in the tree. | |||
Since all but the last level is always filled, the height h of an n-element heap is logarithmic: | |||
Each level is guaranteed to have <math>2^k</math> nodes, so: | |||
<math> | |||
\sum_{k=0}^{h} 2^k = 2^{h+1} - 1 \leq n | |||
</math> | |||
and therefore | |||
<math> | |||
n = floor( \log{(n)} ) | |||
</math> | |||
=Flags= | |||
{{AlgorithmsFlag}} | {{AlgorithmsFlag}} | ||
[[Category:Sort]] | |||
Revision as of 01:36, 16 July 2017
Notes
Goodrich Chapter 12
Chapter 12 - sorting and selection
Merge sort
- Merge sort as an example of divide-and-conquer
- Running time of merge sort
- Merge sort and recurrence relations
Sorting through an algorithmic lens
- lower bound
- linear time sorting
- comparing sort functions
Skiena Chapter 4
Lots of good information here.
Sorting
Applications of sorting:
- Searching
- Closest pairs
- Element uniqueness
- Frequency distribution
- To find how often an arbitrary element k occurs, look up k using a binary search, and walk to the left of that point until first the element is not k, then do the same to the right side
- Selection
- Convex hulls
- Once you have the points sorted by x-coordinate, the points can be inserted from left to right into the hull. Total time is linear after the sort is done.
Sorting lies at the heart of many algorithms. Sorting data is one of the first things any algorithm designer should try in the quest for efficiency.
Finding intersection of two sets of size m and n (m << n):
- Sorting the big set first costs O(n log n) time, and doing a binary search for each of the m elements in n costs O(m log n), for a total cost of O((n+m) log n)
- Sorting the small set first costs O(m log m) time, and doing a binary search for each of the n elements to find it in m costs O(n log m), for a total cost of O((m+n) log m)
- Sorting both sets, and comparing each element of the two sets starting with the smallest. This costs O(n log n + m log m + n + m)
Overall the small-set sorting is the fastest.
Expected linear time can be achieved by hashing - building a hash table with elements from both sets - and this is probably a better solution.
Pragmatics
Several practical considerations when sorting:
- Increasing (ascending) or decreasing (descending) order?
- Sorting just the key, or the entire record?
- What do you do when you get a tie?
- Certain efficient sort algorithms (like quicksort) run into quadratic performance trouble unless engineered to deal with ties
- Non-numerical data
The general approach should be to use an external comparison function (a comparator) that takes pointers to records and returns the result of the comparison. In general, compare(a,b) will return negative if a comes before b, positive if b comes before a, and 0 if they are equal.
Why focus so much on search?
- Knuth estimated 25% of compute time is spent sorting
- Learning sorting is not just about learning sorting - the design techniques you learn in sort are important for other algorithmic problems
Heap Sort
NOTE: Heap sort is only for sorting - not for searching!
Heap sort: the practical implementation is with a priority queue
O(1) time to remove a particular item from an unsorted array once it is located, but O(n) time to locate it. Priority queue improves this to O(log n) time to find items, instead of O(n). Selection sort is also sped up from O(n^2) to O(n log n).
Heap sort is nothing but an implementation of selection sort using the right data structure.
Heaps maintain weaker-than-sorted but stronger-than-random order
Hierarchical organization - minimum items are at the top, heaps guarantee that any node (level i) is larger than its parent (level i-1)
Implementation:
- Heaps can be implemented using a linked structure, but because they are filled from left to right, level by level, they are easy to implement as an array as well
- We store the $ 2^k $ keys of a complete binary tree from left-to-right; level k keys are stored in positions $ 2^{k-1} $ to $ 2^k-1 $.
- Assume array index starts at 1, to simplify matters
- can store any binary tree in an array without pointers, usually easiest option for heaps
- Why do we use this method for storing the heap? Space efficiency. Filling in from the bottom ensures no holes in the tree.
Since all but the last level is always filled, the height h of an n-element heap is logarithmic:
Each level is guaranteed to have $ 2^k $ nodes, so:
$ \sum_{k=0}^{h} 2^k = 2^{h+1} - 1 \leq n $
and therefore
$ n = floor( \log{(n)} ) $
Flags
| Algorithms Part of Computer Science Notes
Series on Algorithms
Algorithms/Sort · Algorithmic Analysis of Sort Functions · Divide and Conquer · Divide and Conquer/Master Theorem Three solid O(n log n) search algorithms: Merge Sort · Heap Sort · Quick Sort Algorithm Analysis/Merge Sort · Algorithm Analysis/Randomized Quick Sort
Algorithms/Search · Binary Search · Binary Search Modifications
Algorithms/Combinatorics · Algorithms/Combinatorics and Heuristics · Algorithms/Optimization · Divide and Conquer
Algorithms/Strings · Algorithm Analysis/Substring Pattern Matching
Algorithm complexity · Theta vs Big O Amortization · Amortization/Aggregate Method · Amortization/Accounting Method Algorithm Analysis/Matrix Multiplication
Estimation Estimation · Estimation/BitsAndBytes
Algorithm Practice and Writeups Project Euler · Five Letter Words · Letter Coverage
|