Priority Queues/Timing and Performance/Old: Difference between revisions
From charlesreid1
m (Replacing charlesreid1.com:3000 with git.charlesreid1.com) |
|||
| (10 intermediate revisions by the same user not shown) | |||
| Line 16: | Line 16: | ||
* Loop over a "large" number of statistical trials. The average access time over all the trials is reported. | * Loop over a "large" number of statistical trials. The average access time over all the trials is reported. | ||
Link on git.charlesreid1.com: https://charlesreid1.com | Link on git.charlesreid1.com: https://git.charlesreid1.com/cs/java/src/master/priority-queues/Timing.java | ||
Class contents: | Class contents: | ||
| Line 32: | Line 32: | ||
// Tests | // Tests | ||
public static void main(String[] args) { | public static void main(String[] args) { | ||
sorted_timing(); | sorted_timing(); | ||
} | } | ||
| Line 45: | Line 45: | ||
StringBuffer sb = new StringBuffer(); | StringBuffer sb = new StringBuffer(); | ||
sb.append("N, Walltime Add ( | sb.append("N, Walltime Add (us), Walltime Rm Min (us)\n"); | ||
int ntrials = | int ntrials = 1000; | ||
Random r = new Random(); | Random r = new Random(); | ||
// Loop over values of N | // Loop over values of N | ||
for(int N = | int start = 50000; | ||
int skip = 50000; | |||
int MAX = 1000000; | |||
for(int N = start; N <= MAX; N+=skip) { | |||
Tim add_tim = new Tim(); | Tim add_tim = new Tim(); | ||
| Line 57: | Line 60: | ||
// Trials counter is always k for Kafka | // Trials counter is always k for Kafka | ||
for(int k = 0; k<ntrials; k++) { | for(int k = 0; k<ntrials; k++) { | ||
// Each trial is a different sequence of random numbers, | // Each trial is a different sequence of random numbers, | ||
// but the sequence matches between tests of different collection types | // but the sequence matches between tests of different collection types | ||
SortedPriorityQueue<Integer> q = new SortedPriorityQueue<Integer>(); | SortedPriorityQueue<Integer> q = new SortedPriorityQueue<Integer>(); | ||
add_tim.tic(); | add_tim.tic(); | ||
for(int i=0; i<N/2; i++) { | |||
Integer key = new Integer( r.nextInt() ); | |||
Integer val = new Integer( r.nextInt() ); | |||
q.add(key,val); | q.add(key,val); | ||
} | } | ||
| Line 71: | Line 74: | ||
rm_tim.tic(); | rm_tim.tic(); | ||
for(int i=0; i<N/4; i++) { | |||
q.removeMin(); | q.removeMin(); | ||
} | } | ||
rm_tim.toc(); | rm_tim.toc(); | ||
add_tim.tic(); | |||
for(int i=N/2; i<N; i++) { | |||
Integer key = new Integer( r.nextInt() ); | |||
Integer val = new Integer( r.nextInt() ); | |||
q.add(key,val); | |||
} | |||
add_tim.toc(); | |||
rm_tim.tic(); | |||
for(int i=N/4; i<N; i++) { | |||
q.removeMin(); | |||
} | |||
rm_tim.toc(); | |||
} | } | ||
// Normalized for trials, not for container size N | |||
sb.append( String.format("%d, ",N) ); | sb.append( String.format("%d, ",N) ); | ||
sb.append( String.format("%.3f, ", add_tim.elapsedms()/ntrials) ); | sb.append( String.format("%.3f, ", 1000*add_tim.elapsedms()/ntrials) ); | ||
sb.append( String.format("%.3f ", rm_tim.elapsedms()/ntrials) ); | sb.append( String.format("%.3f ", 1000*rm_tim.elapsedms()/ntrials) ); | ||
sb.append("\n"); | sb.append("\n"); | ||
} | } | ||
| Line 100: | Line 119: | ||
[[Image:PriorityQueueTiming_SortedRetry.png|500px]] | [[Image:PriorityQueueTiming_SortedRetry.png|500px]] | ||
= | I discovered that the code was using the exact same same key-value object for each add operation in the code timed above. The timing results were for a best case scenario. Once this was fixed, I re-ran and got the following results: | ||
[[Image:PriorityQueueTiming_Sorted3.png|500px]] | |||
The trend is still not very clear. This means, Hypothesis 1 is also likely - that we're looking at an operation that, relative to the overhead involved, is relatively cheaper, so that the linear increase in an O(N) operation is gradual enough to be nearly constant. | |||
===Results discussion=== | |||
The code results do not match the hypothesis. The code appears to be spending a constant (or very slowly increasing) amount of time on the add operation, even though it is performing an insertion sort each time. On the other hand, the remove operations are definitely confirmed to be O(1). | |||
Two possibilities: | |||
* Built-in doubly-linked list access operations are just so damn fast that most of the time spent to retrieve an item is on function overhead, which is O(1), so what we're actually seeing is not an O(N) increase in cost but a LARGE O(1) factor + a SMALL O(N) increase. | |||
* Bug in the timing code that's causing measured times to be sub-linear. | |||
Bug was fixed, as per above, and trend was still holding, so possibility 1 is likely. | |||
Latest revision as of 03:45, 9 October 2019
Priority Queues
Priority queues are queues that keep items in the queue in a sorted order, so that the minimum (highest priority) item comes out first.
Priority queue timing hypothesis
The hypothesis is that we will see the following behavior for sorted and unsorted implementations of priority queues:
- Unsorted list - add is O(1), min/remove min is O(N)
- Sorted list - add is O(N), min/remove min is O(1)
Priority queue timing class
Below is a basic class for measuring timing and performance of a sorted priority queue. The basic rundown of what the class does is as follows:
- Build up the CSV output using a StringBuffer
- Loop over different values of N, the size of the array. This is the number of add/remove operations being performed in total on one given array.
- Loop over a "large" number of statistical trials. The average access time over all the trials is reported.
Link on git.charlesreid1.com: https://git.charlesreid1.com/cs/java/src/master/priority-queues/Timing.java
Class contents:
import java.util.LinkedList;
import java.util.Random;
/** Timing class: measure big-O complexity and runtime of data structures.
*
* Compare algorithms, test structures, and verify expected big-O behavior.
*
*/
public class Timing {
// Tests
public static void main(String[] args) {
sorted_timing();
}
/** Time sorted priority queue. */
public static void sorted_timing() {
// This generates CSV files to verify the following information:
// - add method is O(N)
// - remove min method is O(1)
StringBuffer sb = new StringBuffer();
sb.append("N, Walltime Add (us), Walltime Rm Min (us)\n");
int ntrials = 1000;
Random r = new Random();
// Loop over values of N
int start = 50000;
int skip = 50000;
int MAX = 1000000;
for(int N = start; N <= MAX; N+=skip) {
Tim add_tim = new Tim();
Tim rm_tim = new Tim();
// Trials counter is always k for Kafka
for(int k = 0; k<ntrials; k++) {
// Each trial is a different sequence of random numbers,
// but the sequence matches between tests of different collection types
SortedPriorityQueue<Integer> q = new SortedPriorityQueue<Integer>();
add_tim.tic();
for(int i=0; i<N/2; i++) {
Integer key = new Integer( r.nextInt() );
Integer val = new Integer( r.nextInt() );
q.add(key,val);
}
add_tim.toc();
rm_tim.tic();
for(int i=0; i<N/4; i++) {
q.removeMin();
}
rm_tim.toc();
add_tim.tic();
for(int i=N/2; i<N; i++) {
Integer key = new Integer( r.nextInt() );
Integer val = new Integer( r.nextInt() );
q.add(key,val);
}
add_tim.toc();
rm_tim.tic();
for(int i=N/4; i<N; i++) {
q.removeMin();
}
rm_tim.toc();
}
// Normalized for trials, not for container size N
sb.append( String.format("%d, ",N) );
sb.append( String.format("%.3f, ", 1000*add_tim.elapsedms()/ntrials) );
sb.append( String.format("%.3f ", 1000*rm_tim.elapsedms()/ntrials) );
sb.append("\n");
}
System.out.println(sb.toString());
}
}
Priority queue timing results
Sorted priority queue
The results below are for a sorted priority queue. For a sorted priority queue, the minimum is always at the front, and so removal is an O(1) operation. When items are added to the priority queue they are added in order, so add is an O(N) operation. If you squint and look sideways, you can see a barely perceptible linear increase in the cost of add, versus the more flat curve for removal for a sorted list.
When I gave it another try, bumping up the maximum size to a million, I got still further ambiguous results...
I discovered that the code was using the exact same same key-value object for each add operation in the code timed above. The timing results were for a best case scenario. Once this was fixed, I re-ran and got the following results:
The trend is still not very clear. This means, Hypothesis 1 is also likely - that we're looking at an operation that, relative to the overhead involved, is relatively cheaper, so that the linear increase in an O(N) operation is gradual enough to be nearly constant.
Results discussion
The code results do not match the hypothesis. The code appears to be spending a constant (or very slowly increasing) amount of time on the add operation, even though it is performing an insertion sort each time. On the other hand, the remove operations are definitely confirmed to be O(1).
Two possibilities:
- Built-in doubly-linked list access operations are just so damn fast that most of the time spent to retrieve an item is on function overhead, which is O(1), so what we're actually seeing is not an O(N) increase in cost but a LARGE O(1) factor + a SMALL O(N) increase.
- Bug in the timing code that's causing measured times to be sub-linear.
Bug was fixed, as per above, and trend was still holding, so possibility 1 is likely.