Priority Queues/Timing and Performance/Old: Difference between revisions

Latest revision as of 03:45, 9 October 2019

Priority Queues

Priority queues are queues that keep items in the queue in a sorted order, so that the minimum (highest priority) item comes out first.

Priority queue timing hypothesis

The hypothesis is that we will see the following behavior for sorted and unsorted implementations of priority queues:

Unsorted list - add is O(1), min/remove min is O(N)
Sorted list - add is O(N), min/remove min is O(1)

Priority queue timing class

Below is a basic class for measuring timing and performance of a sorted priority queue. The basic rundown of what the class does is as follows:

Build up the CSV output using a StringBuffer
Loop over different values of N, the size of the array. This is the number of add/remove operations being performed in total on one given array.
Loop over a "large" number of statistical trials. The average access time over all the trials is reported.

Link on git.charlesreid1.com: https://git.charlesreid1.com/cs/java/src/master/priority-queues/Timing.java

Class contents:

import java.util.LinkedList;
import java.util.Random;

/** Timing class: measure big-O complexity and runtime of data structures.
 *
 * Compare algorithms, test structures, and verify expected big-O behavior.
 *
 */
public class Timing {

	// Tests
	public static void main(String[] args) { 
		sorted_timing();
	}

	/** Time sorted priority queue. */
	public static void sorted_timing() {

		// This generates CSV files to verify the following information:
		// - add method is O(N)
		// - remove min method is O(1)

		StringBuffer sb = new StringBuffer();

		sb.append("N, Walltime Add (us), Walltime Rm Min (us)\n");

		int ntrials = 1000;
		Random r = new Random();

		// Loop over values of N
		int start =   50000;
		int skip  =   50000;
		int MAX   = 1000000;
		for(int N = start; N <= MAX; N+=skip) { 

			Tim add_tim = new Tim();
			Tim rm_tim = new Tim();

			// Trials counter is always k for Kafka
			for(int k = 0; k<ntrials; k++) { 
				// Each trial is a different sequence of random numbers,
				// but the sequence matches between tests of different collection types 
				SortedPriorityQueue<Integer> q = new SortedPriorityQueue<Integer>();

				add_tim.tic();
        		for(int i=0; i<N/2; i++) { 
					Integer key = new Integer( r.nextInt() );
					Integer val = new Integer( r.nextInt() );
					q.add(key,val);
				}
				add_tim.toc();

				rm_tim.tic();
        		for(int i=0; i<N/4; i++) { 
					q.removeMin();
				}
				rm_tim.toc();

				add_tim.tic();
        		for(int i=N/2; i<N; i++) { 
					Integer key = new Integer( r.nextInt() );
					Integer val = new Integer( r.nextInt() );
					q.add(key,val);
				}
				add_tim.toc();

				rm_tim.tic();
        		for(int i=N/4; i<N; i++) { 
					q.removeMin();
				}
				rm_tim.toc();

			}

			// Normalized for trials, not for container size N
			sb.append( String.format("%d, ",N) );
			sb.append( String.format("%.3f, ", 1000*add_tim.elapsedms()/ntrials) );
			sb.append( String.format("%.3f  ",  1000*rm_tim.elapsedms()/ntrials) );
			sb.append("\n");
		}

		System.out.println(sb.toString());
	}
}

Priority queue timing results

Sorted priority queue

The results below are for a sorted priority queue. For a sorted priority queue, the minimum is always at the front, and so removal is an O(1) operation. When items are added to the priority queue they are added in order, so add is an O(N) operation. If you squint and look sideways, you can see a barely perceptible linear increase in the cost of add, versus the more flat curve for removal for a sorted list.

When I gave it another try, bumping up the maximum size to a million, I got still further ambiguous results...

I discovered that the code was using the exact same same key-value object for each add operation in the code timed above. The timing results were for a best case scenario. Once this was fixed, I re-ran and got the following results:

The trend is still not very clear. This means, Hypothesis 1 is also likely - that we're looking at an operation that, relative to the overhead involved, is relatively cheaper, so that the linear increase in an O(N) operation is gradual enough to be nearly constant.

Results discussion

The code results do not match the hypothesis. The code appears to be spending a constant (or very slowly increasing) amount of time on the add operation, even though it is performing an insertion sort each time. On the other hand, the remove operations are definitely confirmed to be O(1).

Two possibilities:

Built-in doubly-linked list access operations are just so damn fast that most of the time spent to retrieve an item is on function overhead, which is O(1), so what we're actually seeing is not an O(N) increase in cost but a LARGE O(1) factor + a SMALL O(N) increase.
Bug in the timing code that's causing measured times to be sub-linear.

Bug was fixed, as per above, and trend was still holding, so possibility 1 is likely.