Lecture 7
Priority Queues
A priority queue is a data structure for maintaining a set of elements,
each with an associated value called a key. In a normal queue,
elements come off in first-in-first-out order, so the element at the
top is the one that has been in the queue the longest. In a priority
queue, the element with the largest key is always on top, no matter
what order the elements were inserted in. Some uses for priority queues:
- Dijkstra's Algorithm. This algorithm solves the single-source
shortest-paths problem on a weighted digraph. The algorithm uses a priority
queue to consider vertices in order of increasing distance from the source
(with a max-priority queue, the key used can be the negated distance).
- Huffman's Algorithm. This algorithm builds a tree representing an
optimal prefix code, given the frequency of each symbol. Such a
code can be used to compress data, sometimes with a significant reduction
in the amount of storage needed. A priority queue is again used to select
the lowest-frequency subtrees to merge at each step.
- Operating System Scheduling Algorithms. Some processes in an operating
system run with higher priority than others. For instance, an interactive
'vi' session should run with higher priority than a background numerical
simulation, to provide the user with a reasonable interactive response time.
This isn't only nice, it is necessary to prevent starvation of essential
system services. Putting processes that are ready to run on a priority queue
ensures that the scheduler will pick the process with highest priority to
run next.
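The negation trick mentioned for Dijkstra's algorithm is easy to see in practice. As an illustrative sketch (not part of the original notes), Python's standard heapq module implements a min-heap priority queue, so pushing negated keys makes it behave like a max-priority queue:

```python
import heapq

# heapq maintains a min-heap, so pushing negated keys gives
# max-priority-queue behavior: the largest key has the smallest negation.
pq = []
for key in [3, 10, 7]:
    heapq.heappush(pq, -key)

top = -pq[0]                  # Maximum: peek at the largest key
largest = -heapq.heappop(pq)  # Extract-Max: remove and return it
print(top, largest)           # prints: 10 10
```

After the pop, the next-largest key (7) moves to the top, just as Extract-Max requires.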
Priority Queue Operations
The following operations are defined on priority queues (assumed to be
initially empty sets):
- Insert (S, x) inserts x into the set S.
- Maximum (S) returns the element of S with
largest key.
- Extract-Max (S) removes and returns the element
with largest key.
Naive Implementation
A naive implementation of these operations is to represent the queue
as a linked list L:
- Maximum looks at each element of the list, keeping track
of and finally returning the maximum element.
Naive-Max (L)
    if L is empty
        error "queue underflow"     // can't take the max of an empty queue
    max = head (L)
    M = tail (L)
    while M is not empty
        if head(M) > max
            max = head(M)
        M = tail(M)
    end while
    return max
If the list has n elements, this algorithm is Θ(n) since it must
examine all n elements.
- Extract-Max does the same thing, but also deletes the element
it finds:
Naive-Extract-Max (L)
    do the Naive-Max algorithm, deleting and returning the
    element from the list
Since only O(1) extra work is needed to delete an element from a
reasonably implemented linked list, this algorithm is also Θ(n).
- Insert just puts the element onto the linked list.
Naive-Insert (L, key)
    M = new node with key
    tail(M) = L
    L = M
This does Θ(1) work since we're not worried about keeping the
list in any order.
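The naive operations above can be sketched in Python (a hypothetical transcription; the function names are our own). Here the linked list is represented as nested (key, tail) pairs, with None as the empty list:

```python
# Naive priority queue as a singly linked list of (key, tail) nodes.
# None represents the empty list.

def naive_insert(L, key):
    """Theta(1): push a new node on the front; order doesn't matter."""
    return (key, L)

def naive_max(L):
    """Theta(n): scan every node, tracking the largest key seen."""
    if L is None:
        raise ValueError("queue underflow")
    best, M = L[0], L[1]
    while M is not None:
        if M[0] > best:
            best = M[0]
        M = M[1]
    return best

def naive_extract_max(L):
    """Theta(n): find the max, then rebuild the list without it."""
    best = naive_max(L)
    out, removed = None, False
    M = L
    while M is not None:
        if M[0] == best and not removed:
            removed = True          # delete only one copy of the max
        else:
            out = (M[0], out)
        M = M[1]
    return best, out
```

The rebuild inside naive_extract_max reverses the list, but since the queue is unordered anyway, that does not affect correctness, and the Θ(n) scan dominates the cost regardless.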
Heap Implementation
We can use a heap to improve the implementation, since a heap always keeps
its maximum element in the first array position:
- Maximum just returns the top element of the heap.
Heap-Maximum (A)
    return A[1]
With exactly one array access, the time is simply Θ(1), compared
with Θ(n) for the naive priority queue.
- Insert increases the size of the heap, then chases up the
tree looking for the right place for the new element in a way similar to
Heapify:
Heap-Insert (A, key)
    heap-size(A)++                          // increase heap size
    i = heap-size(A)                        // i is last index in heap
    while (i > 1) and (A[Parent(i)] < key)  // while new element is still too big...
    do
        A[i] = A[Parent(i)]                 // swap parent down, go up one level
        i = Parent (i)
    end while
    A[i] = key                              // key is in right place
This could possibly go from a leaf up to the root, taking
O(height) = O(lg n) time, compared
with O(1) of the naive version. This is a little worse, but
as we will see, that doesn't matter too much. Note: we might actually
see Θ(lg n) performance if we keep extracting the maximum and then
inserting it back into the heap, because we expect to have to swap all
the way from a leaf to the root.
- Extract-Max essentially does one round of the Heapsort
loop, removing the top of the heap, swapping the last element into its place,
then Heapifying:
Heap-Extract-Max (A)
    if (heap-size(A) < 1)
        error "heap underflow"              // whoops, empty heap!
    max = A[1]
    A[1] = A[heap-size(A)]                  // move last element to root
    heap-size(A)--
    Heapify (A, 1)                          // make it a heap again
    return max
Everything is O(1) except for the call to Heapify, which
we know is O(lg n).
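The heap-based operations can be sketched in Python as well (an illustrative transcription, not the notes' own code). A 0-indexed list is used here, so the parent of index i is (i - 1) // 2 rather than the 1-indexed i / 2 of the pseudocode:

```python
# Heap-based max-priority queue stored in a Python list, 0-indexed.

def heap_insert(A, key):
    """O(lg n): append, then sift the new key up toward the root."""
    A.append(key)
    i = len(A) - 1
    while i > 0 and A[(i - 1) // 2] < key:
        A[i] = A[(i - 1) // 2]    # swap parent down, go up one level
        i = (i - 1) // 2
    A[i] = key                    # key is in the right place

def heapify(A, i):
    """O(lg n): float A[i] down until the max-heap property holds."""
    n = len(A)
    while True:
        l, r, largest = 2 * i + 1, 2 * i + 2, i
        if l < n and A[l] > A[largest]:
            largest = l
        if r < n and A[r] > A[largest]:
            largest = r
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest

def heap_extract_max(A):
    """O(lg n): take the root, move the last element up, re-heapify."""
    if not A:
        raise ValueError("heap underflow")
    top = A[0]
    last = A.pop()
    if A:
        A[0] = last
        heapify(A, 0)
    return top
```

Repeated calls to heap_extract_max return the keys in decreasing order, which is exactly one round of the Heapsort loop per call.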
Let's compare the two implementations in a table:
Naive vs. Heap-based Priority Queues

Type of Queue | Time for Maximum | Time for Insert | Time for Extract-Max
--------------+------------------+-----------------+---------------------
Linked List   | Θ(n)             | Θ(1)            | Θ(n)
Heap Based    | Θ(1)             | O(lg n)         | O(lg n)
Using a heap-based priority queue looks good, but should it bother us
that Heap-Insert takes O(lg n) time while Naive-Insert takes only
Θ(1) time? No: Most algorithms that use a priority queue, like those
mentioned above, will either leave the queue empty at the end or reach
a steady state where the same number of items enter the queue as leave
it. Thus, for every call to Insert, there must be a corresponding call
to Extract-Max. Any advantage the naive implementation gains with its
Θ(1) Insert is quickly lost when it has to do a Θ(n) Extract-Max.
Also, a linked list may grow arbitrarily large, while the heap owes much
of its speed advantage to its implementation in a fixed-size array. We
may, from time to time, have to reallocate the heap's array to accommodate
more elements. Reallocating may take O(n) time. Will this eat up the
heap's O(lg n) advantage? Yes, if we have to reallocate every time
we do an Insert. But if we are smart, we will double the size of the
array each time we reallocate, anticipating future Inserts. This will
allow us to do only up to O(lg n) reallocations during a program run,
preserving the heap's asymptotic advantage.
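The effect of doubling can be checked with a short simulation (a hypothetical sketch; the grow routine is our own, not from the notes). Reaching n elements from an initial capacity of 1 triggers only about lg n reallocations:

```python
def grow(capacity):
    """Double the capacity on each reallocation."""
    return 2 * capacity

capacity, reallocations = 1, 0
for n_elements in range(1, 1025):   # simulate 1024 Inserts
    if n_elements > capacity:       # array is full: reallocate
        capacity = grow(capacity)
        reallocations += 1

print(reallocations)                # prints: 10, i.e., lg 1024
```

Each reallocation copies O(n) elements, but because the copies form a geometric series, the total copying work over n Inserts is still O(n), i.e., O(1) amortized per Insert.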
Note: if the number of different priorities is a small constant
k, then using an implementation based on having an array of
k linked lists might give better performance than the heap-based
priority queue. Such is the case with operating system schedulers with
a small number of priorities, e.g., "system," "user," and "background"
(k=3).
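Such an array-of-lists queue can be sketched in Python (an illustrative example; the BucketQueue class and its method names are our own). Priorities run from 0 (lowest) to k-1 (highest):

```python
from collections import deque

class BucketQueue:
    """Priority queue as an array of k FIFO lists, one per priority.

    Insert is O(1); Extract-Max scans at most k buckets, which is
    O(1) when k is a small constant.
    """

    def __init__(self, k):
        self.buckets = [deque() for _ in range(k)]

    def insert(self, priority, item):
        self.buckets[priority].append(item)      # O(1) append

    def extract_max(self):
        for b in reversed(self.buckets):         # highest priority first
            if b:
                return b.popleft()               # FIFO within a priority
        raise ValueError("queue underflow")
```

With k=3, priorities 2, 1, 0 could stand for "system," "user," and "background," and a scheduler would simply call extract_max to pick the next process to run.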