A LOWER BOUND ON COMPARISON-BASED SORTING
=========================================
Suppose we have a scrambled array of n numbers, with each number from 1...n
occurring once.  How many possible orders can the numbers be in?

The answer is n!, where n! = 1 * 2 * 3 * ... * (n-2) * (n-1) * n.  Here's
why:  the first number in the array can be anything from 1...n, yielding n
possibilities.  Once the first number is chosen, the second number can be any
one of the remaining n-1 numbers, so there are n * (n-1) possible choices of
the first two numbers.  The third number can be any one of the remaining n-2
numbers, yielding n * (n-1) * (n-2) possibilities for the first three
numbers.  Continue this reasoning to its logical conclusion.  Each different
order is called a _permutation_ of the numbers, and there are n! possible
permutations.  (For Homework 9, you are asked to create a random permutation
of maze walls.)

Observe that if n > 0,

    n! = 1 * 2 * ... * (n-1) * n  <=  n * n * ... * n * n  =  n^n

and (supposing n is even)

    n! = 1 * 2 * ... * (n-1) * n  >=  (n/2) * (n/2 + 1) * ... * (n-1) * n
                                  >=  (n/2)^(n/2),

so n! is between (n/2)^(n/2) and n^n.  Let's look at the logarithms of both
these numbers:  log((n/2)^(n/2)) = (n/2) log(n/2), which is in
Theta(n log n), and log(n^n) = n log n.  Hence, log(n!) is also in
Theta(n log n).

A _comparison-based_sort_ is one in which all decisions are based on
comparing keys (generally done by "if" statements).  All actions taken by the
sorting algorithm are based on the results of a sequence of true/false
questions.  All of the sorting algorithms we have studied are
comparison-based.

Suppose that two computers run the _same_ sorting algorithm at the same time
on two _different_ inputs.  Suppose that every time one computer executes an
"if" statement and finds it true, the other computer executes the same "if"
statement and also finds it true; likewise, when one computer executes an
"if" and finds it false, so does the other.  Then both computers perform
exactly the same data movements (e.g. swapping the numbers at indices i and
j) in exactly the same order, so they both permute their inputs in _exactly_
the same way.

A correct sorting algorithm must generate a _different_ sequence of
true/false answers for each different permutation of 1...n, because it takes
a different sequence of data movements to sort each permutation.  There are
n! different permutations, thus n! different sequences of true/false answers.

If a sorting algorithm asks d true/false questions, it can generate at most
2^d different sequences of true/false answers.  If it correctly sorts every
permutation of 1...n, then n! <= 2^d, so log_2 (n!) <= d, and d is in
Omega(n log n).  The algorithm spends Omega(d) time asking these d questions.
Hence,

==============================================================================
EVERY comparison-based sorting algorithm takes Omega(n log n) worst-case time.
==============================================================================

This is an amazing claim, because it doesn't just analyze one algorithm.  It
says that of the thousands of comparison-based sorting algorithms that
haven't even been invented yet, not one of them has any hope of beating
O(n log n) time for all inputs of length n.

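To make the bound concrete, consider n = 3.  There are 3! = 6 permutations,
so any comparison-based method needs at least ceiling(log_2 6) = 3
comparisons in the worst case.  Here's a Java sketch of a method that meets
the bound exactly (the method sortThree is ours, just for illustration).

  // Sorts three distinct keys with at most three comparisons.  Each "if"
  // narrows down which of the 6 permutations we were given.
  static int[] sortThree(int a, int b, int c) {
    if (a <= b) {
      if (b <= c) {
        return new int[] {a, b, c};          // a <= b <= c
      } else if (a <= c) {
        return new int[] {a, c, b};          // a <= c <  b
      } else {
        return new int[] {c, a, b};          // c <  a <= b
      }
    } else {                                 // b < a
      if (a <= c) {
        return new int[] {b, a, c};          // b <  a <= c
      } else if (b <= c) {
        return new int[] {b, c, a};          // b <= c <  a
      } else {
        return new int[] {c, b, a};          // c <  b <  a
      }
    }
  }

Two of the six permutations are identified after only two comparisons, but no
method can identify all six without sometimes asking a third question,
because two questions yield only 2^2 = 4 distinct answer sequences.
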
LINEAR-TIME SORTING
===================
However, there are faster sorting algorithms that can make q-way decisions
for large values of q, instead of true/false (2-way) decisions.  Some of
these algorithms run in linear time.

Bucket Sort
-----------
_Bucket_sort_ works well when keys are distributed in a small range, e.g.
from 0 to q - 1, and the number of items n is larger than, or nearly as large
as, q.  In other words, when q is in O(n).

We allocate an array of q queues (or singly-linked lists with tail
references, which are basically the same thing, but we only need the queue
operations), numbered from 0 to q - 1.  The queues are called _buckets_.  We
walk through the list of input items, and enqueue each item in the
appropriate queue:  an item with key i goes into queue i.  Each item
illustrated here has a numerical key and an associated value.

        -------------------------------------------------------------
  Input | 6:a | 7:b | 3:c | 0:d | 3:e | 1:f | 5:g | 0:h | 3:i | 7:j |
        -------------------------------------------------------------

                  0       1       2       3       4       5       6       7
              -----------------------------------------------------------------
Queue fronts  |   .   |   .   |   *   |   .   |   *   |   .   |   .   |   .   |
              ----|-------|---------------|---------------|-------|-------|----
                  v       v               v               v       v       v
               ------- -------         -------         ------- ------- -------
               | 0:d | | 1:f |         | 3:c |         | 5:g | | 6:a | | 7:b |
               |  .  | |  *  |         |  .  |         |  *  | |  *  | |  .  |
               ---|--- -------         ---|---         ------- ------- ---|---
                  v       ^               v               ^       ^       v
               -------    |            -------            |       |    -------
               | 0:h |    |            | 3:e |            |       |    | 7:j |
               |  *  |    |            |  .  |            |       |    |  *  |
               -------    |            ---|---            |       |    -------
                  ^       |               v               |       |       ^
                  |       |            -------            |       |       |
                  |       |            | 3:i |            |       |       |
                  |       |            |  *  |            |       |       |
                  |       |            -------            |       |       |
                  |       |               ^               |       |       |
              ----|-------|---------------|---------------|-------|-------|----
 Queue tails  |   .   |   .   |   *   |   .   |   *   |   .   |   .   |   .   |
              -----------------------------------------------------------------

When we're done, we concatenate all the queues together in order.

Concatenated output:
 ------- ------- ------- ------- ------- ------- ------- ------- ------- -------
 | 0:d |>| 0:h |>| 1:f |>| 3:c |>| 3:e |>| 3:i |>| 5:g |>| 6:a |>| 7:b |>| 7:j |
 ------- ------- ------- ------- ------- ------- ------- ------- ------- -------

This data structure is _exactly_ like a hash table (plus tail references),
but the hash code just maps the key i to bucket i, and there is no
compression function because there is no need for compression.

Bucket sort takes Theta(q + n) time--in the best case and in the worst case.
It takes Theta(q) time to initialize the buckets in the beginning and to
concatenate them together in the end.  It takes Theta(n) time to put all the
items in their buckets.  If q is in O(n)--that is, the number of possible
keys isn't much larger than the number of items we're sorting--then bucket
sort takes Theta(n) time.

How did we get around the Omega(n log n) lower bound on comparison-based
sorting?  Bucket sort is not comparison-based.  We are making a q-way
decision every time we decide which queue to put an item into, instead of the
true/false decisions provided by comparisons and "if" statements.

Bucket sort (as I've described it here) is said to be _stable_.  A sort is
stable if items with equal keys come out in the same order they went in.  For
example, observe that 3:c, 3:e, and 3:i appear in the same order in the
output above as they appeared in the input.

Bucket sort is not the only stable sort we have seen; insertion sort,
selection sort, and mergesort can all be implemented so that they are stable.
The linked list version of quicksort we have seen can be stable, but the
array version is decidedly not.  Heapsort is never stable.  (Actually, we can
_make_ heapsort stable using a simple trick called a _secondary_key_, which I
might describe later in the semester.)

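Here's what bucket sort looks like in Java.  This is just a sketch:  the Item
class and the choice of java.util.ArrayDeque for the queues are illustrative
assumptions, not the only way to do it.

  import java.util.ArrayDeque;
  import java.util.Queue;

  class Item {
    int key;                                  // in the range 0...q-1
    String value;

    Item(int key, String value) {
      this.key = key;
      this.value = value;
    }
  }

  class BucketSort {
    // Stably sorts items by key, assuming every key is in the range 0...q-1.
    static Queue<Item> sort(Item[] items, int q) {
      @SuppressWarnings("unchecked")
      Queue<Item>[] buckets = new Queue[q];   // Theta(q) time to initialize
      for (int i = 0; i < q; i++) {
        buckets[i] = new ArrayDeque<Item>();
      }
      for (Item item : items) {               // Theta(n) time to distribute
        buckets[item.key].add(item);          // enqueue at the tail
      }
      Queue<Item> output = new ArrayDeque<Item>();
      for (int i = 0; i < q; i++) {           // Theta(q + n) to concatenate
        output.addAll(buckets[i]);
      }
      return output;
    }
  }

Because each bucket is a queue, items with equal keys leave in the order they
arrived, which is exactly the stability property described above.
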
Take note that bucket sort is ONLY appropriate when keys are distributed in a
small range; i.e. q is in O(n).  Below we'll study a sorting algorithm called
_radix_sort_ that will fix that limitation.  The stability of bucket sort
will be important for radix sort.

Counting Sort
-------------
If the items we sort are naked keys, with no associated values, bucket sort
can be simplified to become _counting_sort_.  In counting sort, we use no
queues at all; we need merely keep a count of how many copies of each key we
have encountered.  Suppose we sort 6 7 3 0 3 1 5 0 3 7:

            0     1     2     3     4     5     6     7
         -------------------------------------------------
  counts |  2  |  1  |  0  |  3  |  0  |  1  |  1  |  2  |
         -------------------------------------------------

When we are finished counting, it is straightforward to reconstruct the
sorted keys from the counts:  0 0 1 3 3 3 5 6 7 7.

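Here's counting sort for naked keys as a Java sketch (ours, for
illustration), assuming every key is an int in the range 0...q-1.

  // Sorts an array of naked keys, each in the range 0...q-1, by counting.
  static int[] countingSort(int[] keys, int q) {
    int[] counts = new int[q];              // Java initializes these to zero
    for (int i = 0; i < keys.length; i++) {
      counts[keys[i]]++;                    // count the copies of each key
    }
    int[] sorted = new int[keys.length];
    int s = 0;
    for (int j = 0; j < q; j++) {           // reconstruct the sorted keys
      for (int c = 0; c < counts[j]; c++) {
        sorted[s++] = j;
      }
    }
    return sorted;
  }

The counting loop takes Theta(n) time and the reconstruction loop takes
Theta(q + n) time, for Theta(q + n) time in total, just like bucket sort.
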
Counting Sort with Complete Items
---------------------------------
Now let's go back to the case where we have complete items (key plus
associated value).  We can use a more elaborate version of counting sort.
The trick is to use the counts to find the right index to move each item to.

Let x be an input array of objects with keys (and perhaps other information).

      0      1      2      3      4      5      6      7      8      9
   -----------------------------------------------------------------------
 x |  .   |  .   |  .   |  .   |  .   |  .   |  .   |  .   |  .   |  .   |
   ---|------|------|------|------|------|------|------|------|------|----
      v      v      v      v      v      v      v      v      v      v
    -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
    | 6 |  | 7 |  | 3 |  | 0 |  | 3 |  | 1 |  | 5 |  | 0 |  | 3 |  | 7 |
    -----  -----  -----  -----  -----  -----  -----  -----  -----  -----

Begin by counting the keys in x.

  for (i = 0; i < x.length; i++) {
    counts[x[i].key]++;
  }

Next, do a _scan_ of the "counts" array so that counts[i] contains the number
of keys _less_than_ i.

            0     1     2     3     4     5     6     7
         -------------------------------------------------
  counts |  0  |  2  |  3  |  3  |  6  |  6  |  7  |  8  |
         -------------------------------------------------

  total = 0;
  for (j = 0; j < counts.length; j++) {
    c = counts[j];
    counts[j] = total;
    total = total + c;
  }

Let y be the output array, where we will put the sorted objects.  counts[i]
tells us the first index of y where we should put items with key i.  Walk
through the array x and copy each item to its final position in y.  When you
copy an item with key k, you must increment counts[k] to make sure that the
next item with key k goes into the next slot.

  for (i = 0; i < x.length; i++) {
    y[counts[x[i].key]] = x[i];
    counts[x[i].key]++;
  }

    ---------------------           ---------------------------------
  y |.|.|.|.|.|.|.|.|.|.|    counts | 0 | 2 | 3 | 3 | 6 | 6 | 8 | 8 |
    ---------------|-----           ---------------------------------
                   v
                   6

    ---------------------           ---------------------------------
  y |.|.|.|.|.|.|.|.|.|.|    counts | 0 | 2 | 3 | 3 | 6 | 6 | 8 | 9 |
    ---------------|-|---           ---------------------------------
                   v v
                   6 7

    ---------------------           ---------------------------------
  y |.|.|.|.|.|.|.|.|.|.|    counts | 0 | 2 | 3 | 4 | 6 | 6 | 8 | 9 |
    -------|-------|-|---           ---------------------------------
           v       v v
           3       6 7

    ---------------------           ---------------------------------
  y |.|.|.|.|.|.|.|.|.|.|    counts | 1 | 2 | 3 | 4 | 6 | 6 | 8 | 9 |
    -|-----|-------|-|---           ---------------------------------
     v     v       v v
     0     3       6 7

    ---------------------           ---------------------------------
  y |.|.|.|.|.|.|.|.|.|.|    counts | 1 | 2 | 3 | 5 | 6 | 6 | 8 | 9 |
    -|-----|-|-----|-|---           ---------------------------------
     v     v v     v v
     0     3 3     6 7

    ---------------------           ---------------------------------
  y |.|.|.|.|.|.|.|.|.|.|    counts | 1 | 3 | 3 | 5 | 6 | 6 | 8 | 9 |
    -|---|-|-|-----|-|---           ---------------------------------
     v   v v v     v v
     0   1 3 3     6 7

    ...

    ---------------------           ----------------------------------
  y |.|.|.|.|.|.|.|.|.|.|    counts | 2 | 3 | 3 | 6 | 6 | 7 | 8 | 10 |
    -|-|-|-|-|-|-|-|-|-|-           ----------------------------------
     v v v v v v v v v v
     0 0 1 3 3 3 5 6 7 7

Bucket sort and counting sort both take O(q + n) time.  If q is in O(n), then
they take O(n) time.  If you're sorting an array, counting sort is slightly
faster and takes less memory than bucket sort, though it's a little harder to
understand.  If you're sorting a linked list, bucket sort is more natural,
because you've already got listnodes ready to put into the buckets.

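Here's the whole thing put together, as a Java sketch (Item is the same
illustrative class as in the bucket sort sketch above).

  // Stably sorts x by key, assuming every key is in the range 0...q-1.
  static Item[] countingSort(Item[] x, int q) {
    int[] counts = new int[q];
    for (int i = 0; i < x.length; i++) {
      counts[x[i].key]++;                   // count the keys
    }
    int total = 0;
    for (int j = 0; j < counts.length; j++) {
      int c = counts[j];                    // scan:  counts[j] becomes the
      counts[j] = total;                    //   number of keys less than j
      total = total + c;
    }
    Item[] y = new Item[x.length];
    for (int i = 0; i < x.length; i++) {
      y[counts[x[i].key]] = x[i];           // distribute each item, stably
      counts[x[i].key]++;
    }
    return y;
  }
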
However, if q is not in O(n)--there are many more _possible_values_ for keys
than there are keys--we need a more aggressive method to get linear-time
performance.  What do we do if q >> n?

Radix Sort
----------
Suppose we want to sort 1,000 items in the range from 0 to 99,999,999.  If we
use bucket sort, we'll spend so much time initializing and concatenating
empty queues we'll wish we'd used selection sort instead.  Instead of
providing 100 million buckets, let's provide q = 10 buckets and sort on the
first digit only.  (A number less than 10 million is said to have a first
digit of zero.)  We use bucket sort or counting sort, treating each item as
if its key is the first digit of its true key.

        0       1       2       3       4       5       6       7       8       9
    ---------------------------------------------------------------------------------
    |   .   |   .   |   *   |   .   |   *   |   .   |   .   |   .   |   *   |   .   |
    ----|-------|---------------|---------------|-------|-------|---------------|----
        v       v               v               v       v       v               v
     ------  ------          ------          ------  ------  ------          ------
     | 342|  |1390|          |3950|          |5384|  |6395|  |7394|          |9362|
     |9583|  |5849|          |8883|          |2356|  |1200|  |2039|          |9193|
     ---|--  ------          ---|--          ------  ------  ---|--          ---|--
        v                       v                               v               v
     ------                  ------                          ------          ------
     |  59|                  |3693|                          |7104|          |9993|
     |2178|                  |7834|                          |2114|          |3949|
     ------                  ------                          ------          ------

Once we've dealt all 1,000 items into ten queues, we could sort each queue
recursively on the second digit; then sort the resulting queues on the third
digit, and so on.  Unfortunately, this tends to break the set of input items
into smaller and smaller subsets, each of which will be sorted relatively
inefficiently.

Instead, we use a clever but counterintuitive idea:  we'll keep all the
numbers together in one big pile throughout the sort; but we'll sort on the
_last_ digit first, then the next-to-last, and so on up to the most
significant digit.  This idea works because bucket sort and counting sort are
stable.  Hence, once we've sorted on the last digit, the numbers 55,555,552
and 55,555,558 will remain ever after in sorted order, because their other
digits will be sorted stably.  Consider an example with three-digit numbers:

  Sort on 1s:    771 721 822 955 405   5 925 825 777  28 829
  Sort on 10s:   405   5 721 822 925 825  28 829 955 771 777
  Sort on 100s:    5  28 405 721 771 777 822 825 829 925 955

After we sort on the middle digit, observe that the numbers are sorted by
their last two digits.  After we sort on the most significant digit, the
numbers are completely sorted.

Returning to our eight-digit example, we can do better than sorting on one
decimal digit at a time.  With 1,000 keys, sorting would likely be faster if
we sort on two digits at a time (using a base, or _radix_, of q = 100) or
even three (using a radix of q = 1,000).  Furthermore, there's no need to use
decimal digits at all; on computers, it's more natural to choose a
power-of-two radix like q = 256.  Base-256 digits are easier to extract from
a key, because we can quickly pull out the eight bits that we need by using
bit operators (which you'll study in detail in CS 61C).

Note that q is both the number of buckets we're using to sort, and the radix
of the digit we use as a sort key during one pass of bucket or counting sort.
"Radix" is a synonym for the base of a number, hence the name "radix sort."

How many passes must we perform?  Each pass inspects log2 q bits of each key.
If all the keys can be represented in b bits, the number of passes is
ceiling(b / log2 q).  So the running time of radix sort is in

    O( (n + q) ceiling(b / log2 q) ).

How should we choose the number of queues q?  Let's choose q to be in O(n),
so each pass of bucket sort or counting sort takes O(n) time.  However, we
want q to be large enough to keep the number of passes small.  Therefore,
let's choose q to be approximately n.  With this choice, the number of passes
is in O(1 + b / log2 n), and radix sort takes

    O( n + (b / log2 n) n )   time.

For many kinds of keys we might sort (like ints), b is technically a
constant, and radix sort takes linear time.  Even if the key length b tends
to grow logarithmically with n (a reasonable model in many applications),
radix sort runs in time linear in the total number of bits in all the keys
together.

A practical, efficient choice is to make q equal to n, rounded down to the
nearest power of two.  If we want to keep memory use low, however, we can
make q equal to the square root of n, rounded to the nearest power of two.
With this choice, the number of buckets is far smaller, but we only double
the number of passes, because log2 sqrt(n) = (log2 n) / 2, so each pass
inspects half as many bits.

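Here's radix sort as a Java sketch (ours, for illustration), for nonnegative
int keys with a radix of q = 256.  Each of the passes is a stable counting
sort on one base-256 digit--one byte, pulled out with bit operators--so
ceiling(32 / log2 256) = 4 passes cover a 32-bit key.

  // Sorts an array of nonnegative ints, one byte per pass.
  static int[] radixSort(int[] x) {
    final int q = 256;                      // the radix
    for (int shift = 0; shift < 32; shift += 8) {
      int[] counts = new int[q];
      for (int i = 0; i < x.length; i++) {
        counts[(x[i] >> shift) & 0xFF]++;   // count this pass's digits
      }
      int total = 0;
      for (int j = 0; j < q; j++) {         // scan, as in counting sort
        int c = counts[j];
        counts[j] = total;
        total = total + c;
      }
      int[] y = new int[x.length];
      for (int i = 0; i < x.length; i++) {  // distribute, stably
        y[counts[(x[i] >> shift) & 0xFF]++] = x[i];
      }
      x = y;                                // the next pass sorts y
    }
    return x;
  }

Because each pass is stable, a pass on a more significant digit never undoes
the order established by the passes before it.
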
Postscript:  Radix Sort Rocks (not examinable)
----------------------------------------------
Linear-time sorts tend to get less attention than comparison-based sorts in
most computer science classes and textbooks.  Perhaps this is because the
theory behind linear-time sorts isn't as interesting as for other algorithms.
Nevertheless, the library sort routines for machines like Crays use radix
sort, because it kicks major ass in the speed department.

Radix sort can be used not only with integers, but with almost any data that
can be compared bitwise, like strings.  The IEEE standard for floating-point
numbers is designed to work with radix sort combined with a simple prepass
and postpass (to flip the bits, except the sign bit, of each negative
number).

Strings of different lengths can be sorted in time proportional to the total
length of the strings.  A first stage sorts the strings by their lengths.  A
second stage sorts the strings character by character (or several characters
at a time), starting with the last character of the longest string and
working backward to the first character of every string.  We don't sort every
string during every pass of the second stage; instead, a string is included
in a pass only if it has a character in the appropriate place.

For instance, suppose we're sorting the strings CC, BA, CCAAA, BAACA, and
BAABA.  After we sort them by length, the next three passes sort only the
last three strings by their last three characters, yielding CCAAA BAABA
BAACA.  The fifth pass is on the second character of each string, so we
prepend the two-character strings to our list, yielding CC BA CCAAA BAABA
BAACA.  After sorting on the second and first characters, we end with

  BA BAABA BAACA CC CCAAA.

Observe that BA precedes BAABA and CC precedes CCAAA because of the stability
of the sort.  That's why we put the two-character strings before the
five-character strings when we began the fifth pass.

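Here's a sketch of the floating-point prepass and postpass mentioned above
(the method names are ours; Float.floatToRawIntBits and Float.intBitsToFloat
are standard Java).  After the prepass, the int keys are in the same order as
the original floats when compared as signed ints, so a radix sort adapted to
signed keys (flip the sign bit of every key too, if the passes treat bytes as
unsigned, as the radixSort sketch above does) puts the floats in order.

  // The prepass:  flip the bits, except the sign bit, of each negative
  // float, so that more negative floats map to more negative ints.
  static int floatKey(float f) {
    int bits = Float.floatToRawIntBits(f);
    if (bits < 0) {                         // sign bit set:  f is negative
      bits ^= 0x7FFFFFFF;                   // flip all bits but the sign bit
    }
    return bits;
  }

  // The postpass:  the transformation is its own inverse.
  static float keyFloat(int key) {
    if (key < 0) {
      key ^= 0x7FFFFFFF;
    }
    return Float.intBitsToFloat(key);
  }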