ASYMPTOTIC ANALYSIS (bounds on running time or memory)
===================
Suppose an algorithm for processing a retail store's inventory takes:
  - 10,000 milliseconds to read the initial inventory from disk, and then
  - 10 milliseconds to process each transaction (items acquired or sold).
Processing n transactions takes (10,000 + 10 n) ms.  Even though 10,000 >> 10,
we sense that the "10 n" term will be more important if the number of
transactions is very large.

We also know that these coefficients will change if we buy a faster computer or
disk drive, or use a different language or compiler.  We want a way to express
the speed of an algorithm independently of a specific implementation on a
specific machine--specifically, we want to ignore constant factors (which get
smaller and smaller as technology improves).

Big-Oh Notation (upper bounds on a function's growth)
---------------
Big-Oh notation compares how quickly two functions grow as n -> infinity.

Let n be the size of a program's _input_ (in bits or data words or whatever).
Let T(n) be a function.  For now, T(n) is the algorithm's precise running time
  in milliseconds, given an input of size n (usually a complicated expression).
Let f(n) be another function--preferably a simple function like f(n) = n.

We say that T(n) is in O( f(n) )  IF AND ONLY IF   T(n) <= c f(n)
                                  WHENEVER n IS BIG, FOR SOME LARGE CONSTANT c.

 *  HOW BIG IS "BIG"?  Big enough to make T(n) fit under c f(n).
 *  HOW LARGE IS c?  Large enough to make T(n) fit under c f(n).

EXAMPLE:  Inventory
-------------------
Let's consider the function T(n) = 10,000 + 10 n, from our example above.
Let's try out f(n) = n, because it's simple.  We can choose c as large as we
want, and we're trying to make T(n) fit underneath c f(n), so pick c = 20.

                     c f(n) = 20 n     **
           ^                /        **
           |       |       /       **
           |       |      /      **
           |       |     /     **
           |       |    /    **  T(n) = 10,000 + 10 n
   30,000  +       |   /   **
           |       |  /  **
           |       | / **
           |       |/**
   20,000  +       **
           |     **|
           |   **/ |
           | ** /  |
   10,000  **  /   |
           |  /    |
           | /     |
           |/      |
           O-------+------------------------> n
                 1,000

As these functions extend forever to the right, their asymptotes will never
cross again.  For large n--any n bigger than 1,000, in fact--T(n) <= c f(n).
                 ***  THEREFORE, T(n) is in O(f(n)).  ***

Next, you must learn how to express this idea rigorously.  Here is the
central lesson of today's lecture, which will bear on your entire career as
a professional computer scientist, however abstruse it may seem now:

|=============================================================================|
| FORMALLY:  O(f(n)) is the SET of ALL functions T(n) that satisfy:           |
|                                                                             |
|   There exist positive constants c and N such that, for all n >= N,         |
|                              T(n) <= c f(n)                                 |
|=============================================================================|

Pay close attention to c and N.  In the graph above, c = 20, and N = 1,000.

Think of it this way:  if you're trying to prove that one function is
asymptotically bounded by another [f(n) is in O(g(n))], you're allowed to
multiply them by positive constants in an attempt to stuff one underneath the
other.  You're also allowed to move the vertical line (N) as far to the right
as you like (to get all the crossings onto the left side).  We're only
interested in how the functions behave as n shoots off toward infinity.

EXAMPLES:  Some Important Corollaries
-------------------------------------
[1]  1,000,000 n  is in  O(n).                Proof:  set c = 1,000,000, N = 0.
  -> Therefore, Big-Oh notation doesn't care about (most) constant factors.
     We generally leave constants out; it's unnecessary to write O(2n),
     because O(2n) = O(n).  (The 2 is not wrong; just unnecessary.)

[2]  n  is in  O(n^3).  [That's n cubed].             Proof:  set c = 1, N = 1.
  -> Therefore, Big-Oh notation can be misleading.  Just because an algorithm's
     running time is in O(n^3) doesn't mean it's slow; it might also be in
     O(n).  Big-Oh notation only gives us an UPPER BOUND on a function.

              c f(n) = n^3
           ^        *      /
           |        *     /
           |        *    / T(n) = n
           |        *   /
           |        *  /
           |        * /
           |       * /
           |       */
       1   +       *
           |      /*
           |     / *
           |    / *|
           |   /  *|
           |  /   *|
           | /   * |
           |/  **  |
           O***----+------------------------> n
                 N = 1

[3]  n^3 + n^2 + n  is in  O(n^3).                    Proof:  set c = 3, N = 1.
  -> Big-Oh notation is usually used only to indicate the dominating (largest
     and most displeasing) term in the function.  The other terms become
     insignificant when n is really big.

     Here's a table to help you figure out the dominating term.

Table of Important Big-Oh Sets
------------------------------
Arranged from smallest to largest, happiest to saddest, in order of increasing
domination:

                      function              common name
                      --------              -----------
                   O(     1     )       ::  constant
    is a subset of O(   log n   )       ::  logarithmic
    is a subset of O(  log^2 n  )       ::  log-squared [that's (log n)^2 ]
    is a subset of O(  root(n)  )       ::  root-n [that's the square root]
    is a subset of O(     n     )       ::  linear
    is a subset of O(  n log n  )       ::  n log n
    is a subset of O(    n^2    )       ::  quadratic
    is a subset of O(    n^3    )       ::  cubic
    is a subset of O(    n^4    )       ::  quartic
    is a subset of O(    2^n    )       ::  exponential
    is a subset of O(    e^n    )       ::  exponential (but more so)

Algorithms that run in O(n log n) time or faster are considered efficient.
Algorithms that take n^7 time or more are usually considered useless.  In the
region between n log n and n^7, the usefulness of an algorithm depends on the
typical input sizes and the associated constants hidden by the Big-Oh notation.

If you're not thoroughly comfortable with logarithms, read Sections 4.1.2 and
4.1.7 of Goodrich & Tamassia carefully.  Computer scientists need to know
logarithms in their bones.

Warnings
--------
[1]  Here's a fallacious proof:

     n^2  is in  O(n), because if we choose c = n, we get n^2 <= n^2.
  -> WRONG!  c must be a constant; it cannot depend on n.

[2]  The big-Oh notation expresses a relationship between functions.
     IT DOES NOT SAY WHAT THE FUNCTIONS MEAN.  In particular, the function on
     the left does not need to be the worst-case running time, though it often
     is.  The number of emails you send to your Mom as a function of time might
     be in O(t^2).  In that case, not only are you a very good child; you're an
     increasingly good child.

     In binary search on an array,

     - the worst-case running time is in O(log n),
     - the best-case running time is in O(1),
     - the memory use is in O(n), and
     - 47 + 18 log n - 3/n is in O(the worst-case running time).

     Every semester, a few students get the wrong idea that "big-Oh" always
     means "worst-case running time."  Their brains short out when an exam
     question uses it some other way.

[3]  "e^3n is in O(e^n) because constant factors don't matter."
     "10^n is in O(2^n) because constant factors don't matter."
  -> WRONG!  I said that Big-Oh notation doesn't care about (most) constant
     factors.  Here are some of the exceptions.  A constant factor in an
     exponent is not the same as a constant factor in front of a term.
     e^3n is not bigger than e^n by a constant factor; it's bigger by a factor
     of e^2n, which is damn big.  Likewise, 10^n is bigger than 2^n by a factor
     of 5^n.

[4]  Big-Oh notation doesn't tell the whole story, because it leaves out the
     constants.  If one algorithm runs in time T(n) = n log_2 n, and another
     algorithm runs in time U(n) = 100 n, then Big-Oh notation suggests you
     should use U(n), because T(n) dominates U(n) asymptotically.  However,
     U(n) is only faster than T(n) in practice if your input size is greater
     than current estimates of the number of subatomic particles in the
     universe.  The base-two logarithm log_2 n < 50 for any input size n you
     are ever likely to encounter.

     Nevertheless, Big-Oh notation is still a good rule of thumb, because the
     hidden constants in real-world algorithms usually aren't that big.

ASYMPTOTIC ANALYSIS (continued):  More Formalism
================================================

|-----------------------------------------------------------------------------|
| Omega(f(n)) is the set of all functions T(n) that satisfy:                  |
|                                                                             |
|   There exist positive constants d and N such that, for all n >= N,         |
|                              T(n) >= d f(n)                                 |
|-----------------------------------------------------------------------------|
^^^^^^^^^^  Compare with the definition of Big-Oh:  T(n) <= c f(n). ^^^^^^^^^^^

Omega is the reverse of Big-Oh.  If T(n) is in O(f(n)), f(n) is in Omega(T(n)).

  2n   is in Omega(n)                 BECAUSE    n                is in O(2n).
  n^2  is in Omega(n)                 BECAUSE    n                is in O(n^2).
  n^2  is in Omega(3 n^2 + n log n)   BECAUSE    3 n^2 + n log n  is in O(n^2).

Big-Omega gives us a LOWER BOUND on a function, just as Big-Oh gives us an
UPPER BOUND.  Big-Oh says, "Your algorithm is at least this good."  Big-Omega
says, "Your algorithm is at least this bad."

Recall that Big-Oh notation can be misleading because, for instance,
n is in O(n^8).  If we know both a lower bound and an upper bound for
a function, and they're both the same bound asymptotically (i.e. they differ
only by a constant factor), we can use Big-Theta notation to precisely specify
the function's asymptotic behavior.

|-----------------------------------------------------------------------------|
| Theta(f(n)) is the set of all functions that are in both of                 |
|                                                                             |
|                         O(f(n)) and Omega(f(n)).                            |
|-----------------------------------------------------------------------------|

But how can a function be sandwiched between f(n) and f(n)?
Easy:  we choose different constants (c and d) for the upper bound and lower
bound.  For instance, here is a function T(n) in Theta(n):

                     c f(n) = 10 n
           ^                /
           |               /                T(n)
           |              /               **
           |             /               *  *
           |            /   ***          *   **
           |           /   *   *         *
           |   ***    /   *     *       *
           | **   ** /    *     *       *
           |*       **   *       *      *
           *       /  *  *        **   *
           |      /    **           ***     ~~~
           |     /                     ~~~~~
           |    /                 ~~~~~
           |   /             ~~~~~
           |  /         ~~~~~     d f(n) = 2 n
           | /     ~~~~~
           |/ ~~~~~
           O~~------------------------------> n

If we extend this graph infinitely far to the right, and find that T(n) remains
always sandwiched between 2n and 10n, then T(n) is in Theta(n).  If T(n) is an
algorithm's worst-case running time, the algorithm will never exhibit worse
than linear performance, but it can't be counted on to exhibit better than
linear performance, either.

Theta is symmetric:  if f(n) is in Theta(g(n)), then g(n) is in Theta(f(n)).
For instance, n^3 is in Theta(3 n^3 - n^2), and 3 n^3 - n^2 is in Theta(n^3).
n^3 is not in Theta(n), and n is not in Theta(n^3).

Big-Theta notation isn't potentially misleading in the way Big-Oh notation
can be:  n is NOT in Omega(n^8).  If your algorithm's running time is in
Theta(n^8), it IS slow.

However, some functions are not in "Theta" of anything simple.  For example,
the function f(n) = n (1 + sin n) is in O(n) and Omega(0), but it's not in
Theta(n) nor Theta(0).  f(n) keeps oscillating back and forth between zero and
ever-larger numbers.  We could say that f(n) is in Theta(2n (1 + sin n)), but
that's not a simplification.

Remember that the choice of O, Omega, or Theta is _independent_ of whether
we're talking about worst-case running time, best-case running time,
average-case running time, memory use, annual beer consumption as a function of
population, or some other function.  The function has to be specified.
"Big-Oh" is NOT a synonym for "worst-case running time," and Omega is not a
synonym for "best-case running time."

ALGORITHM ANALYSIS
==================
Problem #1:  Given a set of p points, find the pair closest to each other.
Algorithm #1:  Calculate the distance between each pair; return the minimum.

  There are p (p - 1) / 2 pairs, and each pair takes constant time to examine.
  Therefore, worst- and best-case running times are in Theta(p^2).

  Often, you can figure out the running time of an algorithm just by looking at
  the loops--their loop bounds and how they are nested.  For example, in the
  closest pair code below, the outer loop iterates p times, and the inner loop
  iterates an average of roughly p / 2 times, which multiply to Theta(p^2)
  time.

  double minDistance = point[0].distance(point[1]);

  /* Visit a pair (i, j) of points. */
  for (int i = 0; i < numPoints; i++) {
    /* We require that j > i so that each pair is visited only once. */
    for (int j = i + 1; j < numPoints; j++) {
      double thisDistance = point[i].distance(point[j]);
      if (thisDistance < minDistance) {
        minDistance = thisDistance;
      }
    }
  }

  But doubly-nested loops don't always mean quadratic running time!  The next
  example has the same loop structure, but runs in linear time.

Problem #2:  Smooshing an array called "ints" to remove consecutive duplicates.
Algorithm #2:

  int i = 0, j = 0;

  while (i < ints.length) {
    ints[j] = ints[i];
    do {
      i++;
    } while ((i < ints.length) && (ints[i] == ints[j]));
    j++;
  }
  // Code to fill in -1's at end of array omitted.

The outer loop can iterate up to ints.length times, and so can the inner loop.
But the index "i" advances on _every_ iteration of the inner loop.  It can't
advance more than ints.length times before both loops end.  So the worst-case
running time of this algorithm is in Theta(ints.length).  (So is the best-case
time.)

Unfortunately, I can't give you a foolproof formula for determining the running
time of any algorithm.  You have to think!  In fact, the problem of determining
an algorithm's running time is, in general, as hard as proving _any_
mathematical theorem.  For instance, I could give you an algorithm whose
running time depends on whether the Riemann Hypothesis (one of the greatest
unsolved questions in mathematics) is true or false.

Functions of Several Variables
------------------------------
Problem #3:  Write a matchmaking program for w women and m men.
Algorithm #3:  Compare each woman with each man.  Decide if they're compatible.

  If each comparison takes constant time then the running time, T(w, m),
  is in Theta(wm).

  This means that there exist constants c, d, W, and M, such that
  d wm <= T(w, m) <= c wm  for every w >= W and m >= M.

  T is NOT in O(w^2), nor in O(m^2), nor in Omega(w^2), nor in Omega(m^2).
  Every one of these possibilities is eliminated either by choosing
  w >> m or m >> w.  Conversely, w^2 is in neither O(wm) nor Omega(wm).
  You cannot asymptotically compare the functions wm, w^2, and m^2.

  If we expand our service to help form women's volleyball teams as well,
  the running time is in Theta(w^6 + wm).

  This expression cannot be simplified; neither term dominates the other.
  You cannot asymptotically compare the functions w^6 and wm.

Problem #4:  Suppose you have an array containing n music albums, sorted by
             title.  You request a list of all albums whose titles begin with
             "The Best of"; suppose there are k such albums.
Algorithm #4:  Search for one matching album with binary search.
               Walk (in both directions) to find the other matching albums.

  Binary search takes at most log n steps to find a matching album (if one
  exists).  Next, the complete list of k matching albums is found, each in
  constant time.  Thus, the worst-case running time is in

    Theta(log n + k).

  Because k can be as large as n, it is not dominated by the log n term.
  Because k can be as small as zero, it does not dominate the log n term.
  Hence, there is no simpler expression for the worst-case running time.

  Algorithms like this are called _output-sensitive_, because their performance
  depends partly on the size k of the output, which can vary greatly.

  Because binary search sometimes gets lucky and finds a match right away, the
  BEST-case running time is in

    Theta(k).

Problem #5:  Find the k-th item in an n-node doubly-linked list.
Algorithm #5:  If k < 1 or k > n, report an error and return.
               Otherwise, compare k with n - k.

               If k <= n - k, start at the front of the list and walk forward
               k - 1 nodes.

               Otherwise, start at the back of the list and walk backward
               n - k nodes.

  If 1 <= k <= n, this algorithm takes Theta(min{k, n - k}) time (in all cases)
  This expression cannot be simplified:  without knowing k and n, we cannot say
  that k dominates n - k or that n - k dominates k.