Study guide for the final exam Here is a list of topics we covered in class, to help you study for the final exam. Analysis of algorithms - How to prove algorithms correct: invariants, proof by induction - O(.) notation and asymptotic running time - Definition of O(.); how to prove that f(n) = O(g(n)) - Theta(.) notation - Worst-case running time vs expected running time - You should be prepared to analyze the running time of an algorithm and express its asymptotic worst-case running time using O(.) notation Recurrence relations - How to express the running time of an iterative or recursive algorithm, using a recurrence relation - The master theorem. Suppose T(n) = a T(n/b) + Theta(n^c). If c > log_b a: T(n) = Theta(n^c). If c = log_b a: T(n) = Theta(n^c lg n). If c < log_b a: T(n) = Theta(n^{log_b a}). (Or, you can replace every Theta(.) by O(.) and it's also true.) - How to solve a recurrence relation: the recursion tree; finding the depth of the tree, the number of leaves, the extra work at each level If T(n) = a T(n/b) + n^c, depth is log_b a, number of leaves is O(n^{log_b a}), extra work at root is n^c, and so on downward. Divide-and-conquer algorithms - Some examples: e.g., can multiply two n-bit integers be done in O(n^{log_2 3}) time, mergesort can sort n numbers in O(n lg n) time - Randomized divide-and-conquer algorithms: e.g., can find the median in linear expected time - You should know how to design a divide-and-conquer algorithm, for some problem you haven't seen before Basic graph algorithms - Graph representations, particularly adjacency-list representation. Space needed to store adjacency-list representation of a graph, as a function of |V| and |E|. What linear-time means, for a graph algorithm. - Depth-first search - Breadth-first search - Reachability using DFS or BFS - How to find connected components in an undirected graph - Assigning pre/post numbers for DFS - Classification of edges in undirected graphs: tree edges, non-tree edges - Classification of edges in directed graphs: tree edges, forward edges, back edges, cross edges - How to use pre/post numbers to classify an edge type in O(1) time - How to test for existence of a cycle a directed graph, using DFS - Topological sorting (linearization) in linear-time (e.g., sort vertices by decreasing post number) - Definition of strongly connected components: a pair of vertices u,v are strongly connected if there is a path from u to v and a path from v to u; a strongly connected component is a maximal set of vertices that are all strongly connected with each other - Decomposing a directed graph into strongly connected components in linear time - How to construct the metagraph of a directed graph. The metagraph is a DAG. - You should be prepared to know how you can use these tools (DFS, topological sorting, strongly connected components) to help design an algorithm for some a graph problem that you haven't seen before, and when to use these tools Shortest-path algorithms - Definition of distance between two vertices - Breadth-first search: shortest paths when all edges have length 1 - Dijkstra's algorithm: shortest paths, when edges have non-negative lengths - Proof of correctness for Dijkstra's algorithm - Bellman-Ford algorithm: shortest paths, when edges have integer lengths (but there are no negative cycles) - How to test for the existence of negative cycles - Shortest paths in a DAG, when edges have integer lengths, in linear time - When to use different algorithms - Running time of BFS, Dijkstra's, and Bellman-Ford algorithms Greedy algorithms - Definition of a minimal spanning tree - Prim's algorithm for finding MSTs - Kruskal's algorithm for finding MSTs - Proof of correctness of Prim's and Kruskal's algorithms. The cut property. - Running time of Prim's and Kruskal's algorithm - Union-find data structure. Union by rank. Path compression. Amortized running time is O((n+m) log^* n), for a sequence of operations with n MakeSets and m Unions/Finds. - Horn clauses: how to solve HornSAT in linear time. Only set a variable to true when you're forced to. - Huffman coding: pick the two least-frequent symbols, make them siblings, and make a new fake "symbol" whose frequency is the sum of their two frequencies. When to use Huffman coding. - You should know how to design a greedy algorithm, for some problem you haven't seen before Dynamic programming - Be familiar with some examples of dynamic programming algorithms: e.g., longest increasing subsequence, edit distance, knapsack problem (with and without repetition) - Memoization - Differences between divide-and-conquer vs dynamic programming - How to identify subproblems. Common types of subproblems: prefixes, suffixes, substrings, subtrees. - Given a problem you haven't seen before, you should know how to tell whether dynamic programming might be applicable; and if it is, be prepared to design a dynamic programming algorithm - You should know how to write a recurrence relation for a dynamic programming problem, and then know how to turn that into an efficient algorithm Linear programming - Definition of the linear programming problem. maximize cx subject to Ax<=b. - The simplex algorithm (you only need to know the highest-level intuition about how it works: hill-climbing along intersection points in the n-dimensional polyhedron) - Running time of the simplex algorithm, and of the best LP solvers - Given a problem you haven't seen before, you should be prepared to devise a way to express it as a linear program (if applicable) and solve it using a LP solver Network flow - Definition of the maximum flow - Definition of residual graph - Ford-Fulkerson algorithm: pick a path in residual graph from s to t, push as much flow along that path as you can - Definition of a cut, the flow across a cut, and the capacity of a cut - Max-flow min-cut theorem - Algorithms for the min-cut problem - How to express the maximum-flow problem as a linear program - Tradeoffs between solving network flow problems using a dedicated network flow algorithm (such as Ford-Fulkerson) vs using a LP solver - Bipartite matching - Given a problem that you haven't seen before that can be solved with the help of a network flow algorithm, you should be prepared to explain how to cast it as a network flow problem P vs NP - Definition of polynomial-time algorithm: one whose worst-case asymptotic running time is polynomial in the length of the input, i.e., is O(n^c) for some constant c, where n = length of the input - Definition of a search problem: a problem of the form Given: x Find: s such that R(x,s)=true, or report that none exists where R is an algorithm specified in advance whose worst-case running time is polynomial in the length of x. The function R defines the search problem. - Definition of P and NP: NP = the set of all search problems. P = the set of all search problems that can be solved in polytime (i.e., where there is an algorithm to solve it whose worst-case running time is polynomial in the size of the input). - Definition of NP-complete and NP-hard: The problem B is NP-hard if for every problem A in NP, we have A <=_P B (i.e., A reduces to B). B is NP-complete if it is NP-hard and it is in NP. - Reductions - Cook-Levin Theorem (CircuitSAT is NP-complete) - Some standard problems in P: 2-coloring, 2-SAT, HornSAT, perfect bipartite matching - Some standard NP-complete problems: CircuitSAT, SAT, 3SAT, 3-coloring, Travelling Salesman Problem, Hamiltonian Cycle (aka Rudrata Cycle), Hamiltonian Path (aka Rudrata Path), Integer Linear Programming, longest simple path - How to prove a problem is NP-complete - The P =? NP question - Given a search problem, you should be prepared to either prove it is in P (by exhibiting a polynomial-time algorithm) or prove it is NP-complete (by reducing some other problem already known to be NP-complete to it) - You should be prepared to identify implications among complexity assumptions: e.g., if 3SAT is in P, what can we conclude about P vs NP? is it likely that there is an easy polynomial-time algorithm for 4-coloring? is it likely that there is an easy way to reduce 3-coloring to 2-coloring? Strategies for dealing with NP-hard problems - Intelligent brute-force search: backtracking, branch-and-bound - SAT solvers - Approximation algorithms - Definition of the approximation ratio - Heuristics - You should be prepared to explain how to formulate a particular problem in a way that it could be solved with a SAT solver, or prove whether or not a particular NP-hard optimization problem has an approximation algorithm with a specified approximation ratio Study tips: - If you haven't been reading the sample solution sets, you may want to check that you understand all the homework problems and how to solve them. It's likely that there will be one question on the final that is similar to a homework question. Also you may want to check that you understand all of the quiz problems and how to solve them. - You could get your study group together for a mock exam: have each participant makes up 2-3 sample questions in advance and bring copies of them to the study group meeting. Then you can all work through the solutions together, to get practice. - Get a good night's sleep on Sunday night before the exam! Really. - Don't stress too much over it. If you've been paying attention in class and doing the homeworks, you should be fine. And remember, it's just a stupid final exam -- there's more to life than exams.