Lab 20: Graphs

## A. Intro

Download the code for Lab 20 and create a new Eclipse project out of it.

## B. The Abstract Graph

#### Linked lists --> Trees --> Graphs

One of the first data structures we studied in this course was the linked list, which consisted of a set of nodes connected in sequence. Then we looked at trees, which were a generalized version of linked lists: you still connected nodes in sequence, but one node could branch off leading to multiple others. Now we'll look at a generalization of a tree, called a graph.

In a graph, we still have a collection of nodes, but each node in a graph can be connected to any other node without limitation. This means there isn't necessarily a hierarchical structure like you get in a tree. For example, here's a tree, where every node is descendent from one another: Now suppose you add an edge back up the tree. This is no longer a tree (notice the hierarchical structure is broken — is C descendent from A or is A descendent from C?), but it is still a graph! There are other edges that are not allowed in a tree. For example, now node E looks like it is descendent from two nodes. Again this is no longer a tree, but it is a graph. In general a graph can have any possible connection between nodes: #### Overview of Graphs

First, let's go over some terminology. An element in a graph is called a vertex (basically a node, but we typically don't call them nodes). Vertices usually represent the objects in our graph, namely the things that have the relationships such as people, places, or things. A connection between two vertices is called an edge. An edge represents some kind of relationship between two vertices.

Quite a few examples of graphs exist in the everyday world:

• Road maps are a great example of graphs. Each city is a vertex, and the edges that connect these cities are the roads and freeways. An abstract example of what such a graph would look like can be found here. For a more local example, each building on the Berkeley campus can be thought of as a vertex, and the paths that connect those buildings would be the edges.
• Facebook depends on graphs. Each person that participates is a vertex, and an edge is established when two people claim to be friends or associates of each other.
• For a more technical example, a computer network is also a graph. In this case, computers and other network machinery (like routers and switches) are the vertices, and the edges are the network cables. Or for a wireless network, an edge is an established connection between a single computer and a single wireless router.

In more formal mathematical notation, a vertex is written as a variable, such as `v0`, `v1`, `v2`, etc. An edge is written as a pair of vertices, such as `(v0, v1)`, or `(v2, v0)`.

#### Directed vs. Undirected Graphs

Once again, an element in a graph is called a vertex, and a connection between two vertices is called an edge.

If all edges in a graph are showing a relationship between two vertices that works in either direction, then it is called an undirected graph. A picture of an undirected graph looks like this: But not all edges in graphs are the same. Sometimes the relationships between two vertices sometimes only go in one direction. Such a relationship is called a directed graph. An example of this could be a city map, where the edges are sometimes one-way streets between locations. A two-way street would actually have to be represented as two edges, one of them going from location A to location B, and the other from location B to location A.

In terms of a visual representation of a graph, an undirected graph does not have arrows on its edges (because the edge connects the vertices in both directions), whereas each edge in a directed graph does have an arrow that points in the direction the edge is going. An example directed graph appears below. More formally, an edge from a vertex `v0` to a vertex `v1` is printed as the pair (`v0, v1)`. In a directed graph, the pair is ordered; thus even if the edge `(v0,v1)` exists, `(v1,v0)` might not. In an undirected graph, the pair isn't ordered, so the edge `(v0,v1)` is the same as the edge `(v1,v0)`.

#### Discussion: Give Another Application of Graphs

Link to the discussion

Give another situation that can be modeled with graphs. Describe what the vertices are, and define the conditions under which two vertices are connected by an edge. Can your example be represented as an undirected graph, or does it have to be a directed graph?

#### A Few More Graph Definitions

Now that we have the basics of what a graph is, here are a few more terms that might come in handy while discussing graphs.

Term Definition
Edge A single connection between two vertices
Adjacent Two vertices are adjacent if there is an edge connecting them
Connected A graph is connected if every vertex has a path to all other vertices. (Also describes two nodes if there is an edge connecting them)
Neighbor Two vertices are neighbors if there is an edge connecting them
Set of neighbors The set of all nodes that are adjacent to/connected to/neighbors of a node
Incident to an edge A vertex that is an endpoint of an edge is incident to it
Path A sequence of edges that can be followed from one vertex to another
Cycle A special kind of path that ends at the same vertex where it originally started

#### Self-test: Edge Count vs. Vertex Count

Suppose that G is a directed graph with N vertices. What's the maximum number of edges that G can have? Assume that a vertex cannot have an edge pointing to itself, and that for each vertex u and vertex v, there is at most one edge (u,v).

 N Incorrect. Draw out a graph with 4 vertices. Can you find more than 4 edges between the nodes? N^2 Incorrect. But it would be correct if a vertex can have an edge to itself, which is the case for some kinds of graphs. N*(N-1) Correct! Every vertex can be connected to every other vertex, of which there are N-1. N*(N-1)/2 Incorrect. Remember the graph is directed .

Now suppose the same graph G in the above question is an undirected graph. Again assume that no vertex is adjacent to itself, and at most one edge connects any pair of vertices. What's the maximum number of edges that G can have?

 half as many edges as in the directed graph Correct! Remember the edges (u, v) and (v, u) now only count as one edge. the same number of edges as in the directed graph Incorrect. Remember the possible edges (u, v) and (v, u) now only count as one edge. twice as many edges as in the directed graph Incorrect. Remember the edges (u, v) and (v, u) now only count as one edge.

What's the minimum number of edges that a connected undirected graph with N vertices can have?

 N-1 Correct! (In fact, an undirected connected graph with N-1 edges is a tree!) N Incorrect. Think of the small case. How many edges do you need to connect a graph with two nodes? N^2 Incorrect. There aren't this many possible edges in an undirected graph. N*(N-1) Incorrect. There aren't this many possible edges in an undirected graph. N*(N-1)/2 Incorrect. This is all possible edges in the graph! Surely we can do better.

#### Data Structures for Graphs

Now that we know how to draw a graph on paper and understand the basic concepts and definitions, we can consider how a graph should be represented inside of a computer. We want to be able to get quick answers for the following questions about a graph:

• Are given vertices `v` and `w` adjacent?
• Is vertex `v` incident to a particular edge `e`?
• What vertices are adjacent to `v`?
• What edges are incident to `v`?

Most of today's lab will involve thinking about how fast and how efficient each of these operations is using different representations of a graph.

Imagine that we want to represent a graph that looks like this: One data structure we could use to implement this graph is called an array of adjacency lists. In such a data structure, an array is created that has the same size as the number of vertices in the graph. Each position in the array represents one of the vertices in the graph. Then, each location in the array points to a list that contains indexes to other vertices, which are the vertices neighbors. These lists are called adjacency lists.

The array of adjacency lists that represents the above graph looks like this: Another data structure we could use to represent the edges in a graph is called an adjacency matrix. In this data structure, we also have an array with each position representing a vertex in the graph. However, instead of the array pointing to a linked list, it points to another array representing possible neighbors. This matrix just contains boolean values, true when there is an edge between the two given vertices, false when no edge exists. Thus, each vertex has a row and a column in the matrix, and the value in that table says true or false whether or not that edge exists.

The adjacency matrix that represents the above graph looks like this: #### Discussion: Representing a Graph with a Linked Structure

Link to the discussion

A third data structure we could use to represent a graph is simply an extension of the linked nodes idea, where each vertex is an object that contains pointers to other vertices. This may seem like the most straightforward approach: aren't the adjacency list and adjacency matrix roundabout in comparison?

Discuss reasons why the adjacency list or adjacency matrix might be preferred for a graph.

On the flipside, notice that we could also represent a tree using an ajacency matrix or list. Discuss reasons why an ajacency list or adjacency matrix might not be preferred for a tree.

#### Self-test: Which Data Structure is More Efficient?

Which is more efficient - an adjacency matrix or an array of adjacency lists?

 Adjacency Matrix Incorrect. It depends Correct! We'll explore why in the next few steps. Adjacency Lists Incorrect.

#### Self-test: Graph Data Structure Trade-offs

Deciding how to store data is all about trade-offs. This quiz focuses on the BIG picture of which representation is more efficient in a given situation. The rest of the lab focuses on exactly how efficient things are.

Which is most SPACE efficient if you have a lot of edges in your graph?

 Adjacency Matrix Correct! Most of the spaces in the adjacency matrix will be used, so it won't be a waste. Adjacency Lists Incorrect. You'll have to store a lot of links if you have a lot of edges. It depends Incorrect. At a general level one of these data structures is more space efficient if we know the graph has a lot of edges.

Which is most SPACE efficient if you have very few edges in your graph?

 Adjacency Matrix Incorrect. No matter how many edges the graph has, you still have to store a whole matrix for all of them. Adjacency Lists Correct! You only have to store as many nodes in the adjacency list as there are edges. With an adjacency matrix, you have to store the full matrix regardless of how many edges you have. It depends Incorrect. At a general level one of these data structures is more space efficient if the know the graph has few edges.

Which is most TIME efficient for adding an edge if you have a lot of edges in your graph?

 Adjacency Matrix Correct! All we have to do to add an edge is index into the arrays and set a value to true. Adjacency Lists Incorrect. Adding a new edge requires creating a new node to append to our list of edges. It depends Incorrect. They are the same Incorrect. They can both be O(1) if you append to the front of the linked lists, but the amount of work you have to do for each is noticeably different.

Which is most TIME efficient for adding an edge if you have very few edges in your graph?

 Adjacency Matrix Correct! Adding an edge is the same no matter how many edges there are. Adjacency Lists Incorrect. It depends Incorrect. They are the same Incorrect.

Which is most TIME efficient for returning a list of edges from one node if you have very few edges in your graph?

 Adjacency Matrix Incorrect. You'll have to iterate through the whole array and check which values are true. Adjacency Lists Correct! This will be a lot faster if it just has a reference it can return to the neighbors of a node. It depends Incorrect. They are the same Incorrect.

Which is most TIME efficient for returning a list of edges from one node if you have a lot of edges in your graph?

 Adjacency Matrix Incorrect. Adjacency Lists Correct! For the same reason as before. Just returning a reference to the linked list does not change if there are more nodes in the list. It depends Incorrect. They are the same Incorrect.

#### Self-test: Time Estimates for Accessing Graph Information

Using an adjacency matrix, how long in the worst case does it take to determine if vertex v is adjacent to vertex w? (Assume vertices are represented by integers.)

 constant time Correct! All we have to do is index into the matrix at v and w and check if the value is true. time proportional to the number of neighbors of vertex v Incorrect. Remember the matrix is a 2D array, and we can index into any spot in it in constant time time proportional to the number of vertices in the graph Incorrect. Remember the matrix is a 2D array, and we can index into any spot in it in constant time time proportional to the number of edges in the graph Incorrect. Remember the matrix is a 2D array, and we can index into any spot in it in constant time

Using an array of adjacency lists, how long in the worst case does it take to determine if vertex v is adjacent to vertex w? (Assume vertices are represented by integers.)

 constant time Incorrect. Checking if list of neighbors contains a particular vertex requires iterating through it. time proportional to the number of neighbors of vertex v Correct. We have to iterate through all of them and check if any of them are w. time proportional to the number of vertices in the graph Incorrect. We only have to iterate through the list of neighbors of v. time proportional to the number of edges in the graph Incorrect. We only have to iterate through the list of neighbors of v.

#### Self-test: Memory Use in Adjacency Lists

Suppose we are representing a graph with N vertices and E edges.

There are N2 booleans stored in an adjacency matrix. Therefore the memory required to store an adjacency matrix is N2 times the memory required to store a `boolean` value. Assume that references and integers each use 1 unit of memory.

The amount of memory required to represent the graph as an array of adjacency lists is proportional to what? N*E Incorrect. Do we have all E edges stored for every node? E*E Incorrect. N plus E Correct! Look at the graph. There are N array elements and E nodes among the linked lists. E Incorrect. Don't forget about the references we have to store in the array elements.

#### Discussion: Common Neighbor Timing

Link to the discussion

A CS 61B student claims that, in a graph with N vertices and E edges that's implemented with an array of adjacency lists, the worst-case time to see if vertices v and w have a common neighbor is proportional to N 2 . (Vertices are not in any particular order in an adjacency list.)

This estimate is insufficiently specific. Explain why, and give a more specific estimate.

## C. Programs that Process Graphs

#### Graph Traversal

Earlier in the course, we used the general traversal algorithm to process all elements of a tree:

``````Stack fringe = new Stack ();
fringe.push (myRoot);
while (!fringe.isEmpty()) {
// select a tree node from fringe
TreeNode node = fringe.pop();

// process the node
[do whatever processing operation here...]

// add node's children to fringe
fringe.push(node.myRight);
fringe.push(node.myLeft);
// Note: If this was a tree with more than
//       two children, we'd push ALL of the
//       children onto the stack.
}``````

The code just given returns tree values in depth-first order. If we wanted a breadth-first traversal of a tree, we'd replace the `Stack` with a queue (the `LinkedList` class in `java.util` would be fine).

Analogous code to process every vertex in a connected graph is

``````Stack stack = new Stack();
fringe.push (some vertex);
while (!fringe.isEmpty()) {
// select a vertex from fringe
// process the vertex
// add the vertex's neighbors to fringe
}``````

This doesn't quite work, however. A graph may contain a cycle of vertices, and if that is the case here, the code loops infinitely.

The fix is to keep track of vertices that we've visited already (and put into the fringe), in order not to process them twice. Here is pseudocode:

``````Stack fringe = new Stack ();
fringe.add (some vertex);
while (!fringe.isEmpty()) {
// select a vertex from fringe
// process the vertex
// add the vertex's unvisited neighbors to fringe
}``````

As with tree traversal, we can visit vertices in depth-first or breadth-first order merely by choosing a stack or a queue to represent the fringe. Typically though, because graphs are usually interconnected, the ordering of vertices can be scrambled up quite a bit. Thus, we don't worry too much about using a depth-first or a breadth-first traversal. Instead, in the next lab session we will see applications that use a priority queue to implement something more like best-first traversal.

#### Exercise: Practice Graph Traversal

Below is the psuedocode for traversing a graph. It might help you to print out copies of the graph here. Try to figure out the order in which the nodes are visited if you start at different vertices. You can check your answers for starting at vertices 1 - 5 on the next page.

``````Stack fringe = new Stack ();
fringe.add (some vertex);
while (!fringe.isEmpty()) {

// select a vertex from fringe
// if the vertex is not yet visited:
// process the vertex
// add the vertex's neighbors to fringe
// mark the vertex as visited``````

}

For this problem assume that we add the vertex's unvisited edges to the stack in the order higher number to lower number.  #### Answers for Practicing Graph Traversal

Here are the solutions for nodes 1-5 (just practice these until you feel comfortable).

Starting vertex Order of visiting remaining vertices
1 3, 4, 2, 5, 7, 6, 9, 8, 10
2 4, 3, 1, 6, 7, 5, 8, 9, 10
3 1, 4, 2, 5, 7, 6, 9, 8, 10
4 2, 5, 7, 3, 1, 6, 9, 8, 10
5 2, 4, 3, 1, 6, 7, 8, 9, 10

#### Exercise: `Graph.java`

We have given you framework code for a class `Graph`. It implements a graph of `Integer`s using an adjacency list.

Fill in some of the basic methods: `addEdge`, `addUndirectedEdge`, `isAdjacent`, and `inDegree`.

Then complete the `DFSIterator` inner class. As its name suggests, it should iterate through all of the vertices in the graph in DFS order, starting from a vertex that is passed in as an argument.

#### Exercise: Using an Iterator to Find a Path, Part 1

First, switch which partner is coding for this lab if you haven't already.

Complete the method `pathExists` in `Graph.java`, which returns whether or not any path exists that goes from a vertex `v` to a vertex `w`. Remember that a path is any set of edges that exists which you can follow that such you travel from one vertex to another. You may find it helpful for your method to call the given method `visitAll`.

For an example of an undirected graph this should work on, try testing it with the following two graphs:

``````addUndirectedEdge(0,2);
addUndirectedEdge(0,3);
addUndirectedEdge(1,4);
addUndirectedEdge(1,5);
addUndirectedEdge(2,3);
addUndirectedEdge(2,6);
addUndirectedEdge(4,5);

addEdge(0,1);
addEdge(1,2);
addEdge(2,0);
addEdge(2,3);
addEdge(4,2); ``````

#### Exercise: Using an Iterator to Find a Path, Part 2

Now you will actually find a path from one vertex to another if it exists. Write code in the body of the method named `path` that, given two `int`s that represent a start vertex and a finish vertex, returns an `ArrayList` of `Integer`s that represent vertices that lie on the path from the start to the finish. If no such path exists, you should return an empty `ArrayList`.

Algorithm hint: Pattern your method on `visitAll`, with the following differences. First, add code to stop calling `next` when you encounter the finish vertex. Then, trace back from the finish vertex to the start, by first finding a visited vertex u for which (u, finish) is an edge, then a vertex v visited earlier than u for which (v, u) is an edge, and so on, finally finding a vertex w for which (start, w) is an edge. Collecting all these vertices in the correct sequence produces the desired path. We recommend that you try this by hand with a graph or two to see that it works.

#### Topological Sort

A topological sort (sometimes also called a linearization) of a directed graph is a list of the vertices in such an order that if there is a directed path from vertex `v` to vertex `w`, then `v` precedes `w` in the list. (The graph must be acyclic in order for this to work. Directed acyclic graphs are common enough to be referred to by their acronym: DAGs.)

Here is an example of a DAG: In the above DAG, a few of the possible topological sorts could be:

``````A B C D E F G H
A C B E G D F H
B E G A D C F H``````

Notice that the topological sort for the above DAG has to start with either A or B, and must end with H. (For this reason, A and B are called sources, and H is called a sink.)

Another way to think about it is that the vertices in a DAG represent a bunch of tasks you need to complete on a to-do list. Some of those tasks cannot be completed until others are done. For example, when getting dressed in the morning, you may need to put on shoes and socks, but you can't just do them in any order. The socks must be put on before the shoes.

The edges in the DAG represent dependencies between the tasks. In this example above, that would mean that task A must be completed before tasks C and D, task B must be completed before tasks D and E, E before G, C and D before F, and finally F and G before H. A topoological sort gives you an order in which you can do the tasks (it puts the tasks in a linear order).

#### The Topological Sort Algorithm

The algorithm for taking a graph and finding a topological sort uses an array named `currentInDegree` with one element per vertex. `currentInDegree[v]` is initialized with the in-degree of each vertex `v`.

The algorithm also uses a fringe. The fringe is initialized with all vertices whose in-degree is 0. When a vertex is popped off the fringe and added to a results list, the `currentInDegree` value of its neighbors are reduced by 1. Then the fringe is updated again with vertices shows in-degree is now 0.

#### Exercise: Implement Topological Sort

Your task is to fill in the blanks in the `TopologicalIterator` class so that it successively returns vertices in topological order as described above. The `TopologicalIterator` class will resemble the `DFSIterator` class, except that it will use a `currentInDegree` array as described above, and instead of pushing unvisited vertices on the stack, it will push only vertices whose in-degree is 0.

## D. Conclusion

#### Submission

Submit as `lab20`:

• The completed Graph.java

In addition, please fill out this self-reflection form before this lab is due, as a part of your homework assignment. Self-reflection forms are to be completed individually, not in partnership.

#### Reading

For next lab, please read DSA chapter 13.5 (in Graphs).