SI335: Graph Algorithms (intro) & analysis revisited

Reading

Unit 5 Notes Sections 1.1,1.2.

Homework

Problem 1

Here's an adjacency list describing a graph. Draw it. Here's an adjacency matrix describing a graph. Draw it. Note: The two diagrams were given in class.

Problem 2

Analyse the worst-case running time (in terms of $n$ the number of vertices in $G$ and $k$ the length of $L$) for the following algorithm, once assuming the adjacency list representation, once assuming the adjacency matrix representation

checkTrail(G,n,L,k)
------------------
Input:  directed graph G on n vertices {0,1,...,n-1}, array L of k vertices
Output: true if L is a "trail", i.e. in going from each vertex to the
        next in L, we only follow valid edges.

check = true, i = 1, curr = L[0]
while i < k and check
  next = L[i]
  check = check if there's an edge from curr to next
  curr = next
  i = i + 1
return check

Problem 3

Suppose that we store the neighbors list in the adjacency list representation as a sorted array. What would your analysis look like for the previous algorithm in this variant of the adjacency list?

Problem 4

Modify checkTrail to make it "checkPath", where we also require that no vertex is visited more than once. Analyze the worst case of your modified algorithm (once again in both representations).

Problem 5

Let's describe the input size of G more precisely: it has n vertices and e edges. Do a worst-case analysis of checkPath (for both representations) using all three paramters (n,e,k) as the input "size".

Problem 6

A vertex is a "source" if there are no edges coming into it from other vertices. Design an algorithm that takes a graph (n vertices and e edges) and determines all sources in the graph. Do a worst-case analysis (in terms of $n$ and $e$) of your algorithm in both representations.

Problem 7

Consider the algorithm findCommon that takes a graph and an array L of k source vertices and determines whether there is a vertex that is a descendent of more than one of the given sources, returning such a vertex if one exists. Analyze the worst-case running time of this algorithm in terms of n (number of vertices) and e (number of edges) in both representations (i.e. adjacency list and matrix).

Problem 4 - A Solution

Here is a modification of checkTrail to give "checkPath", as asked for in Problem 4. Note that this version uses an adjacency list.

checkPath(G,n,L,k)
------------------
Input:  directed graph G on n vertices {0,1,...,n-1}, array L of k vertices
        NOTE: assume G is given as an adjacency list
        NOTE: assume k ≤ n, since we can't have a path otherwise
Output: true if L is a "trail", i.e. in going from each vertex to the
        next in L, we only follow valid edges.

A = an array of n bool's    ← to remember which vertices have been visited
initialize elt's of A to false
	
check = true, i = 1, curr = L[0], A[curr] = true
while i < k and check
  next = L[i]
  if (A[next] = true) return false else A[next] = true
  check = (linearSearch(G[curr],0,G[curr].length-1,next) ≠ NOT_FOUND)
  curr = next
  i = i + 1
return check

One analysis for worst-case time in terms of $n$ only goes like this: Creating and initializing the array $A$ takes $O(n)$ time. We loop $O(k)$ times, but since $k \leq n$, we can make that $O(n)$ times. The linear search is $O(n)$ since no vertex has more than $n$ neighbors. So I arrive at $T_w(n) \in O(n^2)$.

Problem 5 - A better analysis of the Problem 4 Solution

Let's instead do our analysis in terms of $n$ and $e$, where $e$ is the number of edges in the graph. Of course we could just use the previous bound (it can't be wrong!) and say $T_w(n,e) \in O(n^2)$. However, we could look at things a bit differently. If you think about it, we can never have the same value of curr twice, which means that we can never do worse than calling linear search exactly once for each vertex. This means the amount of work is the sum of the neighbors list length over all vertices. But this is exactly $e$, the number of edges! Thus, we get $O(n)$ for the intializing $A$ and for the worst-case number of times through the while loop, and we separately get $O(e)$ for the combined cost of all the linear searches. This gives us a bound of $O(n + e)$, which we can't simplify further, because $e$ could be greater or less than $n$. But, $O(n + e)$ is a much more precise bound, because when graphs have few edges (e.g. no node has more than four children) $e$ is much smaller than $n^2$. This is the kind of analysis we'll usually want with graph algorithms.

Note: $O(f(n) + g(n))$ is just a synonym for $O(\max(f(n),g(n)))$. So $O(n + e) = O(\max(n,e))$.