SI335: Analysis & Correctness for Simple Recursive Algorithms

A recursive version of percolateMax

Here is a recursive variant of the percolate max algorithm from the last homework.

Algorithm: percolateMax(A,i,j)
Input : A,i,j - where A is an array of integers, i..j a range in A
Output: z - the maximum element amongst A[i..j]
Extra Condition: The elements of A[i..j] are reordered s.t. z = A[j]

if i < j
  percolateMax(A,i,j-1)
  if A[j] < A[j-1]
    swap(A[j-1],A[j])
return A[j]

Proof of correctness: To prove a recursive algorithm correct, we must (again) do an inductive proof. This can be subtle, because we have induct "on" something. In other words, there needs to be some non-negative integer quantity associated to the input that gets smaller with every recursive call, until we ultimately hit the base case. For percolateMax, we will induct on the number of elements in the range i..j. So, we will prove that the algorithm meets its specification for the base case (i.e. when there is only one element in i..j), and then we will prove that it meets its specification when there are $n$ elements in the range i..j, assuming (by induction) that it meets its specification whenever its called on smaller ranges.

Base case: $n = 1$. When i=j, there is only one element in the range. We return it as the result, which is correct since it is the max. Moreover, we satisfy the extra condition since A[j] is the max, and we didn't change anything in A[i..j].
Inductive case: $n > 1$. When there are $n > 1$ elements in i..j, we make the recursive call:
```
  percolateMax(A,i,j-1)
```
Since $n$, the number of elements in the range, is smaller in the recursive call, we can use induction to justify that the result returned meets percolateMax's specification, which means (by the Extra condition) that the elements of A[i..j-1] are rearranged so that the largest of those elements is A[j-1]. If that is larger than A[j] we swap, and otherwise we make no change. So by the time we are done, A[i..j] has been rearranged so that A[j] is the largest element in A[i..j]. Thus the extra condition is satisfied. Moreover, by returning A[j] we meet the algoirithm's specification.

Note: You might notice that the "Extra condition" I added to the algorithm's specification is essentially the loop invariant you came up with on the homework for the iterative version. This shouldn't be a surprise.

Analyzing Running Time: Suppose I want to analyze the worst-case running time of percolateMax in terms of $n$, the number of elements in i..j, call it $T_w(n)$. Because the algorithm is recursive, I can only define $T_w(n)$ in terms of itself. So I end up with something like \[ T_w(n) \leq T_w(n-1) + O(1) \] I.e. a recurrence relation. Thus, for all $n \gt 1$ there is a suitably large positive constant such that $T_n(1) \leq c$ and \[ T_w(n) \leq T_w(n-1) + c \] Expanding this out gives $T_w(n) \leq c n$, so $T_w(n) \in O(n)$

A recursive version of the uniqueDest algorithm

Recall the uniqueDest algorithm from a previous homework. Here is a recursive version of that algorithm.

Algorithm: uniqueDest(P,n,s)
Inputs: P,n,s --- an input instance of the Unique Destination problem
Output: TRUE/FALSE a solution to the Unique Destination problem

next = count = i = 0
while i < n do     ← this loop counts the number of children of s and sets next to the most recently seen child
  if P[i] == s
    count = count + 1
    next = i
  i = i + 1

if count == 1
   return uniqueDest(P,n,next)      
else
   return count == 0

Proof of correctness: As before, we prove this recursive algorithm correct using an inductive proof. But what to do induction on? We need to find something that gets smaller with each recursive call, but what? $P$ and $n$ stay the same with each recursive call, so not those. How about $s$? It doesn't necessarily get smaller. If I'm looking at $s = 7$ it just means I'm at vertex 7, and the index of the child node could be anything: it could be bigger, it could be smaller. So that doesn't work. So what can we to our induction on?

In fact, this is something I should know, because in a recursive algorithm I need to ensure that I'm "working towards the base case" with every recursive call. Once again, I must be getting close to the base case, so something must be getting smaller. What is it? One natural choice is: the length of the longest path starting from $s$. Let's call that $d_s$. Because we have a forest, if $c$ is a child of $s$, then $d_s \geq 1 + d_c$.

So we'll prove that uniqueDest meets its specification using induction on $d_s$.

Base case: If $d_s = 0$ we return true, which is correct because $s$ is a leaf, meaning that it is its own unique destination.
Inductive case: If $d_s \gt 0$, then the loop sets count to the number of children of $s$, and next to one of $s$'s children. Note that $s$ has children in this case, since $d_s \gt 0$. If count $\gt 1$, the algorithm returns false, which is correct since $s$ has multiple children and thus no unique destination. Otherwise, next is the unique child, and $s$ has a unique destination if and only if that unique child next has a unique destination. Since $d_{\text{next}} \lt d_s$, we can say by induction that the call to uniqueDest(P,n,next) meets the specification, and thus by returning that result we get the right answer.

Analyzing Running Time: Here we might mistakenly write down the recurrence relation \[ T_w(n) \leq T_w(n-1) + O(n) \] to express the worst case running time, but this is in fact wrong! Why? Because in the recursive calls $n$ doesn't get smaller. If we use $d$ to represent the length of the longest path from $s$, then the proper recurrence relation is \[ T_w(d) \leq T_w(d-1) + O(n) \] for any $d \gt 0$. Thus, for some suitably large constant $c$ we get \[ T_w(d) \leq T_w(d-1) + c n \] for any $d \gt 0$. Now if you expand this, we get \[ T_w(d) \leq c d n. \] When we see it written out this way, we realize that the worst-case running time is really a function of two parameters: $d$ and $n$. So we really should be writing: \[ T_w(n,d) \in O(d n) \] We could leave it at that, and we would have a correct answer. On the other hand, we often want to get our running time in terms solely of the size of the input ($n$ in this case) and not in terms of other properties. In this case, we not that $d \lt n$, so we get that $T_w(n)$, the worst-case running time in terms of $n$ is bounded from above by $c n^2$, and thus is $O(n^2)$.

Reading

Homework

A recursive version of percolateMax

A recursive version of the uniqueDest algorithm