These notes closely review Unit 1, sections 10 through 13.

## Using analysis to improve algorithm uniqueDest

Consider our analysis of the algorithm uniqueDest. If we want to improve its worst-case running time, what needs to be done better? Hopefully you see that the only way to make it faster is to replace the inner loop with something that provides the same information (i.e. whether "curr" has 0, 1, or multiple children and, if only one, that child's index) but does it faster. There's really no other option. In class we saw how we could do that with a bit of "preprocessing", but at the cost of using extra memory. This is a common idea: we can reduce the time required at the cost of increasing the memory required. In this case, it's a good trade!
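Here's one way that preprocessing could look. This is only a sketch, since it depends on how the input is represented; I'm assuming the input is an array `dest` where `dest[i]` is the node that `i` points to (so `i` is a "child" of `dest[i]`), and the function name is my own:

```python
def preprocess_children(dest):
    """One-pass preprocessing for uniqueDest (a sketch; the course's actual
    representation may differ). Assumes dest[i] is the node that i points to,
    making i a "child" of dest[i]. Returns, for every node v, how many
    children v has and, if it has exactly one, that child's index."""
    n = len(dest)
    child_count = [0] * n
    only_child = [None] * n    # meaningful only when child_count[v] == 1
    for i, v in enumerate(dest):
        child_count[v] += 1
        only_child[v] = i      # overwritten if v turns out to have several children
    return child_count, only_child
```

After this $\Theta(n)$ pass (and $\Theta(n)$ extra memory), answering "does curr have 0, 1, or multiple children, and if one, which?" takes constant time per query, which is exactly the time/memory trade described above.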

Question: How does the best-case time for this "improved" algorithm compare to what we had for quiz problem 4?

## Revisiting algorithm findFirstMissing

Let's revisit the "findFirstMissing" algorithm from quiz problem Q2, which we discussed in detail in Class 4.
Important note: for this discussion we will assume that duplicates are not allowed in B!
Here's an observation that will allow us to make the algorithm faster. If we're at the while-loop test condition and k ≠ NOTFOUND, then we know we found B[j] at position k in A. Therefore, there's no reason to look for B[j+1], which is bigger than B[j], in A[0], A[1], ..., A[k]. This allows us to do our next search over indices $k+1$ to $n-1$, rather than $0$ to $n-1$. That's got to be an improvement, right?

So, first let's assume the improved findFirstMissing uses binary search. Is the algorithm's worst-case running time better? This is an interesting question. First of all, it will undoubtedly run faster in the worst case (indeed, in almost all cases!) than the original algorithm. On the other hand, its worst-case running time is still $\Theta(n \lg n)$, so the asymptotic growth rate is the same. What does this mean?! Remember, the actual worst-case running time functions both look like a constant times $n \lg n$ plus some slower-growing terms. It's just that with our asymptotic analysis, we don't know what those constants are. This "better" version of findFirstMissing with binary search is better in the sense that the mysterious constant (which we don't know) is undoubtedly smaller than it was before. However, it's not better in terms of asymptotic growth rate. Generally, when we reduce the hidden constant, we say we've "tweaked" the algorithm; we only say we've "improved" it when the growth rate is smaller.
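As a concrete sketch of this binary-search variant (using Python's `bisect` for the search; the function and variable names here are my own, not the quiz's):

```python
from bisect import bisect_left

def find_first_missing_bs(A, B):
    """Return the first element of B not present in A, or None if all are
    present. A sketch of the improved algorithm: assumes A and B are sorted
    ascending and B has no duplicates. Each search is restricted to the
    suffix A[lo:], where lo is just past the previous hit."""
    lo = 0
    for b in B:
        k = bisect_left(A, b, lo)    # binary search within A[lo:]
        if k == len(A) or A[k] != b:
            return b                 # b is missing from A
        lo = k + 1                   # never re-examine A[0..k]
    return None
```

Each search still costs $\Theta(\lg n)$ in the worst case, which is why the worst-case bound stays $\Theta(n \lg n)$ even though the searched ranges shrink.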

So now let's see whether the new findFirstMissing with linear search is an improvement. When we do the analysis we get ... a gap between our upper and lower bounds on the worst-case running time. So we don't really know what's going on. It might be a real "improvement", or it might simply be "tweaking" and our analysis is just not good enough to know which is the case!
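Here's the same sketch with linear search in place of binary search (again, the names are hypothetical, and I'm assuming sorted inputs with no duplicates in B, as above):

```python
NOTFOUND = -1

def linear_search(A, x, lo):
    """Scan A[lo:] for x; since A is sorted, stop at the first element > x."""
    for k in range(lo, len(A)):
        if A[k] == x:
            return k
        if A[k] > x:
            break
    return NOTFOUND

def find_first_missing_ls(A, B):
    """Improved findFirstMissing with linear search (a sketch).
    Assumes A and B are sorted ascending and B has no duplicates."""
    lo = 0
    for b in B:
        k = linear_search(A, b, lo)
        if k == NOTFOUND:
            return b
        lo = k + 1    # resume just past the previous hit
    return None
```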

## Other measures of difficulty

In analyzing linear search, we took the number of elements in the range we're searching, i.e. $n$, to be the measure of the problem's "size" or "difficulty". It's fair to consider other measures of size/difficulty as well, and to see whether analysis with respect to them is more enlightening. Let's use $m$ to denote the number of elements from the start of the range to the first occurrence of something greater than or equal to $x$ (or to the end of the range). Measured this way, the worst- and best-case running times for linearSearch are both $\Theta(m)$.
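To make the new measure concrete, here is a linearSearch sketch instrumented to report $m$ (the names and the instrumentation are mine; I'm assuming the search stops at the first element $\ge x$, as it can in a sorted range):

```python
NOTFOUND = -1

def linear_search_m(A, x, lo=0):
    """Search A[lo:] for x, stopping at the first element >= x.
    Returns (index of x or NOTFOUND, m = number of elements examined)."""
    m = 0
    for k in range(lo, len(A)):
        m += 1
        if A[k] == x:
            return k, m
        if A[k] > x:
            return NOTFOUND, m
    return NOTFOUND, m
```

Measured against $m$ rather than $n$, every run examines exactly $m$ elements, so the best and worst cases coincide at $\Theta(m)$.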

With this new view of linearSearch's running time, let's revisit our analysis of the worst-case running time for the new findFirstMissing. Instead of accounting for the while loop by multiplying the number of iterations by an upper (or lower) bound on the cost of executing the loop body, let's "unroll" the loop and sum up the costs of each and every iteration.

The key observation is that the cost of the linearSearch in the $t$th iteration through the while-loop is $\Theta(m)$, where $m$ is as defined above. This means there is some constant $c$ such that $cm$ is an upper bound on the cost of that search. Let's use $m_t$ to denote the $m$-value for the $t$th iteration. Then an upper bound on the cost of all the loop iterations (let's say there are $r$ of them) is $c m_1 + c m_2 + \cdots + c m_r = c(m_1 + m_2 + \cdots + m_r)$. But since we start each search right after where the previous search ended, the sum $m_1 + m_2 + \cdots + m_r$ is at most $n$, the size of the array $A$. So $cn$ is an upper bound on the cost of all the loop iterations together, which means the algorithm is $O(n)$. Since we already knew the worst case was $\Omega(n)$, we see that the worst-case running time is $\Theta(n)$.

This is pretty cool: the improved findFirstMissing is better with linearSearch than with binarySearch, even though binary search is better than linearSearch! That's why we do analysis: to discover such things!
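We can actually watch this bound hold by instrumenting the loop to count every array position any of the searches examines (a sketch with hypothetical names, under the same sorted/no-duplicates assumptions as before):

```python
def total_examined(A, B):
    """Run the improved findFirstMissing with linear search and return
    the total number of positions of A examined across all searches.
    Assumes A and B are sorted ascending and B has no duplicates."""
    examined = 0
    lo = 0
    for b in B:
        hit = None
        for k in range(lo, len(A)):
            examined += 1
            if A[k] == b:
                hit = k
                break
            if A[k] > b:
                break              # b cannot appear later in sorted A
        if hit is None:
            return examined        # b is the first missing element
        lo = hit + 1
    return examined
```

Because each search resumes just past the previous hit, the searched ranges are disjoint, so the grand total stays at most $n$ no matter how many iterations there are.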