Analyzing a simple iterative algorithm: findFirstMissing
with binarySearch
Dr. Roche, in the Unit 1 notes, kindly analyzes our three search
functions so painstakingly that he gives us exact expressions
for the worst-case running time in terms of number of statements
exectued. From those functions, we can deduce $O$, $\Theta$ and
$\Omega$ expressions. However, this is most
decidedly
not the way we normally do things! If we
know the exact expressions, there's not much reason to bother
with $O$, $\Theta$ and $\Omega$. The usual case is that it's
too difficult to get exact expressions, and we "settle" for
getting information about asymptotic growth rates. So let's try
analyzing findFirstMissing the usual way: we'll get what info we
can about the worst-case running time function's growth rate,
which will come in terms of $O$/$\Theta$/$\Omega$, without ever
trying to get an exact expression for the function.
In class we determined that the worst-case running time
function $T_w(n)$ is both $O(n \lg n)$ and $\Omega(n \lg n)$.
I.e., it is $\Theta(n \lg n)$.
What happens when we replace binarySearch with linearSearch?
Let's replace binarySearch with linearSearch in findFirstMissing.
The interesting thing here, is that we can't just say the call
to linearSearch takes time $\Theta(n)$. We would have to prove first
that it's actually possible in the context of the
findFirstMissing algorithm for every call to linearSearch to
be "worst case" for linearSearch, despite the fact that each
call is a different $B[i]$. In fact, if we don't
allow duplicate entries in array $B$, it is not
possible for more than two calls to linearSearch to take $n$ steps.
If you look at an intermediate iteration of findFirstMissing's
while loop (i.e. not the last), the time taken for
search(A,0,n-1,B[j]) is strictly greater than the time taken
search(A,0,n-1,B[t]) for any t < j. So we can't assume that
search takes $\Omega(n)$ for every iteration. So what do we do?
The first thing we did was to punt, and put the call to search
as taking $\Omega(1)$ time, which is correct but not very
precise. That gave us
$T_w(n) \in \Omega(n)$
as well as
$T_w(n) \in O(n^2)$.
The second thing we did was to analyze a very specific input
situation, get a lower bound for that case, and note that the
"worst case" had to be at least that bad, so the lower bound
was also a lower bound on the worst case. I suggested the
input situation in which the first half of B is identical to
the second half of A, and the index n/2 element of B is not in
A. In this case, we go through the while-loop $\Omega(n/2)$ times, and
the call to linear search takes $\Omega(n/2)$ time, which
gives us $\Omega(n^2/4) = \Omega(n^2)$ as a lower bound.
Since $T_n(n)$ is $O(n^2)$ and $\Omega(n^2)$, it is $\Theta(n^2)$.