ceil(sqrt(i))" is the
first integer greater than or equal to sqrt(i). So for
example, ceil(sqrt(7)) = 3, and
ceil(sqrt(16)) = 4. Your homework is to improve
on my algorithm, and make a graph showing the running times in
microseconds of my algorithm and your algorith for values of n
from 1 to 15. You must answer this question: did you "tune"
the algorithm, i.e. improve the running time but not change
the growth rate, or did you really "improve" the algorithm,
i.e. change the growth rate? Make sure your improved
algorithm still gets the right answers!
Turn In: A print out of your source code, a printout of your plot (use gnuplot, or Excel, or whatever you want), and the answer (justified) to "The question"!
Stop_Watch gives you is the time spend
executing your code, which means that your timing
results shouldn't depend (too much) on how many other people
are using the same computer as you or what they're doing.
This stop watch has a granularity to it as well,
which you need to be conscious of. Imagine a digital stop
watch with a minutes and seconds read-out. If you tried to
time something that took less than a second, it would read
that 0 time had passed. This doesn't mean that the event
literally took 0 time, but that the stop watch couldn't
measure events to an accuracy under 1 second. Such a stop
watch would have a granularity of 1 second. The Stop_Watch
class's granularity is machine dependant, but it's probably
not capable of measuring events under 1 millisecond
... probably closer to 10 milliseconds, in fact. In this
case, you have to run the code you want to time over and
over again - say 1000 times - and time the duration of the
1000-run trial. The time per run would be the total divided
by 1000.
x8 x8 x8 x8
f(x,k) = --- + --- + ... + ----- + ---
1! 2! (k-1)! k!
We're going to be interested in comparing them based on
speed - essentially asking which algorithm is fastest.
This sounds clear, but isn't really as clear as you think!
I have implemented two algorithms for this problem: Alg1 and Alg2. Experimentally I've determined that to compute f(3.1,8) takes 1.32 microseconds with Alg1 and 0.23 microseconds with Alg2. On the other hand, computing f(0.75,12) takes Alg1 1.81 microseconds and takes Alg2 8.02 microseconds. Which is better? Pretty hard to say, isn't it? When we talk about running time of algorithms, which input are we talking about? Different input takes different amounts of time!
Let's shift the focus and look at one specific algorithm
and try to run some experiments with it. The function
f_alg0 below
implements a pretty straightforward algorithm for
computing f:
double f_alg0(double x, int k)
{
double S = 0.0;
for(int n = 1; n <= k; n++)
{
// Compute x^8
double x8 = 1.0;
for(int i = 0; i < 8; i++)
x8 = x8*x;
// Compute n!
double f = 1.0;
for(int i = 1; i <= n; i++)
f = f*i;
// Add next term
S += x8/f;
}
return S;
}
If you run some experiments on this, you'll find that the
time required does not depend on x at all, only on
k. (this program
allows you to run some time trials). This tells us
important thing number 1: Any single arithmetic operation
on the computer takes the same amount of time regardless
of the specific values of the operands. In other
words, it takes just as long to compute 13.4*5.2 as
30003.8798798*0.00003782. What matters in this case is
k, because k affects how many times we go
through our main loop - the bigger k is, the longer
the computation takes. Thus, no one number gives us "the
running time" of f_alg0. Instead, "the running
time" is a function of k - let's call it
T0(k). Below you see a plot of this
function for values of k from 1 to 20.
| T0(k) in (microseconds) |
![]() |
| Alg1 We notice that x^8 is getting recomputed needlessly every time through the outer loop, so we compute once, before the loop. | Alg2 We notice that each time through the loop the new value of the variable f is the old value of f times n, so we simply remember the old value and reuse it rather than recompute it. | Alg3 We compute x^8 before the main loop, compute it with only three multiplications (rather than an 8-iteration) loop, and we factor the x^8 out of the sum. | Alg4 We use all the best features of the previous three!. |
double f_alg1(double x, int k)
{
// Compute x^8
double x8 = 1.0;
for(int i = 0; i < 8; i++)
x8 = x8*x;
double S = 0.0;
for(int n = 1; n <= k; n++)
{
// Compute n!
double f = 1.0;
for(int i = 1; i <= n; i++)
f = f*i;
// Add next term
S += x8/f;
}
return S;
}
|
double f_alg2(double x, int k)
{
double S = 0.0, f = 1.0;
for(int n = 1; n <= k; n++)
{
// Compute x^8
double x8 = 1.0;
for(int i = 0; i < 8; i++)
x8 = x8*x;
// Compute n!
f = f*n;
// Add next term
S += x8/f;
}
return S;
}
|
double f_alg3(double x, int k)
{
// Compute x^8
double x2 = x*x;
double x4 = x2*x2;
double x8 = x4*x4;
double S = 0.0, f = 1.0;
for(int n = 1; n <= k; n++)
{
// Compute n!
double f = 1.0;
for(int i = 1; i <= n; i++)
f = f*i;
// Add next term
S += 1/f;
}
return x8*S;
}
|
double f_alg4(double x, int k)
{
// Compute x^8
double x2 = x*x;
double x4 = x2*x2;
double x8 = x4*x4;
double S = 0.0, f = 1.0;
for(int n = 1; n <= k; n++)
{
// Compute n!
f = f*n;
// Add next term
S += 1/f;
}
return x8*S;
}
|
So, which algorithm is best? Hopefully you'd say Alg4. But how much better? How would you rank them? The key thing to notice, is that Alg0, Alg1 and Alg3 all curve up, whereas Alg2 and Alg4 are straight lines. To really see how important this distinction is, let's plot the computing times (in milliseconds now) over a much wider range of k values - from 100 up to 1000.

What we're really interested in when we compare algorithms is the shape of the running time function's curve. That tells us how fast the running time grows as the input gets bigger and bigger. The difference between Alg0 and Alg3 for example, is irrelevent when you compare them to Alg2. Alg0 and Alg3's running times differ by a constant multiple - roughly T0(k) = 3/2 T3(k) would be my guess from the earlier graph. That's nice, but they both look equally slow on the last graph when we compare them to Alg2. Similarly, Alg4's running time differs from Alg2 by a constant multiple - roughly T2(k) = 4 T4(k) would be my guess from the earlier graph. That's nice, but that difference is irrelevent when you compare those two to all the other algorithms. Constant multiple differences in running time don't change the shape of the running time function graphs, they don't change the growth rate of the running time functions. So, important thing number 3: what we really care about is the growth rate of an algorithm's run-time function. This is justified by the above discussion comparing several algorithms solving our f(x,k) problem. It's got other justifications as well. The actual time in seconds for an implementation of an algorithm depends on the skill of the programmer, depends on whether or not you're using compiler optimization, depends on the language used, depends on the architecture of the computer ... depends on all sorts of things. The growth rate is independent of these things. All these things change the running time function by a constant multiple, they don't change the growth rate, they don't change the fundamental shape of the curve.
(n)",
so we'd write that "T2(k) =
(n)" and
"T4(k) =
(n)".
On the other hand, the other algorithms are "
(n2)",
which is to say they grow like n2.
Clearly, n2 grows faster than
n, so we prefer the
(n).
algorithm. There's a whole hierarchy of growth rates,
including:
2^n <------ Grows fast! n^3 n^2 n log(n) n sqrt(n) log(n) <---- Grows slow!This link link demonstrates how each elment of the hierarchy might as well be the constant zero compared to everything above it - just like Alg2 and Alg4 took no time at all compared to the others when you started considering big enough values of k.
Ultimately what we'd like to do is look at an algorithm and place it on the growth rate hierarchy without even having to implement it! That way we can compare all the different algorithms we dream up for a problem, pick the best, and only bother to sit down and program the best
int find(vector<int> &A, int x)
{
int i = 0;
while(i < A.size() && A[i] != x)
i++;
return i;
}
We'd like the running time as a function of NA.
Well, now for a given input size NA
there's a whole range of possible running times from almost
no time at all (if x is sitting in the first cell
of A) to however long it takes to look through the
whole vecotor. Here's the plot of all the possible running
times as NA varies from 10 to 100 in
increments of 5, and over all possible x values
(remember we're assuming x is in the vector
somewhere).
(N_A),
and Tbest(N_A) =
(1),
i.e. it grows like the constant 1, which is to say it
doesn't grow at all, and
Tave(N_A) =
(N_A).
Typically, we'll talk about the best, worst and average case
running times of algorithms as functions of their input size.