First of all, why would an extensible array be interesting? Here's a simple example: Write a program that reads in a list of numbers (terminated with a ";") from the user and prints them in sorted order. You'd like to store the numbers in an array to sort, but you won't know how big the array needs to be until the user stops entering numbers. But at that point it's too late, since all the numbers you wanted to store are gone. An extensible array would keep growing as we added more and more numbers.
Our extensible array will work like the STL vector. It supports two operations: element access using [], and push_back to add a new element at the end of the array. Like the STL vector, we'll provide a size() function to allow the user to determine how many elements are in the array.
```cpp
class eArray {
  int *A;   // the underlying storage
  int n;    // number of elements in use
public:
  eArray() { A = new int[1]; n = 0; }
  int& operator[](int k) { return A[k]; }
  int size() { return n; }
  void push_back(int x) {
    int *Ap = A;
    A = new int[n+1];              // allocate one more slot...
    for (int i = 0; i < n; i++)    // ...and copy every existing element
      A[i] = Ap[i];
    delete [] Ap;
    A[n++] = x;
  }
};
```

It's pretty clear that a single push_back could take Θ(n) time. How about a sequence of k operations on an eArray: what's the worst case they could take?
Well, the best case is when no operation is a push_back, and the worst case is when every operation is a push_back. Suppose we start with n=0 and perform k push_back's: we'll have to allocate and fill arrays of size 1, 2, ..., k. So we'll do
Θ(1) + Θ(2) + Θ(3) + ... + Θ(k) = Θ(k^2)

steps. We'd say then that in the worst case, the average cost per operation in a sequence of k operations (the amortized cost) is Θ(k). So, for example, consider the following stupid program, which constructs an eArray of int's from 0 up to n, and then computes their sum.
```cpp
#include <iostream>
using namespace std;

int main() {
  int n;
  cin >> n;
  eArray X;
  for (int i = 0; i <= n; i++)   // n+1 push_backs, each copying the whole array
    X.push_back(i);
  int sum = 0;
  for (int j = n; j >= 0; j--)
    sum += X[j];
  cout << sum << endl;
}
```

This algorithm takes Θ(n^2) time!
The fix is to grow the array geometrically: whenever we run out of room, double the capacity, so that most push_back's don't copy anything at all.

```cpp
class eArray {
  int n, c;   // n = number of elements, c = capacity of A
  int *A;
public:
  eArray() { n = 0; c = 1; A = new int[1]; }
  int& operator[](int i) { return A[i]; }
  void push_back(int x) {
    if (n == c) {                  // out of room: double the capacity
      c = 2*c;
      int *B = new int[c];
      for (int i = 0; i < n; ++i)  // copy the n existing elements
        B[i] = A[i];
      delete [] A;
      A = B;
    }
    A[n] = x;
    ++n;
  }
};
```

Now let's try an amortized analysis again. We'll determine the worst-case time requirement for a sequence of k operations, and then divide that by k to get the amortized cost per operation. Once again, suppose the array starts off empty. The worst case is still that all k operations are push_back's. The array will ultimately have to hold k elements, but its capacity won't exceed 2k. Let d be the ceiling of the log base 2 of k. The array starts at length 1 and gets resized to length 2, then 4, then 8, ..., up to 2^d. Handling an array of length r costs Θ(r), since we allocate it and copy elements into it. Thus, we end up doing Θ of
2^0 + 2^1 + 2^2 + 2^3 + ... + 2^d = 2^(d+1) - 1 = Θ(k)

work in copying. Of course we do Θ(k) work in assigning values (with or without copying), but the total work is still Θ(k). Thus, the amortized cost is Θ(1), just like assigning to an array without worrying about resizing!
Check out Notes[3] in SGI STL vector documentation and you should recognize that the STL vector is doing just what we're talking about!
We can get the same result with the accounting method: charge every operation a fixed amortized price in "computational units" (CU's), bank whatever isn't spent, and show the account never goes negative.

Let n be the number of elements in the extensible array, and let s be the capacity of the array. At all times, n ≤ s (of course), and our doubling scheme ensures that n ≥ s/2, i.e. that we're always using at least half the allocated memory.

    Operation   Actual Cost               Amortized Cost
    ----------------------------------------------------
    push_back   1 CU if n < s             3 CU's in either case
                n + 1 CU's if n = s

We want to prove that the "balance" B in our computational account is always greater than or equal to zero, since this means our amortized payments always suffice to cover our actual costs. We will do this by proving that B ≥ 2*n - s after every operation, where B is the balance, n is the number of elements, and s is the capacity of the allocated memory.

1) After the first operation we've paid 3 CU's and been charged 1 CU, so B = 2, n = 1, and s = 1. Thus, B ≥ 2*n - s.

2) Assume B ≥ 2*n - s and we perform an operation resulting in new balance B', new size n', and new capacity s'. We'll show that regardless of which case we're in, B' ≥ 2*n' - s', i.e. that we always have enough of a balance to cover our computational expenses.

CASE 1: n < s  (so n' = n + 1 and s' = s)
-----------------------------------------
B ≥ 2*n - s
B + 3 - 1 ≥ 2*n - s + 3 - 1
B' ≥ 2*n - s + 2
B' ≥ 2*(n + 1) - s
B' ≥ 2*n' - s'

CASE 2: n = s  (so n' = n + 1 and s' = 2*s)
-------------------------------------------
B ≥ 2*n - s
B + 3 - (n + 1) ≥ 2*n - s + 3 - (n + 1)
B' ≥ 2*n - s + 2 - n
B' ≥ 2*(n + 1) - s - n
B' ≥ 2*(n + 1) - s - s          (since n = s)
B' ≥ 2*(n + 1) - 2*s
B' ≥ 2*n' - s'

This shows that the amortized cost of a push_back in our extensible array is constant.
Now suppose we also want a remove_back operation that deletes the last element, and we'd like to shrink the array so we don't waste memory. You might be tempted to say that whenever the array becomes less than half full, we should shrink the capacity by half. It isn't too hard to see that this could lead to every other operation being an expensive resizing operation, so this approach doesn't work too well.
The approach we'll take is to ensure that the capacity c and n, the number of elements in the array, always satisfy c/4 < n ≤ c. That way, we'll never allocate more than 4 times the memory we actually need.
We'll raise the amortized price slightly: both push_back and remove_back are now charged 4 CU's per operation. If n is the number of elements in the array, c is the capacity of the array, and B is our "balance" (the number of CU's to our credit), we'll show that B ≥ n + 2|n - c/2|. Since the right-hand side is never negative, proving this always holds means we never have a negative bank balance, i.e. we can always pay for our computation!
An argument just like the one above (with the charge raised to 4 CU's) covers the push_back operation, so it only remains to prove this for the remove_back operation.
B  = balance before the operation,  B' = balance after
n  = # of elements before,          n' = # of elements after
c  = capacity before,               c' = capacity after

CASE 1: No resize, c' = c, n' = n - 1  (actual cost is 1)
---------------------------------------------------------
B' = B + 4 - 1
   ≥ n + 2|n - c/2| + 3
   ≥ (n - 1) + 2|(n - 1) - c/2|     (since |(n-1) - c/2| ≤ |n - c/2| + 1)
   = n' + 2|n' - c'/2|

CASE 2: Resize required, i.e. n = c/4 + 1, c' = c/2, and n' = n - 1 = c'/2
(actual cost is n - 1, the number of elements copied)
--------------------------------------------------------------------------
B' = B + 4 - (n - 1)
   ≥ n + 2|n - c/2| + 5 - n
   = 2|n - 2(n - 1)| + 5            (since c/2 = 2(n - 1))
   = 2(n - 2) + 5                   (n ≥ 2 whenever a shrink occurs)
   = 2n + 1
   ≥ n - 1
   = (n - 1) + 2|0|
   = n' + 2|n' - c'/2|