Reading

Sections 9.1-9.2 of Problem Solving with C++.

Pointers & Arrays

Last lesson we worked a little bit with accessing characters within an object of type string by index. This gave us some practice with indexed collections of objects, but the type string, remember, is not a basic built-in type in C++. It is a type that is defined in the string library, as we will eventually learn to define new types in our programs. So, to learn about arrays in C++, we will have to return to the basic built-in types and understand the concepts that are "beneath the hood" in the string type.

All the basic types in C++ are fixed size objects - a char is 8 bits, an int is 32 bits, etc. Arrays don't fit that mold, at least not on the surface. With arrays, the amount of storage we'll need depends on the number of objects we want to store in the array - as I said, a different model. So in C++, you get arrays of 10 ints, for example, by asking for 10 consecutive int objects in memory and having a variable that points to the first of these. From this pointer variable you access a particular one of the 10 ints by index.

A variable is declared as a pointer to a particular type by putting a * in front of the variable name. For example, to declare a variable p that is a pointer to objects of type int, you'd write:

int *p;
Now, you'll want to think of int* as being a new type, which is really what it is, but the declaration syntax is a little weird. For example,
int* p, q;
declares p to be of type int* and q to be of type int. So the int applies to all the variables, but the * part only applies to the next variable. If you wanted both p and q to be "pointers to ints", you'd have to write
int *p, *q;
which is unfortunate, but which is syntax C++ inherited from C. In any event, having declared p as a pointer to objects of type int, we have to actually allocate some ints in memory for it to point to! Just declaring p leaves it uninitialized - i.e. pointing to nothing, or what's worse, pointing to some random area of memory. You allocate memory with the new operator, like this:
new type[number]
This expression allocates number consecutive objects of type type in memory, and returns a pointer to the first object. Thus, to allocate 10 ints in memory and set p to point to the first of them, we'd write:
p = new int[10];
At this point, we can access any of these int objects by the indices 0,1,2,3,4,5,6,7,8,9 just as we did for the characters within a string. To get the last int in the array, we write p[9], and since p[9] is an int, we can do anything with it we can do with a normal int. To make this clear, the abstract concept of an array is simply a collection of n objects, all of the same type, that can be accessed by index. In C++, pointer variables and the new operator are what we use for arrays - the array is really a contiguous block of objects in memory, and the pointer variable allows us to access the objects in this block by index. The following picture depicts what's going on when we create an array of 4 ints and assign a value to one of the ints in the array.

DECLARE VARIABLES
ALLOCATE SPACE
ASSIGN A VALUE

Note: the pointer is a variable with name and scope like we're used to. In the above example, the pointer p is a local variable of main, and is part of that function call record on the stack. However, the array that pointer points to is in a different area of the program's memory called "the heap". The array itself is not part of any function call record on the stack. That's what this picture is showing you.

A Simple Problem

Let's look at a simple problem that we can solve using arrys: Write a program that reads in a number n from the user, and then reads in n doubles from the user, and simply prints them out in reverse order. We already saw from our excercises with accessing characters within a string that indexed collections of objects would allow us to print objects in reverse order.
int main()
{
  // Get number of doubles from user
  int n;
  cout << "How many doubles? ";
  cin >> n;

  // Construct array
  double *A;
  A = new double[n];

  // Read numbers
  cout << "Enter " << n << " doubles: ";
  for(int i = 0; i < n; i++)
    cin >> A[i];

  // Write in reverse order
  cout << "In reverse your numbers are: ";
  for(int j = n - 1; j >= 0; j--)
    cout << A[j] << " ";
  cout << endl;

  return 0;
}
Even an example as simple as this deserves a fair bit of commentary the first time around. In class we'll go over it in painful detail, including pictures.

The variable is just a pointer!

Remember, the variable that you're manipulating by name in your program is just a pointer, not the actual collection of object. If you keep this in mind, some things that might otherwise trip you up make perfect sense. For example, what happens when I assign one pointer to anther? Well, they simply point to the same thing! A picture should make this clear.


What's hopefully clear here is that by creating a new pointer q and setting q = p, we did not create a new array populated with the same integer values. We just created a new pointer and set it to point to the same old array.

Objects allocated with new have no scope and live until you explicitly kill them

One of the funky things about objects allocated with new is that they have no scope. So if, for example, you wrote a function that called new to allocate some space, the objects in memory created by new would not be destroyed when the function returned - even though the pointer we had goes out of scope. Take this example:

Before the function call
 
During the function call
 
After the function call

You can see that after the function call, our array is still there - although it's utterly useless, since we no longer have a pointer with which to access the array. Instead of being destroyed automatically by scoping rules, objects allocated with new either stick around until the end of the program, or are explicitly destroyed with the delete operator. For example:

char *p;
p = new char[30]; // creates 30 consecutive char's in memory
delete [] p; // destroys the objects created by new
We'll worry more about delete later.

A note on types

We now have a new type: "pointer to X", where X is a type. So, for example, with the declaration
bool *B = new bool[10];
the type of B is bool*, and the type of B[0] is bool. If we have the declaration
string *A = new string[20];
the tyope of A is string*, and the type of A[0] is string.

It's also important to note that, as we saw with strings, a subscripted array expression like A[5] is an lvalue, i.e. it is allowed on the left-hand side of an assignment. This is tremendously important. This means that A[5] is not a copy of the index-5 element of the array, it is the actual index-5 element. You can modify it. So,

A[5] = "hello"
is perfectly valid. If C has type int*, C[i]++ is perfectly valid — it means increment the index-i element of the array C. You might worry about how an expression like C[i]++ should be evaluated. If you look at the operator precedence table (under resources) you'll see that [.] has very high precedence, so C[i]++ evaluates as (C[i])++.

Problems

  1. Write a program to compute dot-products of vectors. If v = [a1,a2,...,am] and w = [b1,b2,...,bm] are two vectors of dimension m, the dot product of v and w is a1*b1 + a2*b2 + ... + am*bm. Your program will get a dimension m from the user, read in two vectors of length m, and print out their dot product. Here's a sample run:
    Enter dimension: 4
    Enter two vectors: 
    [3,-1,0,2]
    [5,0,9,-7]
    Dot product = 1
    Try it for yourself and then take a look at my solution.

  2. Write a program that reads a list of banned words from a file, stores them in an array, and then simply reads words from the user and returns "banned" or "not banned" until the word "end" is encountered. The file starts with a number, which is the number of banned words, and then the words themselves are listed. The file banned.txt is a good example. Here's my solution.