Arrays and Strings in C

To start with, go back and remind yourself of the workings of printf and scanf from our previous lab.

malloc, calloc and free

Like our last C lab, we're going to learn how to do in C the things we can already do in C++, and like last week, you'll discover that the differences are largely superficial (this isn't true for C and C++ in general, just for the subset of C++ we're learning in this class). First we'll learn about arrays and memory allocation.

In C++, memory is allocated with the keyword new. In C, we instead call the library functions malloc or calloc. The two are almost, but not exactly, the same.

The prototype for malloc is void* malloc(size_t);, where size_t is an unsigned integer type. The argument is the number of bytes being requested. So, when requesting space for an array of 5 ints, for example, to arrive at the correct number of bytes to request, you need to know not only the 5, but also the number of bytes required by a single int. To do this, we use the sizeof operator. For example:

malloc( 5*sizeof(int) );

evaluates sizeof(int), which gives the number of bytes needed to store a single int. This is then multiplied by 5 (because you need room for 5 of them), and the result is the argument to malloc.

Because malloc just accepts a number, it has no knowledge of the type you're going to store within that space, so it doesn't know whether to return an int* or a char* or what. So, it just returns a void*, which C converts to any of those pointer types automatically on assignment (no cast is needed, unlike in C++). If the request can't be satisfied, malloc returns NULL. So, for our array of 5 ints, we finally end up at:

int* intArr = malloc( 5*sizeof(int) );

For 12 doubles, it would look like this:

double* doubles = malloc( 12*sizeof(double) );

Our other option for memory allocation is calloc, which makes this "I need to know two things, the size of the type and the number of elements I'm storing" thing explicit. With calloc, our "array of 5 ints" and "array of 12 doubles" look like this:

int* intArr = calloc( 5, sizeof(int) );
double* doubles = calloc( 12, sizeof(double) );

You'll notice that it takes two arguments, the number of elements and the size of each, while malloc requires you to multiply them together yourself. The other significant difference is that unlike new or malloc, calloc not only allocates the memory, it also zeroes it out. No random garbage in your arrays! But it can be somewhat slower, since the memory has to be zeroed.

The choice between malloc and calloc is largely a matter of personal preference, as long as you understand these differences.

When using malloc or calloc, be sure to #include <stdlib.h>, or you'll get a warning (or, with newer compilers, an error) about your allocation function not being declared.

The C equivalent of delete is the function free:

free(intArr);
free(doubles);

Strings

First, to use the string functions in C, make sure you #include <string.h>.

Strings in C are a much more basic thing - they are explicitly, and only, an array of chars ending with a null character (0 on your ASCII table, first in your heart). So, if you were to encode the word "hi" as a C string, you would need an array of size three:

char* hiStr = malloc( 3*sizeof(char) );
hiStr[0] = 'h';
hiStr[1] = 'i';
hiStr[2] = 0; //alternatively, the character '\0'

Alternatively, you could do this:

char* hiStr = "hi";

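This second case doesn't use the heap at all: the characters of "hi" are allocated statically, in a (typically read-only) part of memory that exists for the whole run of the program, and hiStr simply points at them. This means that this second example does not need to be freed (and must not be), while the first does. It also means you shouldn't try to change the characters through hiStr. In general, we're avoiding arrays that don't come from the heap in our class, but for C strings they're too common to step around. You'll also see: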

char aString[20];

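which builds an array of 20 chars in the stack. The name aString isn't itself a pointer, but in most expressions it is automatically converted to a char* pointing at the first element, so you can use it wherever a char* is expected. Again, you can technically build arrays in the stack like this for any type, but it's exceedingly common with strings, which tend to be short, so people put them in the stack and don't have to free them.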

Once you've allocated space for some chars, your pointer can be used with scanf to read in strings from the user:

char aString[20];
scanf("%19s",aString); //read at most 19 chars, leaving room for the null character

or

char* aString = malloc( 20*sizeof(char) );
scanf("%19s",aString); //read at most 19 chars, leaving room for the null character

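As with cin, the string will be read in until whitespace. Here, you have to be careful to have allocated enough space for the string you'll be reading in, including its null character: a plain %s will happily write past the end of your array if the user types too much. Putting a maximum width in the format, such as %19s for a 20-char array, prevents that.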

The string.h library provides a wide variety of useful string functions, such as strlen (length), strcpy (copy), strcat (concatenate), and strcmp (compare).

Reading from files

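Again, the differences are very superficial. First, you make a pointer to a FILE using fopen, which accepts two C strings as arguments: the first is a filename, and the second is a mode (for example, "r" to read or "w" to write). If the file can't be opened, fopen returns NULL. After doing this, you can use fscanf or fprintf just like you would scanf and printf, with the FILE pointer as an extra first argument. When you're done with the file, close it with fclose. So, to open a file and read an integer: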

FILE* fin = fopen("someFile.txt","r");
int theInt;
int* theIntPtr = &theInt;
fscanf(fin, "%d", theIntPtr);

The assignment

In this assignment, we'll be working with ISBNs (International Standard Book Numbers), special codes printed on books in most countries.

The ISBN-10 code (there's also an ISBN-13) consists of 9 digits followed by a tenth check symbol. The check symbol makes it possible to detect errors in the 9 digits. The algorithm relating the digits to the check symbol begins by summing the first digit plus 2 times the second digit plus 3 times the third, and so on up to 9 times the ninth digit. Then compute the remainder of this sum divided by 11. If this remainder is 10 then the check symbol is "X" (or "x"), otherwise the check symbol is the remainder.

The provided download contains two files, ISBNOnly.txt and BX-Books-Treated.csv. Look at BX-Books-Treated.csv first. This file contains the ISBN, title, authors, publishing year, and publishing company for 270,876 books, with the fields separated by semicolons. ISBNOnly.txt covers the same books, but contains only the ISBNs.

Some of these ISBNs are wrong. I'd like to know which ones.

For part 1 (85/100), write a program called part1.c which reads in from ISBNOnly.txt and outputs the incorrect ISBNs within the file. You may use the fact that we know there are 270,876 lines in the file. As part of this, you should create two functions with the following prototypes:

int* strToInt(char*);
int valid(char*);

strToInt takes in a 10-character string (an ISBN), and turns it into an array of integers. valid takes in a 10-character string (an ISBN) and returns 1 if it is a valid ISBN and 0 otherwise (you'll likely want to use strToInt within this function).

There are 7 incorrect ISBNs.

For part 2, do the same in a file called part2.c, but using BX-Books-Treated.csv. This should output the ISBN, the title, and the author of the books with incorrect ISBNs. For this, you'll likely want to check out the available string functions in string.h to do things like concatenation and checking equality (as + and == don't work for arrays).