Class 7: Birth & death of a C program; exit

Reading: Sections 7.1-7.3 of APUE.
Homework: Print out the homework and answer the questions on that paper.

C I/O

Displaying data on a monitor, reading data from a keyboard, storing or retrieving data on a disk or over a network connection: all of these ultimately involve system calls, because they manipulate system resources and, as we know, only the kernel gets to do that. The basic I/O functions we've used so far are fprintf and fscanf. These are not system calls! They are C standard library calls. Clearly, the implementations of these functions must make system calls to get the I/O done, but the functions themselves are regular ol' C library functions. We need to understand a bit about how C library functions are implemented, and in particular we need to understand I/O buffering.

C I/O Buffering

Storing and retrieving data on a hard drive is a lot slower than storing and retrieving data in memory. I mean a lot. Moreover, writing a single byte takes about the same amount of time as writing a big chunk of bytes called a block, and every system call carries some overhead of its own. The C standard library I/O routines use a technique called buffering to deal with this. The idea is simple:

Output buffering: Every time your program writes some characters, instead of writing them to the hard drive, the library adds them to a buffer, i.e. an array of characters. Only when the buffer is full does it actually write anything to the disk, and then it writes the whole buffer at once.
Input buffering: When a program tries to read a character from a file, instead of fetching just the next character, the library reads in the next block of characters from the file and places them in a buffer. A normal read operation just removes characters from the buffer, and only if and when the buffer is emptied does the C I/O library have to go fetch more data from the disk.

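Think of it like grocery shopping: one trip to the store for a full cart is far faster than one trip per item, even though each trip takes about as long either way. Figure 3.5 in the book shows how dramatic an effect buffering can have. Remember, this buffering is done by the C standard library routines, not by the kernel!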

I/O associated with a given file stream (FILE*) may be unbuffered, line buffered, or fully buffered. The kind of buffering you get depends on what the stream is connected to, which in turn depends on how the program is actually called. For example, if foo is called like this

./foo < temp

its input will be fully buffered (it comes from the file temp), but its output will be line buffered (it still goes to the terminal). Within a program, you can change the buffering mode with setbuf and setvbuf.

     void setbuf(FILE *stream, char *buf);

     int setvbuf(FILE *stream, char *buf, int type, size_t size);

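By default, standard input and standard output are line buffered when they refer to a terminal and fully buffered otherwise (a file or a pipe, for instance); standard error is normally unbuffered. If you want to "flush" an output buffer, i.e. have everything stored in it actually sent to its destination, you can call fflush (try man fflush). In a line-buffered context, of course, you can just write a newline!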

Why exit handlers?

As you start to write programs that actually acquire resources from the operating system, it becomes important to free those resources before the program terminates. For stdio objects this happens automatically, but for other objects it's important that we get a chance to clean up before the process terminates. What kind of "clean up" might the stdio library have to do? Well, it needs to "flush buffers", i.e. write out everything stored in the output buffers. Otherwise, output sitting in C standard I/O buffers would never make its way to its destination.

The Life of a C Program

Figure 7.2 on p.183 of APUE gives a nice diagram depicting what happens when a C program starts and when it ends. We went over it in class. Note: this diagram is a nice example of what's called a finite state machine.

The birth of a C program's process

A compiled C program is a file; this we know. There is a system call, exec, about which we will have much to say in the coming weeks, whose job is to load a program file into a process and start it running. So a C program's process (in fact, any process running a program) gets its start when some process calls exec, typically a child the shell has just created to run your command. When the program starts, a C start-up routine (which you as a regular ol' programmer do not write) is executed. It is responsible for some initialization and then for calling main, which of course is where you as a programmer start.

The death of a C program's process

A process can terminate voluntarily (commit suicide?) or it can be killed by some sort of signal (as happens when you use ctrl-c on the command line). This discussion is just about what happens when a process terminates voluntarily. Within a C program, you can terminate the program via

a return from main,
the C standard library function exit (try man -s3C exit for info), or
the functions _exit (the POSIX system call) and _Exit (its ISO C counterpart).

The different ways to terminate a process have different implications for exit handlers, the facility that allows for automatic "cleanup" when processes terminate.

Exit handlers

Because a C program may exit in several ways, and in particular because it may exit without a "return" from main(), the C language provides a mechanism by which we can be sure that some actions are taken before the program exits, regardless of whether it exits by returning from main or by calling exit.

What you do is provide a function that C's runtime system will call before actually exiting from exit. (No, that's not a typo.) The lingo is that the function is an "exit handler", and you need to "register" the exit handler to tell C's runtime system to call it before exit exits. You register a function as an exit handler using the atexit function from stdlib.h. This is a funny function, in that its argument is a function pointer. In fact, when you define a function foo, the name "foo" used as a value is a pointer to that function. It has a type, but a type that looks very different from most, because it must specify the return type as well as the number and types of the function's arguments. Let's look at atexit's prototype:

int atexit(void (*func)(void));

The return type is int (zero on success, nonzero if the handler couldn't be registered), but what does the parameter "void (*func)(void)" tell us? It says that func is a pointer to a function that takes no arguments and returns nothing. So, here's a simple function that fits the bill:

void foo(void) { fprintf(stderr, "I'm dying ...\n"); }

This function takes no arguments and returns nothing, so you can pass foo as an argument to atexit. For example:

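  #include <stdlib.h>
  #include <stdio.h>

  /* exit handler: takes no arguments, returns nothing */
  void foo(void) { fprintf(stderr, "I'm dying ...\n"); }

  int main(void)
  {
    atexit(foo);   /* register foo; it runs when main returns */
    return 0;
  }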

If I run this program, I get "I'm dying ..." as output. Since the function exit calls the exit handlers, can you see why it'd be bad to call exit from within an exit handler like foo?

One question that arises is this: if I register several handlers, in what order do they get called when the program terminates? They are guaranteed to be called in the reverse of the order in which they were registered.

p2.c

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>   /* for sleep */

  void foo(void) { fprintf(stderr, "foo says bye.\n"); }
  void bar(void) { fprintf(stderr, "bar says bye.\n"); }

  int main(void)
  {
    atexit(foo);   /* registered first, so called last */
    atexit(bar);   /* registered last, so called first */
    sleep(2);
    return 0;
  }

Compiling & running

  bash$ gcc -o p2 p2.c
  bash$ ./p2
  bar says bye.
  foo says bye.

_exit (and _Exit)

The funny thing about exit is that it is listed both as a system call and as a C standard library routine. However, it is truly a C standard library routine, since it deals with C stdio buffers. In fact, exit makes a system call to _exit to finally terminate a process (_Exit is functionally equivalent). _exit also does cleanup, but what it cleans up is the kernel's bookkeeping for the process, not the C library's bookkeeping. We haven't learned much about the kernel yet, so what _exit does do is something we'll talk about later. Instead, we focus on what it doesn't do: it doesn't call exit handlers, and it doesn't flush stdio buffers!

p3.c

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>   /* for strcmp */
  #include <unistd.h>   /* for sleep and _exit */

  void foo(void) { fprintf(stderr, "foo says bye.\n"); }
  void bar(void) { fprintf(stderr, "bar says bye.\n"); }

  int main(int argc, char **argv)
  {
    atexit(foo);
    atexit(bar);
    sleep(2);
    fprintf(stdout, "Oops ... forgot a newline!");   /* sits in stdout's buffer */
    if (argc > 1 && strcmp(argv[1], "exit") == 0)  exit(0);
    if (argc > 1 && strcmp(argv[1], "_exit") == 0) _exit(0);
    if (argc > 1 && strcmp(argv[1], "_Exit") == 0) _Exit(0);
    return 0;
  }

Compiling & running

  bash$ gcc -o p3 p3.c
  bash$ ./p3
  bar says bye.
  foo says bye.
  Oops ... forgot a newline!bash$
  bash$ ./p3 exit
  bar says bye.
  foo says bye.
  Oops ... forgot a newline!bash$
  bash$ ./p3 _exit
  bash$
  bash$ ./p3 _Exit
  bash$