Class 13: IPC with pipes

Reading: To Appear.
Homework: Have a nice spring break!

Inter-process communication (IPC)

As we start to consider systems that consist of multiple interacting processes, the question of how processes communicate becomes extremely important. There are several mechanisms for Inter-Process Communication (IPC) - some of which we've already seen. All of the below are examples.

signals
regular files
FIFOs (named pipes)
pipes
sockets
shared memory
semaphores

Signals and files are two mechanisms we're already familiar with. Signals allow for asynchronous communication between processes, but of a very limited nature. A signal communicates that an event has occurred, but nothing else. There's no opportunity for sending additional data concerning the event.

Two processes can, of course, communicate by having one process write to a file while the other reads from the file. However, this is a bit difficult to pull off because the two processes must synchronize their reads and writes. If the reading process tries to read beyond the point of the writer's last write, it gets an end-of-file. So the two processes must ensure that the reader waits until the writer has written some data before it reads. In Class 12 we went to some lengths to combine signals and files to communicate something very simple: that the user had changed the character to be displayed.

Another drawback to using regular files to communicate is that it requires actually writing data to disk, and that is really slow compared to ... well, compared to almost anything else the computer does. Moreover, the file that you use is part of the file system, so it has a name (which must be agreed upon ahead of time somehow) and there is the danger that other processes might read from it, write to it, or even delete it. Finally, it's very difficult to have multiple instances of the system running simultaneously, because if they don't all use different filenames, they'll interfere with each other.

FIFOs (named pipes)

FIFOs (also called named pipes) are a mechanism for IPC that is similar to using regular files, except that

the kernel takes care of synchronizing reads and writes, and
data is never actually written to disk (instead it is stored in buffers in memory) so the overhead of disk I/O (which is huge!) is avoided.

A FIFO is part of the file system, i.e. it has a name and path just like a regular file. Programs can open it for reading and writing, just like a regular file. However, the name is simply a convenient reference for what is actually just a stream of bytes, with no persistent storage and no ability to move backwards or jump forward in the stream. Each byte written is read exactly once in a First-In-First-Out fashion: hence the name FIFO. If a read is performed on a FIFO and the writer has not yet written a byte, the kernel actually puts the reading process to sleep until data is available to read (just as it does when you read from the terminal). If the writer has written so much data that the FIFO's buffer is full, the writer is put to sleep until some reader process has read some bytes, thereby making room in the buffer.

FIFOs can be created from the command-line with the mkfifo utility. Here's a fun game to try to see FIFOs at work:

Open up two terminals (we'll refer to them as terminal 1 and terminal 2) with both in your home directory.
In terminal 1, give the command mkfifo foo to create a fifo foo in your home directory.
In terminal 1, give the command cat > foo
In terminal 2, give the command tr ' ' 'x' < foo
In terminal 1 type a line like: the rain in spain falls mainly on the plain, hit enter and see what happens in terminal 2.
Try a few more lines out, then give ctrl-d in terminal 1. See what happens?
If you do ls foo, you'll see that foo is still there. Go ahead and remove it with rm.

FIFOs can be used for IPC (as indeed one was in the above example), but they suffer from some of the same problems as IPC with regular files, even as they address others. We still need to agree on a name ahead of time and ensure that the communicating processes know what it is. And the FIFO is still a name in the file system, so we have the same problem that several processes using the same FIFO might interfere with one another, and that if someone removes the FIFO, it could be catastrophic for processes that depend on that FIFO for communication.
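The terminal experiment can also be sketched in C. What follows is only an illustration under some assumptions: the helper name fifo_roundtrip, the 0600 permission mode, and the caller-supplied path are all made up for this example. It creates a FIFO with the mkfifo library function, forks a child that writes a message into the FIFO, and reads the message back in the parent.

```c
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the notes): create a FIFO at `path`, have a
 * child write `msg` into it, and read the message back in the parent.
 * Returns the number of bytes read, or -1 on error. */
ssize_t fifo_roundtrip(const char *path, const char *msg, char *buf, size_t bufsize)
{
  if (bufsize == 0) return -1;
  if (mkfifo(path, 0600) == -1) { perror("mkfifo"); return -1; }

  pid_t pid = fork();
  if (pid == -1) { perror("fork"); unlink(path); return -1; }

  if (pid == 0) {
    /* Child: the writer. This open blocks until a reader opens the FIFO. */
    int wfd = open(path, O_WRONLY);
    if (wfd == -1) _exit(1);
    if (write(wfd, msg, strlen(msg)) == -1) _exit(1);
    close(wfd);
    _exit(0);
  }

  /* Parent: the reader. This open blocks until the writer opens the FIFO. */
  int rfd = open(path, O_RDONLY);
  if (rfd == -1) {
    perror("open");
    kill(pid, SIGKILL);        /* don't leave the child stuck in open */
    waitpid(pid, NULL, 0);
    unlink(path);
    return -1;
  }

  /* Read until the buffer is full or the writer closes its end (EOF). */
  ssize_t n;
  size_t total = 0;
  while (total < bufsize - 1 &&
         (n = read(rfd, buf + total, bufsize - 1 - total)) > 0)
    total += (size_t)n;
  buf[total] = '\0';

  close(rfd);
  waitpid(pid, NULL, 0);
  unlink(path);                /* same job as rm foo in the experiment */
  return (ssize_t)total;
}
```

Note that the name still has to be chosen and passed around by hand, which is exactly the drawback described above.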

Pipes

The oldest mechanism for IPC in Unix is pipes. We've initiated IPC-via-pipes from the command-line almost from day one using "|". Now we'll see what actually goes on when we do that. Pipes are like FIFOs without the name. They have the same first-in-first-out behavior, but have no connection whatsoever to the filesystem. They are created with the system call pipe, which provides the calling program with a file descriptor with which to write to the pipe, and a file descriptor with which to read from the pipe. It is very important to understand that a pipe is one way: it has a read end and a write end.

So how do we end up with two processes sharing a pipe if it has no name, if all we have to reference the pipe with is a pair of file descriptors? Well ... we have to call pipe and then fork! After all, a child process inherits copies of its parent's open file descriptors, so related processes (e.g. child-parent or sibling processes) can end up holding descriptors for the same pipe.

Pipe's syntax takes a bit of explanation. Here's the prototype:

     #include <unistd.h>

     int pipe(int fildes[2]);

Pipe takes an array of two ints (two file descriptors) and, if the kernel succeeds in creating the pipe, it puts the file descriptor for the read end of the pipe in the 0th entry, i.e. fildes[0], and it puts the file descriptor for the write end of the pipe in the 1st entry, i.e. fildes[1]. Pipe returns 0 if successful and -1 otherwise.
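To see the two array entries in action before any forking is involved, here is a minimal sketch that pushes a short message through a pipe inside a single process. The helper name pipe_echo is invented for this illustration, and the approach only works for messages small enough to fit in the pipe's buffer, since nobody is reading while the write happens.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from the notes): write `msg` into a freshly
 * created pipe and read it back into `buf` within one process.
 * Returns the number of bytes read, or -1 on error. */
ssize_t pipe_echo(const char *msg, char *buf, size_t bufsize)
{
  int fildes[2];
  if (bufsize == 0) return -1;
  if (pipe(fildes) == -1) { perror("pipe"); return -1; }

  /* fildes[1] is the write end ... */
  ssize_t written = write(fildes[1], msg, strlen(msg));
  close(fildes[1]);
  if (written == -1) { perror("write"); close(fildes[0]); return -1; }

  /* ... and fildes[0] is the read end. */
  ssize_t n = read(fildes[0], buf, bufsize - 1);
  close(fildes[0]);
  if (n == -1) { perror("read"); return -1; }

  buf[n] = '\0';
  return n;
}
```

Calling pipe_echo("hello", buf, sizeof buf) from main should leave "hello" in buf. A single process talking to itself is not very useful, of course, which is why pipe is normally followed by fork.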

Here's an extremely important point: a read from a pipe only gives end-of-file if all file descriptors for the write end of the pipe have been closed. Thus, after a fork, whichever process is intending to do the reading (and thus not the writing) had best close the write end of the pipe! Forewarned is forearmed!
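The end-of-file rule can be sketched as follows. This is an illustration only, and the helper name count_until_eof is an assumption made for the example. A child writes a message into the pipe several times and exits, while the parent closes its own copy of the write end and reads until read returns 0.

```c
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the notes): a child writes `msg` into a pipe
 * `times` times; the parent reads until end-of-file.
 * Returns the number of bytes the parent read, or -1 on error. */
long count_until_eof(const char *msg, int times)
{
  int pfd[2];
  if (pipe(pfd) == -1) { perror("pipe"); return -1; }

  pid_t pid = fork();
  if (pid == -1) { perror("fork"); close(pfd[0]); close(pfd[1]); return -1; }

  if (pid == 0) {
    /* Child: writer only, so close the read end. */
    close(pfd[0]);
    for (int i = 0; i < times; i++)
      if (write(pfd[1], msg, strlen(msg)) == -1) _exit(1);
    close(pfd[1]);
    _exit(0);
  }

  /* Parent: reader only. Without this close the loop below would never see
   * end-of-file, because the parent itself would still hold a descriptor
   * for the write end. */
  close(pfd[1]);

  char buf[256];
  long total = 0;
  ssize_t n;
  while ((n = read(pfd[0], buf, sizeof buf)) > 0)
    total += n;               /* read returns 0 once every writer has closed */

  close(pfd[0]);
  waitpid(pid, NULL, 0);
  return n == -1 ? -1 : total;
}
```

Try removing the close(pfd[1]) in the parent: the child still finishes, but the parent's read blocks forever waiting for data that can never arrive.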

Let's do the example we just did with FIFOs with pipes. Here it is:

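#include <unistd.h>

int main()
{
  int pfd[2], fv;
  pipe(pfd);   /* pfd[0] is the read end, pfd[1] is the write end */
  fv = fork(); /* both processes now hold both ends of the pipe */
  if (fv)
  {
    /* Parent: becomes cat, whose standard output feeds the pipe. */
    close(pfd[0]);                /* close the unneeded read end */
    dup2(pfd[1],STDOUT_FILENO);
    execlp("cat","cat",(char*)NULL);
  }
  else
  {
    /* Child: becomes tr, whose standard input drains the pipe. */
    close(pfd[1]);                /* close the unneeded write end */
    dup2(pfd[0],STDIN_FILENO);
    execlp("tr","tr"," ","x",(char*)NULL);
  }

  return 0; /* reached only if execlp fails */
}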

Important point: comment out the lines that close the unneeded ends of the pipe, recompile, and run. Looks like it runs just fine, right? Wrong! If you do a ps you'll see that the tr process is still running. Why? Because it's reading from a pipe whose write end is still open in some process ... namely the tr process itself. So close those unneeded pipe ends!
Second point: We really should've put in a check for failure in the calls to pipe and fork.
Third point: Writing to a pipe whose read end is closed (i.e. no process has an open descriptor to its read end) causes a SIGPIPE signal to be sent to the writing process. By default SIGPIPE terminates the process; if the signal is ignored or handled, the write call fails with -1 and sets errno to EPIPE.
Fourth point: Unlike signals, pipes provide synchronous I/O for IPC. Processes communicating via pipes must be running on the same host, i.e. processes on different computers cannot communicate via pipes. We'll later learn about sockets, which allow for pipe-like communication between processes on different hosts.
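The second and third points can be illustrated together. The sketch below checks the return value of pipe, closes the read end so that no reader remains, and then writes to the pipe with SIGPIPE ignored so that the process survives to inspect errno. The helper name is made up for the example, and ignoring the signal with signal(SIGPIPE, SIG_IGN) is just one way to handle it.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not from the notes): write one byte to a pipe whose
 * read end is already closed, with SIGPIPE ignored.
 * Returns the errno reported by write, 0 if the write unexpectedly
 * succeeds, or -1 if the setup fails. */
int errno_after_write_to_closed_pipe(void)
{
  int pfd[2];
  if (pipe(pfd) == -1) { perror("pipe"); return -1; }

  /* Ignore SIGPIPE; its default action would terminate this process. */
  void (*old)(int) = signal(SIGPIPE, SIG_IGN);
  if (old == SIG_ERR) {
    perror("signal");
    close(pfd[0]);
    close(pfd[1]);
    return -1;
  }

  close(pfd[0]);                      /* now no reader remains */

  int err = 0;
  if (write(pfd[1], "x", 1) == -1)
    err = errno;                      /* expected to be EPIPE */

  close(pfd[1]);
  signal(SIGPIPE, old);               /* restore the previous disposition */
  return err;
}
```

Without the call that ignores SIGPIPE, the write would kill the process before it had a chance to look at errno. That default is what quietly stops the earlier commands in a shell pipeline when a later one exits.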

Plotter revisited

Check out the new and improved plotter for some fun with pipes!

Using C I/O routines with pipes

The pipe system call returns two file descriptors, one for the read end and one for the write end of the pipe. With a file descriptor, we use the read and write system calls to actually "do I/O". If we want to use C's I/O routines, i.e. if we want fscanf and fprintf, we need to have a FILE* for the read end and for the write end. Recall that the fdopen library function does this for us: given an open file descriptor and a mode string such as "r" or "w", it returns a FILE* that wraps that descriptor.