Class 14: Concurrency I

Reading: APUE Sections 11.1-11.4, 11.6 pp 368-378.
Homework: Printout the Homework and answer the questions on that paper.

Processes and threads

We've looked at systems that consist of more than one process cooperating to provide some service: the X's printed to the screen example, the plotter program etc. These are called concurrent programs, because the different processes run, conceptually if not in actuality, at the same time. Dividing an application into distinct communicating processes is nice. Each process is completely independent in the following sense: their memory spaces are completely separate. The stack, the heap and static areas of memory for each process are not simply separate, but actually protected from one another. Process A cannot touch Process B's memory. The OS and hardware enforce this. That's why Google Chrome assigns each tab its own process. One tab crashing won't affect the other tabs.

The strength of the multiprocess architecture is also sometimes a weakness. Sharing data amongst processes is a bit of a pain, because you can't simply share variables. There is another mechanism for producing concurrent programs: threads. A family of threads is like a set of processes that all share the same memory. Each thread has its own stack, but even those addresses are accessible to other threads in the same process ... though it'd be inviting disaster to try. But threads share the same heap, initialized and uninitialized global, and text segments. We'll look a bit more closely at multi-threaded (as opposed to multi-process) programs and learn some things not just about threads, but about concurrent programming in general.

pthreads

The implementation of threads we'll be using is called pthreads - for "posix threads". You'll need to include pthread.h for most everything we do with threads. Any function whose prototype looks like void* foo(void *p); can be a thread; meaning that if you call the function the right way, it'll go off and do its computing while, at the same time, the calling code continues doing its thing. Calling foo "the right way" means calling it via the function pthread_create. Each thread has its own thread id, which is a concept distinct from that of a process id, since a process may actually contain many threads. The thread id type is pthread_t.

int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void  *(*start_routine)(void*), void * arg);
                               ^                       ^                     ^                           ^      
                               |                       |                     |                           | 
                               |                       |                     |                           | 
                  thread id written here   we'll always be using NULL   thread function    arguments for thread function

Notice that the only argument for the thread function is a void*, that means you need to put any information it needs in an array or struct and give a pointer to that object.

As our first example, lets do something that was a bit of a pain with multiple processes: We have a program that prints X's forever, and whenever the user enters a different letter, prints that forever instead.

#include <unistd.h>
#include <stdio.h>
#include <pthread.h>

/* Global variable's are shared amongst threads! */
char c = 'X';

/* The thread we create will execute this function. */
void* listenerThread(void * p)
{ 
  while(scanf(" %c",&c) == 1 && c != 'q')
    ;
  exit(0); /* calling exit in a thread exits the whole process! */
}

int main()
{
  /* Create the listener thread */
  pthread_t lt;
  pthread_create(&lt,NULL,listenerThread,NULL);

  /* Write character c forever! */
  while(1)
  {
    printf("      %c",c);
    fflush(stdout);
    usleep(250000);
  }
  return 0;
}

This program has two threads: one for printing the character over and over, and one waiting for scanf to give new character values. It's worth noting that a call to exit (or _exit) in any thread terminates the whole process. There is a function ptheread_exit to exit from just one thread, rather than the whole process. This version is a little lame, because the user needs to hit return, and the character he enters gets echoed to the screen with the key press. Check out this version, which actually changes the terminal settings to fix those problems. Notice how with threads, communication is easy because both threads can access the global variable c.

A non-trivial threads application ... and a pitfall

One of the most common uses of threads is to improve responsiveness in user interfaces. For instance, if you ask for something to be saved to a file, you shouldn't need to wait for the whole thing to be written before you go back and, for instance, click on the "Print" button. Let's create a very simple scenario where this comes into play.

The utility ping is very useful. You give it a hostname - roughly speaking the name of a computer - and it tells you whether that computer is alive, meaning answering you. Imagine a simple shell that you enter hostnames into, being prompted before each hostname, and then, when you respond with "." prints out lists of alive vs. dead hosts. Let's consider a program that does just that:

#include <unistd.h>
#include <stdio.h>

/* Global variable's are shared amongst threads! */
char livelist[1026] = {'\n','\0' };
char deadlist[1026] = {'\n','\0' };

void* pingit(void * p)
{ 
  /* build string "ping " + hostname + " > /dev/null" */
  char comm[256];
  strcat(strcat(strcpy(comm,"ping "),(char*)p)," > /dev/null 2>/dev/null");

  /* Ping! on 0 return value, add host to list. */
  int res = system(comm); /* Runs comm-string in a shell & gets return value. */
  strcat(strcat( (res ? deadlist : livelist) ,(char*)p),"\n");
  return NULL;
}

int main()
{
  char host[256];

  /* Read host names & ping threads each of them. */
  while((printf("> "),scanf("%s",host) == 1) && strcmp(host,".") != 0)
  {
    pingit((void*)host);
  }

  /* Print lists. */
  printf("Live:%s",livelist);
  printf("Dead:%s",deadlist);

  return 0;
}

bash$ gcc -o go ex2nt.c
bash$ ./go
> deal.com
> foo.caltech.edu
> google.com
> foo.usna.edu
> bart.ru
> .
Live:
google.com
bart.ru
Dead:
deal.com
foo.usna.edu
foo.caltech.edu

If you play with this a bit you'll discover that, since ping may take a while, you sometimes have to wait until the prompt comes back. I.e. the interface is unresponsive. Let's have the pinging going on in separate threads while the main thread prints my prompt and reads my next input. After all, I don't need to wait around for an answer to keep entering more hostnames ... there's answer until the end! Here's a first crack.
WARNING! THIS PROGRAM HAS SERIOUS FLAWS!

#include <unistd.h>
#include <stdio.h>
#include <pthread.h>

/* Global variable's are shared amongst threads! */
int numResponses = 0;
char livelist[1026] = {'\n','\0' };
char deadlist[1026] = {'\n','\0' };

void* pingit(void * p)
{ 
  /* build string "ping " + hostname + " > /dev/null" */
  char comm[256];
  strcat(strcat(strcpy(comm,"ping "),(char*)p)," > /dev/null 2>/dev/null");

  /* Ping! on 0 return value, add host to list. */
  int res = system(comm);
  strcat(strcat( (res ? deadlist : livelist) ,(char*)p),"\n");
  numResponses = numResponses + 1;
  return NULL;
}

int main()
{
  int count = 0;
  char host[256];

  /* Read host names & create ping threads for each of them. */
  while((printf("> "),scanf("%s",host) == 1) && strcmp(host,".") != 0)
  {
    pthread_t lt;
    pthread_create(&lt,NULL,pingit,(void*)host);
    ++count;
  }

  /* Wait until all threads have completed, then print lists. */
  while(numResponses < count)
    /* do nothing */;
  printf("Live:%s",livelist);
  printf("Dead:%s",deadlist);

  return 0;
}

If you run this, you'll see that the output is seriously flawed. What happened. WAKE UP! THIS IS IMPORTANT! Data is shared between threads, and this is both a blessing and a curse! The character buffer host in main is passed to each thread. But main() continues even as the pingit thread continues. Pingit relies on host keeping its value, but main relies on reading something new into it on the next loop iteration. Thus, pingit is in trouble. After finishing its call to ping, it writes the value in host. If we're lucky and pingit has finished before main tries to read something new, all will be well. If we're unlucky and main reads something new before pingit finishes, pingit will be writing out the wrong value when it uses host. This is a classic example of a race condition. We have a race condition whenever the order in which different processes or threads execute affect the execution of a program. Or, more picturesquely, the speed at which they execute affects things.

The way to fix this is to have main dynamically allocate a new character buffer for each thread (which each thread must then deallocate). Here's a version of the program that does this, thus fixing the most glaring problem. However, be warned, the program below still has flaws ... they're just more subtle.

#include <unistd.h>
#include <stdio.h>
#include <pthread.h>

/* Global variable's are shared amongst threads! */
int numResponses = 0;
char livelist[1026] = {'\n','\0' };
char deadlist[1026] = {'\n','\0' };

void* pingit(void * p)
{ 
  /* build string "ping " + hostname + " > /dev/null" */
  char comm[256];
  strcat(strcat(strcpy(comm,"ping "),(char*)p)," > /dev/null 2>/dev/null");

  /* Ping! on 0 return value, add host to list. */
  int res = system(comm);
  strcat(strcat( (res ? deadlist : livelist) ,(char*)p),"\n");
  numResponses = numResponses + 1;
  free(p);
  return NULL;
}

int main()
{
  int count = 0;
  char host[256];

  /* Read host names & create ping threads for each of them. */
  while((printf("> "),scanf("%s",host) == 1) && strcmp(host,".") != 0)
  {
    pthread_t lt;
    char *hostbuff = strcpy((char*)malloc(strlen(host)+1),host);
    pthread_create(&lt,NULL,pingit,(void*)hostbuff);
    ++count;
  }

  /* Wait until all threads have completed, then print lists. */
  while(numResponses < count)
    /* do nothing */;
  printf("Live:%s",livelist);
  printf("Dead:%s",deadlist);

  return 0;
}

Critical sections, mutual exclusion and mutex's

What is this more subtle problem of which I speak? Well, I have many threads in the above program. Each of them does their ping thing, but at the end they all do this:

  strcat(strcat( (res ? deadlist : livelist) ,(char*)p),"\n");
  numResponses = numResponses + 1;

This is a sensitive couple of steps. Suppose thread 1 and thread 2 are both going through those steps, running in parallel (which they may well be doing on our dual-core lab machines, and their instructions interleave as follows:

     thread 1                                     thread2
---------------------------------        --------------------------------
loads numResponses in register R 
increments register R
                                         loads numResponses in register S
                                         increments register S
                                         saves register S to numResponses
saves register R to numResponses

Well, when all is said and done, numResponses will have increased by one not two ... and our program won't work right. Once again, we have a race condition ... and this one's a whole bunch harder to get rid of. Moreover, livelist and deadlist have the same kind of issue. In the previous section, the shared memory in the buffer host caused a race condition, so we simply got rid of the shared memory. Each thread gets its own buffer and nothing's shared. In this instance, shared memory is once again causing a race condition, but there's no way to get rid of it. That memory being shared is central to our whole approach.

Whenever we have a resource that is shared amongst several processes/threads, like the numResponses/livelist/deadist global data, any chunk of code that accesses that resource is referred to as a critical section, meaning a chunk of code in which different processes/threads can interfere with one another. The critical section problem is to control access to the resources manipulated in critical sections to avoid problems like the above, and any acceptable solution must satisfy the following three criteria:

mutual exlusion (mutex) - that only one process/thread can be in a critical section with respect to the shared resource at the same time,
progress - that no process/thread that is not in a critical section with respect to the shared resource can prevent other processes/threads from entering their critical sections, and
bounded wait - that any process/thread that wants to get into a critical section with respect to the shared resource eventually will enter that critical section.

Pthreads mutex

There is a special data-structure called a mutex that is used to regulate access to critical sections. It provides two operations: lock and unlock. Immediately before entering a critical section associated with mutex M, a process P tries to lock M --- in facier parlence: it tries to aquire a lock on M. If some other process has already aquired a lock on M, process P is suspended until the mutex is no longer locked, i.e. until the process that currently has the lock performs an unlockoperation on M. Pthreads implements mutexes, and as long as threads use it properly, by locking immediately prior to entering a critical section and unlocking immediately after that critical section, the mutex provides a solution to the mutual exclusion problem. Pthreads has a special type pthread_mutex_t for mutexes, and the following operations:

int pthread_mutex_init(pthread_mutex_t *mutex, const pthread_mutexattr_t *attr); ← the second argument will be NULL for our examples
int pthread_mutex_destroy(pthread_mutex_t *mutex);
int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);

the pthread_mutex_t object you're using needs to be global, so all the threads can access it. Below is a good solution to our ping problem. It uses mutexes to ensure that only one of the ping-processes can execute its critical section --- the two lines that maniplulate numResponses, deadlist and livelist --- at any given time.

#include <unistd.h>
#include <stdio.h>
#include <pthread.h>

/* Global variable's are shared amongst threads! */
pthread_mutex_t mu;
int numResponses = 0;
char livelist[1026] = {'\n','\0' };
char deadlist[1026] = {'\n','\0' };

void* pingit(void * p)
{ 
  /* build string "ping " + hostname + " > /dev/null" */
  char comm[256];
  strcat(strcat(strcpy(comm,"ping "),(char*)p)," > /dev/null 2>/dev/null");

  /* Ping! on 0 return value, add host to list. */
  int res = system(comm);
  pthread_mutex_lock(&mu);   /* <-- AQUIRE LOCK, BEGIN CRITICAL SECTION   */
  strcat(strcat( (res ? deadlist : livelist) ,(char*)p),"\n");
  numResponses = numResponses + 1;
  pthread_mutex_unlock(&mu); /* <-- END CRITICAL SECTION, RELENQUISH LOCK */
  free(p);
  return NULL;
}

int main()
{
  pthread_mutex_init(&mu,NULL);
  int count = 0;
  char host[256];

  /* Read host names & create ping threads for each of them. */
  while((printf("> "),scanf("%s",host) == 1) && strcmp(host,".") != 0)
  {
    pthread_t lt;
    char *hostbuff = strcpy((char*)malloc(strlen(host)+1),host);
    pthread_create(&lt,NULL,pingit,(void*)hostbuff);
    ++count;
  }

  /* Wait until all threads have completed, then print lists. */
  while(numResponses < count)
    /* do nothing */;
  printf("Live:%s",livelist);
  printf("Dead:%s",deadlist);
  
  pthread_mutex_destroy(&mu);

  return 0;
}

Next big question: How do these mutexes work? How are they implemented?