Threads II

POSIX Threads

Intro

In 1995, POSIX became the standard interface for many system calls in UNIX, including the threading environment. As a result, POSIX threads (pthreads) are the model for programming with threads in nearly every OS and runtime environment, including systems based on C, Java, Python, and other languages.

The lifecycle of a thread, much like that of a process, begins with creation. However, threads are not forked from a parent to create a child; instead, they are created with a starting function as the entry point. The end of a thread's life also differs from that of a process: when a thread completes, it is either joined by another thread (typically the main thread), or it is detached to run on its own until completion.

The standard implementation of POSIX-compliant multithreading on Unix/Linux systems is the pthread library. It is not the only possible way to implement multithreading, but it is the de facto standard on Unix/Linux systems. The following table lists some common function calls implemented by the pthreads library.
Some pthread function calls.

Compilation

To compile a program with the pthread library, first we must include the header file:
#include <pthread.h>
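This provides access to the underlying data types, like pthread_t, and to the function declarations. However, including the header is not always enough: on many systems the pthread functions live in a separate library rather than in the standard C library, so we must also explicitly link against that library at compilation:
$ gcc -o hello hello.c -lpthread
where the -lpthread option tells gcc to link against the POSIX thread library. GCC also accepts the -pthread option, which sets both the compile-time and link-time flags needed for threads.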

Creation

The function for creating a pthread is defined as:
int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine) (void *), void *arg);
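This creates a new thread in the calling process. The identifier of the new thread is stored in the pthread_t pointed to by thread, and attr can specify a set of attributes (unused here; passing NULL selects the defaults). Next is a function pointer, start_routine. This is the function that gets called when the thread begins execution. The last argument, arg, is the single argument passed to start_routine. On success, pthread_create returns 0; on failure, it returns an error number.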


Example. The following program demonstrates thread creation:
/* hello.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

void * hello_fun(void * args){

  printf("Hello World!\n");

  return NULL;
}

int main(int argc, char * argv[]){

  pthread_t thread;  //thread identifier

  //create a new thread and have it run the function hello_fun
  pthread_create(&thread, NULL, hello_fun, NULL);

  //wait until the thread completes
  pthread_join(thread, NULL);

  return 0;
}
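
The example above ignores the return value of pthread_create for brevity. Like most pthreads functions, it returns 0 on success and an error number on failure (it does not set errno), so a more defensive version might look like the following sketch. The helper name create_checked is hypothetical; it is meant to be called from a main function like the one in hello.c.

```c
/* checked.c: a sketch of thread creation with error checking */
#include <stdio.h>
#include <string.h>
#include <pthread.h>

static void * hello_fun(void * args){
  (void) args;  // unused
  printf("Hello World!\n");
  return NULL;
}

/* Hypothetical helper: returns 0 on success, or a pthreads error number */
int create_checked(void){
  pthread_t thread;
  int err;

  // pthread functions return the error number directly
  err = pthread_create(&thread, NULL, hello_fun, NULL);
  if(err != 0){
    fprintf(stderr, "pthread_create: %s\n", strerror(err));
    return err;
  }

  // wait until the thread completes
  err = pthread_join(thread, NULL);
  if(err != 0){
    fprintf(stderr, "pthread_join: %s\n", strerror(err));
  }
  return err;
}
```

Because the error number is returned directly, it can be passed straight to strerror without consulting errno.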

Argument Passing

This next example illustrates passing arguments to threads:
/* arg.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

void * hello_arg(void * args){

  char * str = (char *) args;
  printf("%s", str);
  return NULL;
}

int main(int argc, char * argv[]){

  char hello[] = "Hello World!\n";

  pthread_t thread;  //thread identifier

  //create a new thread that runs hello_arg with argument hello
  pthread_create(&thread, NULL, hello_arg, hello);

  //wait until the thread completes
  pthread_join(thread, NULL);

  return 0;
}
When the thread is created, the function hello_arg receives hello, a string containing the phrase "Hello World!", as its argument; hello is passed as the 4th argument to pthread_create. Note that the pthread_create definition specifies a void * type for this 4th argument, which allows a pointer of any type to be passed in to the thread. In the hello_arg function, we cast args back to the proper pointer type with (char *) when assigning it to str, so that printf receives a char * as expected. The cast makes the conversion explicit; C would also convert a void * to a char * implicitly.
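
Since start_routine takes only one void * argument, passing several values usually means bundling them in a struct and passing a pointer to it. The following is a minimal sketch of that idea; the struct add_args and the helper add_in_thread are hypothetical names introduced only for illustration.

```c
/* struct_arg.c: a sketch of passing several values through one void * */
#include <stdio.h>
#include <pthread.h>

/* Hypothetical argument bundle: two inputs and one output field */
struct add_args {
  int a;
  int b;
  int sum;  // filled in by the thread
};

static void * add_fun(void * args){
  struct add_args * p = (struct add_args *) args;
  p->sum = p->a + p->b;
  return NULL;
}

/* Computes a + b in a new thread and stores it in *out.
   Returns 0 on success, -1 if the thread could not be created or joined. */
int add_in_thread(int a, int b, int * out){
  struct add_args args = { a, b, 0 };
  pthread_t thread;

  if(pthread_create(&thread, NULL, add_fun, &args) != 0) return -1;

  // args lives on this function's stack, so join before returning
  if(pthread_join(thread, NULL) != 0) return -1;

  *out = args.sum;
  return 0;
}
```

The struct here lives on the caller's stack, which is safe only because the caller joins the thread before the struct goes out of scope.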

Joining

Just like with processes, it is often useful to be able to identify when a thread has completed or exited. The method for doing this is to join the thread, which is a lot like the wait() call for processes. Joining is a blocking operation, so the calling thread will not continue until the identified thread has terminated.

Typically, only the main thread calls pthread_join(), but other threads can also join each other. Threads are not joined automatically: when the main thread returns from main(), the whole process exits and any threads still running are terminated. For example, the following program typically produces no output:
/* no_output.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

void * hello_fun(void * args){
  printf("Hello World!\n");
  return NULL;
}

int main(int argc, char * argv[]){

  pthread_t thread;
  pthread_create(&thread, NULL, hello_fun, NULL);

  // Need a pthread_join here to wait for hello_fun to complete
  
  return 0;
}
The program did not join the new thread before the main thread returned. As a result, the process exited, and the new thread was most likely terminated before it had a chance to print "Hello World!".
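
To see the blocking behavior of a join directly, consider the following sketch. The worker sleeps for a second before setting a flag, and the caller reads the flag only after the join returns. The names slow_fun and join_waits are hypothetical, and the one-second sleep merely stands in for real work.

```c
/* join_waits.c: a sketch showing that pthread_join blocks until the thread is done */
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

static int done = 0;  // written by the worker, read by the caller after the join

static void * slow_fun(void * args){
  (void) args;  // unused
  sleep(1);     // pretend to do some work
  done = 1;
  return NULL;
}

/* Hypothetical helper: returns the value of done seen after the join,
   or -1 if the thread could not be created or joined */
int join_waits(void){
  pthread_t thread;

  done = 0;
  if(pthread_create(&thread, NULL, slow_fun, NULL) != 0) return -1;

  // blocks here for about a second, until slow_fun returns
  if(pthread_join(thread, NULL) != 0) return -1;

  return done;
}
```

Because the join does not return until slow_fun has finished, the caller always sees done set to 1 when the join succeeds.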

Return Values

A thread can also pass back a return value, much like an exit status for processes, except that it is a void * and so can point to data of any type, not just an integer. The value is retrieved during the join:
int pthread_join(pthread_t thread, void **retval);
Here is an example:
/* return.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

void * hello_return(void * args){
  // strdup allocates on the heap
  char * hello = strdup("Hello World!\n");
  return (void *) hello;
}

int main(int argc, char * argv[]){

  char * str;
  pthread_t thread; 

  pthread_create(&thread, NULL, hello_return, NULL);  // No input args
  pthread_join(thread, (void **) &str); // Return value now pointed to by str
  printf("%s", str);
  free(str);

  return 0;
}
The hello_return() function returns a void *, which is really a pointer to the string containing "Hello World!". After the join, str points to the string created by the thread. Because the string was created by strdup, it was allocated on the heap rather than on the thread's stack, so it remains valid after the thread finishes, and the main thread is responsible for freeing it.
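
The same pattern works for non-string results. The sketch below has a thread compute a sum, place it in heap memory, and return a pointer to it; the caller copies the value out and frees the memory. The names sum_fun and sum_in_thread are hypothetical. Note that the join receives the result in a plain void * variable, which avoids the (void **) cast.

```c
/* return_long.c: a sketch of returning a heap-allocated number from a thread */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

static void * sum_fun(void * args){
  long n = *(long *) args;
  long i;

  // allocate the result on the heap so it outlives this thread's stack
  long * result = malloc(sizeof(long));
  if(result == NULL) return NULL;

  *result = 0;
  for(i = 1; i <= n; i++){
    *result += i;
  }
  return result;
}

/* Hypothetical helper: returns 1 + 2 + ... + n computed in a new thread,
   or -1 on any error */
long sum_in_thread(long n){
  pthread_t thread;
  void * retval = NULL;
  long sum = -1;

  if(pthread_create(&thread, NULL, sum_fun, &n) != 0) return -1;

  if(pthread_join(thread, &retval) == 0 && retval != NULL){
    sum = *(long *) retval;  // copy the value out
    free(retval);            // the joiner frees what the thread allocated
  }
  return sum;
}
```

For example, sum_in_thread(100) yields 5050 when the thread runs successfully.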

Threads and the Linux OS

While we like to describe pthreads as a user-level construct, they are also supported at the OS level. The POSIX environment just standardizes the user interface so code can be consistent across operating systems. Thus, pthreads is a convenient front end to the Linux back end, which uses a kernel-level thread (KLT) implementation.

In Linux, pthreads are implemented using the clone() system call. clone() is a lot like fork(), but has more options, including sharing memory and creating kernel threads. This enables threads to be scheduled and treated by the OS much like processes.

Thread Identifiers

When identifying a process, we use its PID. We've seen POSIX threads identified by their pthread_t, which is part of the POSIX implementation. While pthread_t identifiers are necessary for working with the pthread library, they are long and not convenient to work with. Instead, we will often assign each thread a user-level identifier, such as a number: thread 1, thread 2, etc.
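
One way to assign such user-level identifiers is sketched below: each thread receives a pointer to its own slot in an array of numbers. The names ids, visited, id_fun, and run_numbered are hypothetical. Passing the address of the loop counter instead would be a bug, because the counter keeps changing while the threads start up.

```c
/* numbered.c: a sketch of giving each thread a small user-level number */
#include <stdio.h>
#include <pthread.h>

#define NUM_THREADS 4

static int ids[NUM_THREADS];      // one slot per thread, so each gets its own number
static int visited[NUM_THREADS];  // visited[k] is set by the thread numbered k

static void * id_fun(void * args){
  int id = *(int *) args;
  printf("I am thread %d\n", id);
  visited[id] = 1;
  return NULL;
}

/* Hypothetical helper: returns how many numbered threads ran */
int run_numbered(void){
  pthread_t threads[NUM_THREADS];
  int i, created = 0, count = 0;

  for(i = 0; i < NUM_THREADS; i++){
    ids[i] = i;
    visited[i] = 0;
  }

  for(i = 0; i < NUM_THREADS; i++){
    // pass &ids[i], not &i: each thread needs a value that stays put
    if(pthread_create(&threads[i], NULL, id_fun, &ids[i]) != 0) break;
    created++;
  }

  for(i = 0; i < created; i++){
    pthread_join(threads[i], NULL);
    count += visited[i];
  }
  return count;
}
```

The order of the printed lines can vary from run to run, since the threads are scheduled independently.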

On Linux, each pthread also has a kernel-level identifier, much like a PID, called the thread ID, or TID. The traditional method for retrieving the TID is the gettid() system call. Let's look at an example:
/* tid.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>

pid_t gettid(){
  return (pid_t) syscall (SYS_gettid);
}

void * hello_fun(void * args){
  printf("THREAD: TID:%d PID:%d PthreadID:%lu\n", gettid(), getpid(), (unsigned long) pthread_self());
  return NULL;
}

int main(int argc, char * argv[]){
  pthread_t thread;  // pthread identifier (not the TID)
  pthread_create(&thread, NULL, hello_fun, NULL);
  printf("MAIN:   TID:%d PID:%d \n", gettid(), getpid());
  pthread_join(thread, NULL);
  return 0;
}    
The output of this program is something like:
$ ./tid
MAIN: TID:21301 PID:21301 
THREAD: TID:21302 PID:21301 PthreadID:140378868139776
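Conclusions: the main thread's TID is the same as the process's PID, while the new thread shares that PID but has its own TID. The pthread_t value reported by pthread_self() is a separate, library-level identifier and matches neither.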
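
The relationship between these identifiers can also be checked in code. The Linux-specific sketch below has a new thread report its TID back to the caller, which then compares it with the PID. The helper is named current_tid (a hypothetical name) rather than gettid to avoid clashing with the gettid() wrapper that newer versions of glibc declare.

```c
/* tid_check.c: a Linux-specific sketch comparing a thread's TID with the PID */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <pthread.h>

static pid_t current_tid(void){
  return (pid_t) syscall(SYS_gettid);
}

static void * tid_fun(void * args){
  pid_t * out = (pid_t *) args;
  *out = current_tid();  // report this thread's TID back to the caller
  return NULL;
}

/* Hypothetical helper: returns 1 if the new thread's TID differs from the PID,
   0 if they are equal, and -1 on error */
int worker_tid_differs(void){
  pthread_t thread;
  pid_t worker_tid = 0;

  if(pthread_create(&thread, NULL, tid_fun, &worker_tid) != 0) return -1;
  if(pthread_join(thread, NULL) != 0) return -1;

  printf("PID:%d worker TID:%d\n", (int) getpid(), (int) worker_tid);
  return worker_tid != getpid();
}
```

Only the thread that started the process has a TID equal to the PID, so a newly created thread is expected to report a different value.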

Observing Threads in Execution

Let's take a look at an example program that creates some threads that only busy-wait in an infinite loop:
/* busy.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

void * hello_fun(void * args){
  while(1){}
  return NULL;
}

int main(int argc, char * argv[]){
  
  pthread_t thread[4];
  int i;

  for(i = 0 ; i < 4; i++){
    pthread_create(&thread[i], NULL, hello_fun, NULL);
  }
  for(i = 0 ; i < 4; i++){
    pthread_join(thread[i], NULL);
  }

  return 0;
}
This program creates 4 threads, all of which just busy-wait. The main thread waits for the rest of the threads to complete. One question we might ask is, "how much of the CPU does this use?" Let's run this program to find out:
$ ./busy &
[1] 21322
$ ps
  PID TTY          TIME CMD
18344 pts/5    00:00:00 bash
21322 pts/5    00:00:06 busy
21327 pts/5    00:00:00 ps
If we just run the program in the background and look at the ps output, we see that the program is just one process. No information about the threads is provided. But, let's look at the top output instead:
Tasks: 204 total,   1 running, 198 sleeping,   5 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7862628k total,  4490728k used,  3371900k free,   194504k buffers
Swap: 12753916k total,       12k used, 12753904k free,  3641808k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21322 user      20   0 39240  380  296 S  394  0.0   4:06.83 busy
    1 root      20   0 24604 2548 1352 S    0  0.0   0:01.81 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.19 kthreadd
    3 root      20   0     0    0    0 S    0  0.0   0:08.67 ksoftirqd/0       
(...)
Look at the %CPU column: 394%! That's because each of the threads is scheduled individually and uses resources as if it were a process. The machine running the program is multi-core, so the threads can all run at the same time, and thus one program is using 4 CPU cores, at nearly 100% utilization on each.

We can see this a bit better when we expand the program into its constituent threads, using ps -L.
$ ps -L
  PID   LWP TTY          TIME CMD
18344 18344 pts/5    00:00:00 bash
21322 21322 pts/5    00:00:00 busy
21322 21323 pts/5    00:03:50 busy
21322 21324 pts/5    00:03:50 busy
21322 21325 pts/5    00:03:50 busy
21322 21326 pts/5    00:03:50 busy
21333 21333 pts/5    00:00:00 ps
The -L option for ps will organize by both PID and TID (shown here as LWP, or "lightweight process"), so we can see the program is actually running as 5 entities (LWPs): the 4 busy threads and 1 more thread for the main program.

We can also look at the top output using the H option (hit H when top is up):
Cpu(s):  0.1%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7862628k total,  4490620k used,  3372008k free,   194552k buffers
Swap: 12753916k total,       12k used, 12753904k free,  3641820k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21323 user      20   0 39240  380  296 R  100  0.0   6:34.12 busy
21324 user      20   0 39240  380  296 R  100  0.0   6:34.13 busy
21325 user      20   0 39240  380  296 R  100  0.0   6:34.10 busy
21326 user      20   0 39240  380  296 R  100  0.0   6:34.10 busy
21322 user      20   0 39240  380  296 S    0  0.0   0:00.00 busy
Each new thread is using 100% of a single core (in state R, or "running"), while the main thread is blocked (in state S, or "sleeping"), waiting to join each of the other threads. The top output uses the TID as the PID for each thread.
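
On Linux, the same per-thread entries are also visible from inside a program under /proc/self/task, which contains one directory per LWP of the calling process. The sketch below is Linux-specific and assumes /proc is mounted; the names gate, wait_fun, count_tasks, and count_while_workers_alive are hypothetical. It holds four workers at a mutex (rather than busy-waiting) and counts the task entries before and after creating them.

```c
/* tasks.c: a Linux-specific sketch that counts this process's LWPs via /proc */
#include <stdio.h>
#include <dirent.h>
#include <pthread.h>

#define NUM_WORKERS 4

static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;

/* Each worker blocks at the gate until the caller releases it */
static void * wait_fun(void * args){
  (void) args;  // unused
  pthread_mutex_lock(&gate);
  pthread_mutex_unlock(&gate);
  return NULL;
}

/* Counts the entries in /proc/self/task; returns -1 if it cannot be opened */
static int count_tasks(void){
  DIR * dir = opendir("/proc/self/task");
  struct dirent * entry;
  int count = 0;

  if(dir == NULL) return -1;
  while((entry = readdir(dir)) != NULL){
    if(entry->d_name[0] != '.') count++;  // skip "." and ".."
  }
  closedir(dir);
  return count;
}

/* Returns how many extra tasks exist while the workers are alive, or -1 on error */
int count_while_workers_alive(void){
  pthread_t threads[NUM_WORKERS];
  int i, created = 0, before, during;

  before = count_tasks();

  pthread_mutex_lock(&gate);  // hold the workers at the gate
  for(i = 0; i < NUM_WORKERS; i++){
    if(pthread_create(&threads[i], NULL, wait_fun, NULL) != 0) break;
    created++;
  }
  during = count_tasks();
  pthread_mutex_unlock(&gate);  // let the workers finish

  for(i = 0; i < created; i++){
    pthread_join(threads[i], NULL);
  }

  if(before < 0 || during < 0) return -1;
  return during - before;
}
```

With all four workers created successfully, the difference between the two counts is 4, matching the four extra LWP rows that ps -L shows for the busy program.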

Don't forget to kill the running program with killall -9 busy!

Summary / Key Points