Threads

Reading

Intro

A thread is a portion of a program that can be scheduled independently of the rest of the program. It is a way to divide a single process into smaller runnable units, each of which can be scheduled by the OS. This concept should already be somewhat familiar from the world of processes: processes are separate, independent programs that the OS schedules independently, and a thread carries that idea one step further by dividing a process into schedulable units that the OS can choose to run.

A single process can have multiple threads. Each thread runs, and is scheduled by the OS, independently, but by default the threads of a process share its resources. For example, threads share memory: two threads have access to the same set of variables and can alter each other's values.

Advantages. Why are threads desirable when we could simply use multiple processes instead? There are two key reasons: threads are much cheaper than processes to create, destroy, and switch between, and threads share the same address space, so they can cooperate through shared memory without any extra interprocess communication machinery.

Key Difference. Generally speaking, a process is a unit of protection and resource allocation, whereas a thread is simply a unit of execution. Threads generally share all the resources of their parent process. According to the Tanenbaum text, "Threads allow multiple executions to take place in the same process environment." Threads are sometimes referred to as "lightweight processes," but henceforth we will only use the term threads.

Implementation

The following figure illustrates how multithreading differs from multiprocessing:
Example (a) on the left shows three processes, each with one thread. Example (b) on the right shows a single process with three threads. (Tanenbaum)
Can a multithreaded program run on a single CPU? Absolutely. In that case, the threads can take turns executing.

Protection. Since threads share resources, including allocated memory, it is possible for one thread to overwrite another's work. There is no protection between threads. In theory, they should cooperate, because they derive from the same process. In practice, this leads to very complicated issues of coordination, which we will discuss later.
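
As a small illustration of the coordination problem, here is a hypothetical sketch (not from the text) using pthreads: two threads increment the same global counter with no synchronization, and because the unprotected read-modify-write steps can interleave, the final count is usually less than the expected 2,000,000.

/* race.c -- compile with: gcc race.c -pthread */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;                  /* shared by both threads */

static void *bump(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        counter++;                        /* read-modify-write: not atomic */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump, NULL);
    pthread_create(&t2, NULL, bump, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);   /* often < 2000000: updates were lost */
    return 0;
}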

Process vs. Thread. The following table lists some items that are usually created on a per-process basis, and some that are created on a per-thread basis.
Per-process items (left) and per-thread items (right)
States. Like a process, a thread can be in any one of three states: Ready, Running, or Blocked:
A thread can be in the running, blocked, or ready state.
OS vs. Library Implementation. Generally, threads can be implemented in the kernel, as a library in user space, or using a combination of user-space and kernel-space constructs. For Unix/Linux specifically, threads are not supported "out of the box" (although the Linux kernel, for example, does have constructs that help facilitate thread implementation). Instead, threads are generally implemented by a user-space library based on the POSIX thread (pthread) standard. The standard covers how threads should be created, run, and terminated. For the most part, POSIX threads have the advantage of being portable across other POSIX-compliant systems.
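
To make the pthread interface concrete, here is a minimal sketch (the worker function and its argument are just illustrative): pthread_create() starts a new thread running a given function, and pthread_join() blocks until that thread terminates.

/* hello_thread.c -- compile with: gcc hello_thread.c -pthread */
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    int id = *(int *)arg;
    printf("hello from thread %d\n", id);
    return NULL;                          /* return value is collected by pthread_join() */
}

int main(void) {
    pthread_t tid;
    int id = 1;
    if (pthread_create(&tid, NULL, worker, &id) != 0) {   /* create and start the thread */
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(tid, NULL);              /* wait for the thread to terminate */
    return 0;
}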

User-Level Threads (ULTs)

If threads are implemented entirely in user space ('Pure ULTs'), the kernel knows nothing about them. The kernel is only aware of single-threaded processes. The obvious advantage is portability -- they can be implemented on an OS that has no built-in support for threads.

Thread Table. Each process needs its own private thread table to keep track of the threads in that process. The thread tables are managed by the runtime system (in the case of pthreads, by the pthreads library calls).
User-level threads (ULTs).
ULT Thread Switch. Tanenbaum text: "When a thread does something that may cause it to become blocked locally, for example, waiting for another thread in its process to complete some work, it calls a runtime system procedure. The procedure checks to see if the calling thread must be put into a blocked state." If so, it saves the thread's context and selects a new thread (from the same process) to run. This can usually be done quickly, much faster than a full process switch. No switch to the kernel is required.
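
To make the idea of a pure user-space switch concrete, here is a rough sketch (not from the text; the names ult_yield and worker are made up) built on the ucontext API: two "threads" share one kernel-scheduled process and hand the CPU back and forth by saving and restoring contexts, with no system call needed to perform the switch.

/* ult_sketch.c -- compile with: gcc ult_sketch.c */
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

#define NUM_ULTS   2
#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx;              /* context of main(), resumed when a worker finishes */
static ucontext_t ult_ctx[NUM_ULTS];     /* a tiny "thread table", one entry per ULT */
static int current = 0;

/* Save the caller's context and resume the other thread: a switch done
   entirely in user space, like the runtime-system procedure described above. */
static void ult_yield(void) {
    int prev = current;
    current = (current + 1) % NUM_ULTS;
    swapcontext(&ult_ctx[prev], &ult_ctx[current]);
}

static void worker(int id) {
    for (int i = 0; i < 3; i++) {
        printf("thread %d, step %d\n", id, i);
        ult_yield();                     /* voluntarily hand off the CPU */
    }
}

int main(void) {
    for (int i = 0; i < NUM_ULTS; i++) {
        getcontext(&ult_ctx[i]);
        ult_ctx[i].uc_stack.ss_sp   = malloc(STACK_SIZE);   /* each ULT gets its own stack */
        ult_ctx[i].uc_stack.ss_size = STACK_SIZE;
        ult_ctx[i].uc_link          = &main_ctx;            /* where to go when worker returns */
        makecontext(&ult_ctx[i], (void (*)(void))worker, 1, i);
    }
    swapcontext(&main_ctx, &ult_ctx[0]);                     /* start thread 0 */
    return 0;
}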

Advantages of ULTs: thread switching happens entirely in user space with no trap into the kernel, so it is very fast; each process can use a scheduling algorithm tailored to its own threads; and ULTs can be implemented on an OS that has no built-in thread support. Disadvantages of ULTs: when one thread makes a blocking system call (or takes a page fault), the kernel blocks the entire process, stalling every thread in it; and because the kernel sees only one schedulable entity per process, the threads of a process cannot run in parallel on multiple CPUs. Working around the blocking problem (for example, by checking whether a call would block before actually making it) requires extra switches into the kernel; largely as a result of that performance cost, most thread implementations are not implemented purely in user space. Rather, they are implemented in the kernel only, or as a hybrid between kernel and user space.

Kernel-Level Threads (KLTs)

If threads are implemented entirely in the kernel instead ('Pure KLT'), the kernel must maintain the thread table, which tracks all threads and what processes they belong to.
Kernel-level threads (KLTs).
When a thread blocks, the kernel is free to choose another thread from the same process to run, or a thread from a different process. Unlike with ULTs, under KLTs all actions that might block a thread are implemented as system calls.

Advantages of KLTs. Compared to ULTs, KLTs have several advantages: when one thread blocks (on a system call or a page fault), the kernel can simply schedule another thread from the same process, so the rest of the process keeps running; threads of one process can run in parallel on multiple CPUs; and the kernel can make scheduling decisions with knowledge of every thread in the system. Disadvantages of KLTs: every thread operation (creation, termination, synchronization, switching) involves a system call, which costs far more than the equivalent user-space operation.

Hybrid Approaches

KLTs are simpler and more elegant to implement than ULTs, but are slower. Hybrid systems include both a user-space and a kernel-space component. In this approach, threads in user space are mapped onto threads in kernel space. The kernel is only aware of the kernel-level threads, and schedules those. Some kernel threads may have multiple user-level threads multiplexed on top of them.
A combined approach.
The advantage of the combined approach is that it is the most flexible. The disadvantage is that it requires coordinated support from both the OS and a user-space library, and the two must be kept compatible with each other.

At one time, hybrid approaches were moderately popular due to their performance benefits. For instance, the default thread implementation on Solaris (until v10) and some earlier versions of pthreads used the hybrid approach. However, since then almost all OSes have shifted to using KLTs instead, including Windows and pthreads on Linux, in order to obtain reliable performance across a wide range of situations. As one kernel hacker described it, "It's not that M:N [hybrid KLT/ULT] systems weren't performant in certain benchmarks, it's mostly that it was extremely hard to make the system reliable under all circumstances, which a general-purpose OS must be."

Linux Threads

Recall that there are a number of difficult design decisions when a process has multiple threads, including: if one thread calls fork(), should the child get copies of all the parent's threads or only of the thread that called fork()? If one thread closes a file while another thread is still reading from it, what should happen? And when a signal arrives for the process, which thread should handle it?

These are difficult questions. To try to handle some of them, Linux deviates from the traditional Unix standard (POSIX) in two ways:

Cloning and Resources. First, Linux introduces a system call, clone(), which is not part of the Unix standard:
pid = clone(function, stack_ptr, sharing_flags, args);
This call creates a new thread, either in the current process or in a new process, depending on the sharing_flags. If the new thread is in the current process, it shares the address space with the existing threads. If not, it gets a copy of the old address space, similar to what happens when fork() creates a new process. Unlike fork(), though, certain resources of the parent process can be explicitly shared or not shared, as shown in the following table. This fine-grained resource-sharing capability answers some of the questions raised above, but at the expense of portability, since it is not part of the Unix standard.
The sharing_flags bitmap.
clone() also supports many other options; see man clone.
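
For illustration, here is a hypothetical sketch of one common use of the glibc clone() wrapper (Linux only): with CLONE_VM the child shares the parent's address space, so a variable written by the child is visible to the parent, while SIGCHLD lets the parent wait for the child as usual.

/* clone_demo.c -- compile with: gcc clone_demo.c */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static int shared_value = 0;             /* visible to the child because of CLONE_VM */

static int child_fn(void *arg) {
    (void)arg;
    shared_value = 42;                   /* writes directly into the parent's memory */
    return 0;
}

int main(void) {
    const size_t stack_size = 1024 * 1024;
    char *stack = malloc(stack_size);    /* the child needs its own stack */
    if (stack == NULL) { perror("malloc"); return 1; }

    /* CLONE_VM: share the address space; SIGCHLD: allow waitpid() on the child.
       Pass the top of the stack, since the stack grows downward on most machines. */
    pid_t pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
    if (pid == -1) { perror("clone"); return 1; }

    waitpid(pid, NULL, 0);
    printf("shared_value = %d\n", shared_value);   /* prints 42: memory was shared */
    free(stack);
    return 0;
}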

Tasks and Thread ID. The terminology here can definitely be a little confusing. We'll describe things from the user's/programmer's perspective, but it turns out that "under the hood" Linux actually uses different terms (for instance, PID means something different inside the kernel/scheduler than it does at the syscall level used by the programmer). From the user's/programmer's perspective: every thread in a process shares the same PID (the value returned by getpid()), each thread additionally has its own unique TID (the value returned by gettid()), and in a single-threaded process the TID and the PID are the same number. Internally, the kernel schedules tasks, and each user-visible thread corresponds to one task. For our purposes, in a multithreaded Linux environment, we can thus consider a task and a thread to be analogous, and remember that the TID is the thread ID.
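
As a quick illustration (assuming Linux), the sketch below prints the PID and TID from the main thread and from one worker thread: both report the same PID, each has its own TID, and the main thread's TID equals the PID. It uses the raw gettid system call via syscall(), since the gettid() library wrapper only exists in newer glibc versions.

/* tid_demo.c -- compile with: gcc tid_demo.c -pthread */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static void *report(void *arg) {
    (void)arg;
    printf("worker: pid=%d tid=%ld\n", (int)getpid(), (long)syscall(SYS_gettid));
    return NULL;
}

int main(void) {
    printf("main:   pid=%d tid=%ld\n", (int)getpid(), (long)syscall(SYS_gettid));
    pthread_t t;
    pthread_create(&t, NULL, report, NULL);
    pthread_join(t, NULL);
    return 0;   /* both lines show the same pid; the main thread's tid equals that pid */
}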

Summary / Key Points