A thread is a portion of a program that can be scheduled independently of the larger program. It is a way to divide a single process into separately schedulable units, each of which the OS can choose to run. This concept should already be somewhat familiar from the world of processes, which are separate, independent programs that can be scheduled independently; a thread is similar in that it provides a further division of a process into schedulable units.
A single process can have multiple threads, and each thread runs independently, but resources are shared across threads, generally, by default. For example, threads share memory: two threads have access to the same set of variables and can alter each other's variable values. Threads are also independently scheduled by the OS.
Advantages. Why are threads desirable, when we can simply use multiple processes instead? There are three key reasons:
Resource Sharing. Processes generally do not share resources and address space with other processes by default (unless the resources were created prior to a fork, in which case copies of some resources like file handles, signal handlers, etc. might be shared by a parent and child process). Multiple processes that were created independently, but need to work collaboratively on the same resources, often need to share them via explicit coordination. Multiple threads belonging to one process share an address space and resources by default, which can greatly simplify the programming model with regard to access to shared resources.
Creation Speed. It is much faster to create a new thread than to create a new process.
Switching Speed. Threads are much faster not only to create and destroy, but also to switch between, compared to processes. The performance benefit varies with many factors, but can be one to two orders of magnitude (10-100 times) in some cases. The speed argument applies more strongly to programs that are not CPU-bound (i.e., they block frequently, requiring frequent context switches) than to programs that are CPU-bound (and tend not to block).
Key Difference. Generally speaking, a process is a unit of protection and resource allocation, whereas a thread is simply a unit of execution. Threads generally share all the resources of their parent process. According to the Tanenbaum text, "Threads allow multiple executions to take place in the same process environment." Threads are sometimes referred to as "lightweight processes," but henceforth we will only use the term threads.
The following figure illustrates how multithreading differs from multiprocessing:
Example (a) on the left shows three processes, each with one thread. Example (b) on the right shows a single process with three threads. (Tanenbaum)
Can a multithreaded program run on a single CPU? Absolutely. In that case, the threads can take turns executing.
Protection. Since threads share resources, including allocated memory, it is possible for one thread to overwrite another's work. There is no protection between threads. In theory, they should cooperate, because they derive from the same process. In practice, this leads to very complicated issues of coordination, which we will discuss later.
Process vs. Thread. The following table lists some items that are usually created on a per-process basis, and some that are created on a per-thread basis.
Per-process items (left) and per-thread items (right)
States. Like a process, a thread can be in any one of three states: Ready, Running, or Blocked:
A thread can be in the running, blocked, or ready state.
OS vs. Library Implementation. Generally, threads can be implemented in the kernel, as a library in user space, or using a combination of user-space and kernel-space constructs. Unix/Linux specifically does not support threads "out of the box" (although the Linux kernel, for example, does have constructs that help facilitate thread implementation). Instead, threads are generally implemented by a user-space library based on the POSIX thread (pthread) standard, which covers how threads should be created, run, and terminated. For the most part, POSIX threads have the advantage of being portable across POSIX-compliant systems.
User-Level Threads (ULTs)
If threads are implemented entirely in user space ('Pure ULTs'), the kernel knows nothing about them. The kernel is only aware of single-threaded processes. The obvious advantage is portability -- they can be implemented on an OS that has no built-in support for threads.
Thread Table. Each process needs its own private thread table to keep track of the threads in that process. The thread tables are managed by the runtime system (in the case of pthreads, by the pthreads library calls).
User-level threads (ULTs).
ULT Thread Switch. Tanenbaum text: "When a thread does something that may cause it to become blocked locally, for example, waiting for another thread in its process to complete some work, it calls a runtime system procedure. The procedure checks to see if the calling thread must be put into a blocked state." If so, it saves the thread's context and selects a new thread (from the same process) to run. This can usually be done quickly, much faster than a full process switch. No switch to the kernel is required.
Advantages of ULTs:
As mentioned in the previous paragraph, switching between threads within a process can be very fast, much faster than executing a full process switch.
If desired, each process can have its own custom scheduling algorithm.
They typically scale better than kernel-mode threads.
Disadvantages of ULTs:
The biggest disadvantage is that many system calls will result in the entire process being blocked, including all of its threads. That is, the action that blocks a single thread ends up blocking all the threads in the process. This can be avoided if system calls can be made nonblocking, or if a check can be performed ahead of time to determine whether a system call would block. These strategies either require modifying the OS (which user-mode threads were trying to avoid in the first place) or accepting a significant penalty in complexity and performance.
A virtual memory page fault will block all threads in a process. There is no straightforward way to make this a non-blocking event for the process.
A single thread can run unchecked, because the runtime system receives no hardware clock interrupts within a process. Threads must therefore be programmed to yield to other threads voluntarily.
Tasks that are CPU bound or have a high inter-dependence among the threads (and hence block very frequently) do not tend to benefit.
Because of these disadvantages, most thread packages are not implemented purely in user space. Rather, they are implemented in the kernel only, or as a hybrid between kernel and user space.
Kernel-Level Threads (KLTs)
If threads are implemented entirely in the kernel instead ('Pure KLT'), the kernel must maintain the thread table, which tracks all threads and what processes they belong to.
Kernel-level threads (KLTs).
When a thread blocks, the kernel, at its option, chooses another thread from that same process to run, or a thread from another process. Unlike ULTs, with KLTs all actions that might block a thread are implemented as system calls.
Advantages of KLTs. Compared to ULTs, KLTs have several advantages:
KLTs require no user-space library.
The kernel can simultaneously schedule more than one thread from a process, if the processor supports it. NOTE: this is an important distinction, since the vast majority of modern CPUs (including your laptop) are multi-core.
KLTs do not require any new, nonblocking system calls to be written.
Disadvantages of KLTs:
ULTs may sometimes be able to avoid a block by having the user-level runtime system handle some potentially-blocking calls, leading to a faster thread switch (without going to the kernel). Pure KLTs are not designed to do this -- a thread switch always leads to a switch to the kernel, which increases overhead.
The kernel must decide how to handle a fork() operation by a multithreaded process. Does the child process have one thread or copies of all threads?
The kernel must have a mechanism for relaying signals that arrive at a process to the appropriate thread or to all threads.
KLTs are simpler and more elegant to implement than ULTs, but are slower. Hybrid systems include both a user-space and a kernel-space component. In this approach, threads in user space are somehow mapped to threads in kernel space. The kernel is only aware of the kernel-level threads, and schedules those. Some kernel threads may have multiple user-level threads multiplexed on top of them.
A combined approach.
The advantage of the combined approach is that it's the most flexible. The disadvantage is that it requires design support from the OS and support from a user space library, which must be compatible.
At one time, hybrid approaches were moderately popular due to their performance benefits. For instance, the default thread implementation on Solaris (until v10) and some earlier versions of pthreads used the hybrid approach. Since then, however, almost all OSs have shifted to using KLTs instead, including Windows and pthreads on Linux, in order to obtain reliable performance across a wide range of situations.
As one kernel hacker described,
"It's not that M:N [hybrid KLT/ULT] systems weren't performant in certain benchmarks, it's mostly that it was extremely hard to make the system reliable under all circumstances, which a general-purpose OS must be."
Recall that there are a number of difficult design decisions when a process has multiple threads, including:
Process Fork. When a process forks:
Does the child process get copies of all parent threads, copies of none, or somewhere in between?
If a thread in the parent process was blocked, does the cloned thread in the child process start out blocked?
What if a thread in the parent process was holding a mutex, semaphore, or other synchronization object?
File I/O. What if a parent thread is in the middle of file I/O, and a cloned thread in the child executes an lseek to change the file pointer? What happens to the original pointer?
Signal Handling. Should signals be directed to a process or to individual threads? Or to groups of threads?
These are difficult questions. To try to handle some of them, Linux deviates from the traditional Unix standard (POSIX) in two ways:
Cloning and Resources. First, Linux introduces a system call, clone(), which is not part of the Unix standard:
This call creates a new thread, either in the current process or in a new process, depending on the sharing_flags. If the new thread is in the current process, it shares the address space with the existing threads. If not, it gets a copy of the old address space, similar to what happens when fork() creates a new process. Unlike fork(), though, certain resources of the parent process can be explicitly shared or not shared, as shown in the following table. This fine-grained resource-sharing capability answers some of the questions raised above, but at the expense of portability, since it is not part of the Unix standard.
The sharing_flags bitmap.
clone() also supports many other options; see man clone.
Tasks and Thread ID. The terminology here can definitely be a little confusing. We'll describe things from the user's/programmer's perspective, but it turns out that "under the hood" Linux actually uses different terms (for instance, PID means something different inside the kernel/scheduler than it does at the syscall level used by the programmer). From the user's/programmer's perspective:
For Linux, every independent unit of execution is considered to be a "task".
Technically, every task has a unique "thread" associated with it. A "task" and a "thread" are not literally the same thing, but there is a one-to-one correspondence between them, so you can essentially treat a task as a thread. For instance, if you look at the headers for the Linux scheduler (sched.h), there are many instances of both the term thread and the term task, and they are distinguished.
A single-threaded process is therefore represented by just a single "task".
A multi-threaded process has multiple "tasks", all belonging to that one process (and thus all have the same PID). Formally, all threads in a process are considered to be part of the same "thread group", so you'll sometimes see references to the "thread group ID" (TGID), which is the same as the PID (from the programmer's perspective).
The PID is a standard part of Unix. Linux further adds a "thread ID" (TID), which uniquely identifies each thread (and implicitly identifies the task).
For scheduling purposes, Linux works from the TID. However, the scheduler may be aware of which process (PID/TGID) each thread belongs to, in order to ensure that each process gets a fair share of the CPU time, for instance when some but not all processes have many threads.
Threads are sometimes referred to as "light-weight processes" (LWP). Some command-line tools therefore refer to the TID as the "LWP" number of the thread.
For our purposes, in a multithreaded Linux environment, we can thus consider a task and a thread to be analogous, and remember that the TID identifies the thread.
Summary / Key Points
Compared to multiprocessing on the same problem, multithreading is generally much faster: threads are faster to create, destroy, and switch between.
A process is a unit of resource allocation and protection, while a thread is a unit of execution.
Like a process, a thread may be in the Running, Ready, or Blocked state.
Threads of a process all share the process' resources, generally speaking.
The main limitation of ULTs is that a syscall by one thread can block all the threads of the process.
Each thread must have its own "context" (registers and flags), state, and executable stack.
The main limitations of KLTs are the fact that a thread switch requires a mode switch to the kernel (unlike ULTs), and the difficult design decisions related to forks, signals, resource sharing, and synchronization.
Linux uses a KLT approach to thread implementation. The POSIX-standard pthread library is employed in user space, and threads are identified by a unique TID. However, the kernel-space implementation supports advanced (optional) features that are not compatible with traditional Unix.