We have discussed how processes and threads can share access to resources, using concurrency primitives like semaphores and monitors. The examples we discussed were relatively simple, compared to real-world applications. Realistically complex applications often need to use multiple resources at once to accomplish certain tasks. Doing so in a multiprocessing or multithreaded environment often leads to contention for multiple resource instances or resource types. When this occurs, a common result is deadlock. In the best case, users experience this as the OS reporting that an application is unresponsive and must be closed; in the worst case, the entire OS becomes unresponsive or crashes.
In this section, we discuss how deadlocks can occur, and explore methods for detecting and recovering from them. There currently does not exist an efficient, general solution for the deadlock problem in operating systems, but there are a number of techniques for at least detecting and recovering from a deadlock with reasonable efficiency.
Deadlock: the permanent blocking of a set of processes (or threads) that either compete for system resources or communicate with each other. Each process (or thread) in a deadlocked set is waiting on an event or resource that can only be triggered or freed by another member of the set.
Resources may include I/O devices, data records, files, named pipes, and really anything else that can be owned by a process. As with our IPC discussion, all the concepts outlined here apply equally to processes and threads, since both may deal with shared resources. Some resources may have several similar instances available (for example, two ethernet devices). In short, a resource is "anything that must be acquired, used, and released over the course of time." (Tanenbaum)
There are two basic types of resource, pre-emptable and non pre-emptable. The type of a resource may depend, in part, on context.
A pre-emptable resource can be taken away from the process owning it (and later returned to it) with no ill effects. A common example is primary memory -- the contents of a memory unit may be saved to disk and later restored to RAM. Possible deadlocks involving pre-emptable resources can generally be handled by taking some resources from one process and temporarily reallocating them to another process that needs them.
Deadlocks generally involve non pre-emptable resources. These can't normally be reassigned dynamically to deal with deadlocks.
Order of Acquisition
When more than one process needs more than one resource or resource type, the order of resource acquisition can lead to deadlock, as in the following example. Note the subtle change in ordering in the process_B code in listing (b), on the right.
In the potentially blocking example (b) on the right, if process A grabs resource 1 and process B grabs resource 2 at the same time (and only one instance of each exists), the processes will form a deadlocked set, each waiting on a resource held by the other.
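The two acquisition orders described above can be sketched in Python, with `threading.Lock` objects standing in for two hypothetical single-instance resources (function names are illustrative, not from the listings). Only the consistently ordered pair is actually run, because the reversed pair can genuinely deadlock:

```python
import threading

resource_1 = threading.Lock()
resource_2 = threading.Lock()

def process_a():
    # Listing (a) style: acquire resource 1, then resource 2.
    with resource_1:
        with resource_2:
            pass  # use both resources here

def process_b_same_order():
    # Same global order as process_a: deadlock cannot occur.
    with resource_1:
        with resource_2:
            pass

def process_b_reversed():
    # Listing (b) style: reversed order. If process_a holds resource_1
    # while this thread holds resource_2, each waits on the other forever.
    with resource_2:
        with resource_1:
            pass

# Running the consistently ordered pair always completes.
threads = [threading.Thread(target=f) for f in (process_a, process_b_same_order)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Swapping `process_b_same_order` for `process_b_reversed` creates the race described above: most runs still complete, but an unlucky interleaving hangs both threads permanently.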
Four conditions are necessary for deadlock:
Mutual Exclusion: Only one process may use a resource at a time.
Hold and Wait: A process may hold allocated resources while waiting for other resources.
No Pre-emption: No resource can be forcibly removed from a process.
Circular Wait: A closed chain of processes and resources exists.
If any of these four conditions is absent, deadlock will not occur. The last condition, Circular Wait, describes when a deadlock actually exists.
Resource Allocation Graphs
In this system for modeling resource allocation, a process is represented by a circle and a resource is represented by a square.
A process holding a resource is indicated by a directed edge from the resource to the process.
A process requesting a resource is indicated by a directed edge from the process to the resource.
This version of the resource graph applies when there is only a single unit of each resource.
The three basic approaches for handling deadlock are:
Detection and Recovery. Allow deadlock to occur and then break it through a recovery procedure.
Avoidance. The OS allocates resources so that deadlock is guaranteed to never occur in the future.
Prevention. Prevent deadlock based on "attacking" one (or more) of the four necessary conditions.
We will talk about each of these.
In the deadlock detection strategy, the OS does not do anything to actively avoid or prevent deadlock. Instead, it allows deadlocks to occur, tries to detect them when they do, and then takes some action to try to recover.
One Resource of Each Type
If the OS tracks resource allocations and resource requests in a Resource Allocation Graph, described above, the result is a directed graph. The problem of deadlock detection is then the same as the problem of cycle detection in a directed graph. If the graph has V vertices and E edges, there exist algorithms for cycle detection that run in O(V+E) time.
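With single-unit resources, then, detection reduces to finding a cycle. A minimal iterative depth-first search in Python, assuming the graph is given as a dict of adjacency lists with both processes and resources as nodes (request edges point process-to-resource, hold edges resource-to-process):

```python
def has_cycle(graph):
    """Detect a cycle in a directed graph {node: [successors]}.
    Three-color iterative DFS; runs in O(V + E) time."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    for start in graph:
        if color.get(start, WHITE) != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, successors = stack[-1]
            advanced = False
            for succ in successors:
                if color.get(succ, WHITE) == GRAY:
                    return True      # back edge: a cycle (deadlock) exists
                if color.get(succ, WHITE) == WHITE:
                    color[succ] = GRAY
                    stack.append((succ, iter(graph.get(succ, ()))))
                    advanced = True
                    break
            if not advanced:
                color[node] = BLACK  # fully explored, not part of a cycle
                stack.pop()
    return False

# Hypothetical graph: A requests R1 (A -> R1), R1 is held by B (R1 -> B),
# B requests R2 (B -> R2), R2 is held by A (R2 -> A): a deadlock cycle.
g = {"A": ["R1"], "R1": ["B"], "B": ["R2"], "R2": ["A"]}
```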
Multiple Resources of Each Type
The algorithm for deadlock detection with multiple resources of each type, like other algorithms related to deadlock, uses several data structures:
Existing Resources vector: E
Available Resources vector: A
Current Allocation matrix: C
Request matrix: R
All resources must either be allocated or available. That is, for each resource ( j ):

C_1j + C_2j + ... + C_nj + A_j = E_j

where processes or threads are listed by row number ( i ) and resources are listed by column number ( j ). Each process is initially "unmarked". When the algorithm completes, if all processes are marked, no deadlock exists. If some processes remain unmarked at the end of the algorithm, a deadlock exists and the unmarked processes form the deadlock set. The steps of the algorithm are:
1. Look for an unmarked process Pi, for which the ith row of R is less than or equal to A (for all columns).
2. If such a process is found, add the ith row of C to A, mark the process, and go back to step 1.
3. If no such process exists, the algorithm terminates.
Step 1 is analogous to looking for a process that can run to completion. Step 2 frees that process' resources and makes them available for others. The following images illustrate an example.
Initial matrices for processes P1, P2, P3, and P4.
P3 runs to completion. A unit of R4 is freed from C and added to A.
P4 runs to completion. P4 was holding nothing, so nothing is added to A.
There are not enough resources in A for either P1 or P2 to run to completion. P1 and P2 are deadlocked over resources R2 and R3.
Note that the algorithm does not require every possible execution sequence to succeed. For no deadlock to exist at present, it is sufficient that at least one sequence exists in which all processes run to completion.
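The detection algorithm and the walkthrough above can be sketched as follows. The matrices are hypothetical values constructed to match the narrative (P3 holds the unit of R4, P4 holds nothing, and P1 and P2 each hold one of R2/R3 while requesting the other):

```python
def find_deadlocked(E, A, C, R):
    """Multi-instance deadlock detection. E: existing-resources vector,
    A: available vector, C: current-allocation matrix, R: request matrix
    (one row per process). Returns the set of unmarked process indices,
    i.e. the deadlocked set (empty if no deadlock exists)."""
    n, m = len(C), len(A)
    # Sanity check: allocated plus available equals existing, per resource.
    for j in range(m):
        assert A[j] + sum(C[i][j] for i in range(n)) == E[j]
    avail = list(A)
    marked = [False] * n
    progress = True
    while progress:
        progress = False
        for i in range(n):
            # Step 1: an unmarked process whose whole request fits in avail.
            if not marked[i] and all(R[i][j] <= avail[j] for j in range(m)):
                # Step 2: it could run to completion; release its holdings.
                for j in range(m):
                    avail[j] += C[i][j]
                marked[i] = True
                progress = True
    return {i for i in range(n) if not marked[i]}

# Hypothetical matrices for the walkthrough: rows are P1..P4, columns R1..R4.
E = [1, 1, 1, 1]
A = [1, 0, 0, 0]
C = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
R = [[0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
deadlocked = find_deadlocked(E, A, C, R)  # indices 0 and 1, i.e. P1 and P2
```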
There are three basic approaches to deadlock recovery: Pre-emption, Rollback, and Killing Processes.
In this method, resources are temporarily taken away from some processes and allocated to others, so the latter may run to completion or at least escape a deadlock. As noted earlier, though, many resource types cannot practically be pre-empted, making this strategy very difficult or even impossible in a modern OS.
The second strategy, rollback, requires that processes periodically be checkpointed. Checkpointing means writing the entirety of the process state (memory contents, registers, PCB, resources, etc. -- the entire "process image") to a file, so that the process could be restarted later from this point. The notion is similar to that of a "snapshot" of a virtual machine or file system. When a deadlock is detected, some processes that currently hold needed resources are reset back to a checkpointed state, so those resources may be reallocated to others. Obviously, this destroys any work done by the process since the checkpoint operation, which is itself costly.
This is the crudest but simplest method of recovering the OS from a deadlock. The OS may choose a process involved in the deadlock or one that is not, but ideally it will kill the fewest processes necessary to free resources and eliminate the deadlock condition. Better still, the OS would choose a process that can be restarted with little or no adverse effect on other operations.
The deadlock detection algorithm examines processes and threads based on the resources they currently hold and currently request (are blocked on). But once a deadlock exists, there is no easy, practical fix.
What if, instead, we could ask processes and threads to identify the shared resources they will need in the future, and evaluate resource requests as they occur? It is more flexible for the OS to determine dynamically, during execution, whether granting a particular resource request could lead to deadlock in the future, instead of simply waiting for deadlock to occur and detecting it.
The Banker's Algorithm attempts to solve exactly this problem. It works in a fashion very similar to the deadlock detection algorithm discussed above, but instead of examining only current requests, it uses each process's declared maximum future claims to compute a matrix of remaining needs, and grants a request only if the resulting state is "safe" (that is, some ordering still exists in which every process can finish). We'll skip the details, because the basic idea is so similar and also because of this key limitation: in real applications, processes cannot predict their resource needs in advance. In addition, both processes and resources can come and go. Nonetheless, the idea of asking for information about future requests, and proactively avoiding problems, can be useful in some situations.
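Although we skip the full algorithm, the core safety check can be sketched briefly. This sketch assumes each process has declared a maximum claim in advance (the algorithm's key, and usually unrealistic, requirement); the example numbers are hypothetical:

```python
def is_safe(available, allocation, max_claim):
    """Banker's-algorithm safety check (sketch). need = max_claim -
    allocation. The state is safe if some order exists in which every
    process can acquire its remaining need and run to completion."""
    n, m = len(allocation), len(available)
    need = [[max_claim[i][j] - allocation[i][j] for j in range(m)]
            for i in range(n)]
    work = list(available)
    finished = [False] * n
    made_progress = True
    while made_progress:
        made_progress = False
        for i in range(n):
            if not finished[i] and all(need[i][j] <= work[j] for j in range(m)):
                for j in range(m):          # process i finishes and releases
                    work[j] += allocation[i][j]
                finished[i] = True
                made_progress = True
    return all(finished)

# Hypothetical 5-process, 3-resource state; safe because some sequence
# (e.g. P1, P3, P4, P0, P2) lets every process finish.
ok = is_safe([3, 3, 2],
             [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]],
             [[7, 5, 3], [3, 2, 2], [9, 0, 2], [2, 2, 2], [4, 3, 3]])
```

The OS would call `is_safe` on the state that *would* result from granting a request, and grant it only if the check passes.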
The strategy of deadlock prevention involves removing one of the four necessary conditions. However, there are challenges associated with each.
Mutual Exclusion. If mutual exclusion were removed from the system, unfortunately, access to shared resources could be simultaneous, violating correctness. Race conditions and other hazards make this approach impossible.
Hold and Wait. If we prevent processes from simultaneously holding one resource while waiting for another resource, we would eliminate deadlock. A process would not proceed until everything it needs is available. As mentioned, though, a process often cannot know in advance what resources it will need in the future, so this approach won't work, either.
No Pre-emption. If we permit the OS to pre-empt resources when it needs to, deadlock can be prevented. However, many resources are "non pre-emptable," so this won't work.
Circular Wait. This condition can be targeted in a number of ways. We could limit processes to holding only one resource at a time, but that would be impractical. Another approach assigns each resource a global number, and requires processes needing multiple resources to acquire them in numerical order. If followed, this approach does prevent deadlock. However, it may not be possible to find an ordering that satisfies all processes, so this approach fails as well.
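The numerical-ordering idea can be sketched in Python: tag every lock with a global order number at creation, and always acquire in ascending order. `OrderedLock` and the helper names are hypothetical:

```python
import threading

class OrderedLock:
    """A lock tagged with a global acquisition-order number. (The class
    counter is not itself thread-safe; a sketch simplification, assuming
    locks are created during single-threaded setup.)"""
    _next_order = 0

    def __init__(self):
        OrderedLock._next_order += 1
        self.order = OrderedLock._next_order
        self.lock = threading.Lock()

def acquire_all(locks):
    """Acquire locks in ascending global order, whatever order the caller
    listed them in. A total order makes circular wait impossible."""
    ordered = sorted(locks, key=lambda l: l.order)
    for l in ordered:
        l.lock.acquire()
    return ordered

def release_all(locks):
    for l in reversed(locks):
        l.lock.release()
```

A thread needing several resources calls `acquire_all` once with all of them, so two threads can never hold pieces of each other's sets in reverse order.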
In summary, no existing methods for deadlock prevention have yet been shown to be practical in a realistic operating system.
In some specific situations, two-phase locking can be used to avoid deadlock. In the first phase, a process tries to lock all the records it needs. If successful, it begins phase 2, which involves making updates and releasing the records. If the process, during the first phase, finds a record already locked, it drops all its locks and starts over.
This approach works fine in databases, but does not generalize well to all resource types. In many situations, a process dropping all resource locks and restarting its sequence may not be acceptable.
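A minimal sketch of the two-phase pattern, using Python's non-blocking `Lock.acquire(blocking=False)`; the `Record` type is a hypothetical stand-in for a database record:

```python
import threading

class Record:
    """Hypothetical lockable database record."""
    def __init__(self, value=0):
        self.lock = threading.Lock()
        self.value = value

def two_phase_update(records, update):
    """Phase 1: try to lock every record; if any is already held, drop
    all locks acquired so far and start over. Phase 2: apply the update
    to each record, then release everything."""
    while True:
        acquired = []
        for rec in records:
            if rec.lock.acquire(blocking=False):
                acquired.append(rec)
            else:
                for held in acquired:   # phase 1 failed: back off fully
                    held.lock.release()
                break
        else:
            try:                        # phase 2: all locked; update
                for rec in records:
                    update(rec)
            finally:
                for rec in records:
                    rec.lock.release()
            return
```

Note the caveat from the text: the unconditional retry discards progress and, between competing callers, can itself degenerate into livelock.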
A communication deadlock occurs when two or more processes are blocked while waiting to receive messages, rather than waiting for resources. For example, suppose process A sends a message to process B and blocks while waiting for a reply. If the message is lost, process B blocks waiting for a request that will never arrive, while A blocks waiting for a reply that will never be sent. There are no shared resources, but processes A and B are deadlocked, and neither may proceed.
Communication deadlocks can be addressed by using non-blocking send() and receive() operations, or by including retransmission timeouts, so the period of blocking is not indefinite.
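The retransmission-timeout idea can be sketched with Python queues standing in for an unreliable channel; the names and retry policy are illustrative:

```python
import queue

def request_reply(out_q, in_q, message, timeout=1.0, retries=3):
    """Send a request and wait for the reply with a timeout, so a lost
    message cannot block the sender indefinitely."""
    for _ in range(retries):
        out_q.put(message)              # (re)transmit the request
        try:
            return in_q.get(timeout=timeout)
        except queue.Empty:
            continue                    # reply lost or late: try again
    raise TimeoutError("no reply after %d attempts" % retries)
```

Bounding the wait converts a potential permanent block into a recoverable error the caller can handle.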
Livelock is different from deadlock, but related in a sense. Livelock occurs when two or more processes continuously change their states in response to changes in the other process(es), without doing any useful work. The Tanenbaum text offers the following example, of two processes that are each "too polite," due to their willingness to drop resources for each other. (note: in the text figure 6-16, both processes erroneously have the same name).
If both processes proceed in parallel (say, on different CPUs) in exact lockstep, neither will escape the while loop.
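A bounded simulation of the "too polite" pattern (names and data structures are illustrative). Each round's decisions are made against a snapshot of the shared state, modeling exact lockstep; no matter how many rounds run, neither process ever holds both resources:

```python
def polite_processes(rounds=20):
    """Simulate two lockstep processes: A wants r1 then r2, B wants r2
    then r1. On failing to get its second resource, each politely
    releases its first and retries. Returns which processes finished."""
    holder = {"r1": None, "r2": None}       # which process holds each resource
    wants = {"A": ("r1", "r2"), "B": ("r2", "r1")}
    done = {"A": False, "B": False}
    for _ in range(rounds):
        for p in ("A", "B"):                # each grabs its first resource
            first = wants[p][0]
            if holder[first] is None:
                holder[first] = p
        snapshot = dict(holder)             # lockstep: decide from a snapshot
        for p in ("A", "B"):
            first, second = wants[p]
            if snapshot[second] is None:
                holder[second] = p
                done[p] = True              # got both: real work would go here
            else:
                holder[first] = None        # politely release the first
    return done
```

Both processes are runnable the whole time (unlike deadlock), yet neither makes progress.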
Summary / Key Points
Deadlock is the permanent blocking of a set of processes (or threads) that either compete for system resources or communicate with each other.
There are four conditions for deadlock: Mutual Exclusion, Hold and Wait, No Pre-emption, and Circular Wait. Circular wait indicates an actual deadlock.
The deadlock detection strategy, with single instances of each resource, involves looking for a cycle in the Resource Allocation Graph; if the graph is acyclic (a DAG), no deadlock exists. Cycle detection runs in time linear in the size of the graph: O(V+E).
The deadlock detection algorithm, with multiple instances of each resource, involves creating vectors of Existing and Available resources, as well as matrices expressing processes' Current and Requested resource allocations. The algorithm looks for a notional sequence in which all processes could run to completion.
There are three basic strategies for deadlock recovery: resource pre-emption, checkpointing and rollback, or just killing processes.
Deadlock avoidance is implemented by making informed choices about resource allocation.
The Banker's Algorithm, for example, seeks to make resource allocation decisions that avoid "unsafe" states. Although helpful in understanding deadlock, it is not easily applied in real, complex systems.
Deadlock prevention seeks to eliminate one of the four necessary conditions: Mutual Exclusion, Hold and Wait, No Pre-emption, or Circular Wait. Modern operating system complexity makes all these approaches impractical to implement.
A special technique called two-phase locking can prevent deadlocks, but its implementation is limited to database records and similar simple resources.
Communication deadlocks are similar to resource deadlocks, but they do not involve shared resources. Instead, they involve more than one process blocking while awaiting communication.
Livelock occurs when two or more processes continuously change their states in response to changes in the other process(es), without doing any useful work.