Deadlock

Intro

We have discussed how processes and threads can share access to resources, using concurrency primitives like semaphores and monitors. The examples we discussed were relatively simple compared to real-world applications. It turns out that realistically complex applications often need to use multiple resources at once to accomplish certain tasks. Doing so in a multiprocessing or multithreaded environment often leads to contention for multiple resource instances or resource types. When this occurs, a common result is deadlock. In the best case, users may experience this as the OS reporting that an application is unresponsive and must be closed; in the worst case, the entire OS becomes unresponsive or crashes.

In this section, we discuss how deadlocks can occur, and explore methods for detecting and recovering from them. There currently does not exist an efficient, general solution for the deadlock problem in operating systems, but there are a number of techniques for at least detecting and recovering from a deadlock with reasonable efficiency.

Deadlock: the permanent blocking of a set of processes (or threads) that either compete for system resources or communicate with each other. Each process (or thread) in a deadlocked set is waiting on an event or resource that can only be triggered or freed by another member of the set.

Resources

Resources may include I/O devices, data records, files, named pipes, and really anything else that can be owned by a process. As with our IPC discussion, all the concepts outlined here apply equally to processes and threads, since both may deal with shared resources. Some resources may have several similar instances available (for example, two ethernet devices). In short, a resource is "anything that must be acquired, used, and released over the course of time." (Tanenbaum)

Resource Types

There are two basic types of resources, pre-emptable and non pre-emptable. The type of a resource may depend, in part, on context.

A pre-emptable resource can be taken away from the process owning it (and later returned to it) with no ill effects. A common example is primary memory -- the contents of a memory unit may be saved to disk and later restored to RAM. Possible deadlocks involving pre-emptable resources can generally be handled by taking some resources from one process and temporarily reallocating them to another process that needs them.

Deadlocks generally involve non pre-emptable resources. These can't normally be reassigned dynamically to deal with deadlocks.

Order of Acquisition

When more than one process needs more than one resource or resource type, the order of resource acquisition can lead to deadlock, as in the following example. Note the subtle change in ordering in the process_B code in listing (b), on the right.

semaphore resource1;
semaphore resource2;

void process_A(void) {
    semWait(resource1);
    semWait(resource2);
    use_both_resources();
    semSignal(resource2);
    semSignal(resource1);
}

void process_B(void) {
    semWait(resource1);
    semWait(resource2);
    use_both_resources();
    semSignal(resource2);
    semSignal(resource1);
}
(a) Deadlock-free code
semaphore resource1;
semaphore resource2;

void process_A(void) {
    semWait(resource1);
    semWait(resource2);
    use_both_resources();
    semSignal(resource2);
    semSignal(resource1);
}

void process_B(void) {
    semWait(resource2);  /* Note order */
    semWait(resource1);
    use_both_resources();
    semSignal(resource2);
    semSignal(resource1);
}
(b) Code with potential deadlock
In the potential-deadlock example (b) on the right, if process A grabs resource 1 and process B grabs resource 2 at the same time (and only one instance of each exists), the processes will form a deadlocked set, each waiting on a resource held by the other.
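For concreteness, listing (b) could be written as a runnable program using POSIX semaphores and pthreads. The following sketch is an illustration only; the thread setup, the initial semaphore values, and the body of use_both_resources() are assumptions, not part of the original listing. If thread A acquires resource1 while thread B acquires resource2, both block forever on their second sem_wait().

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

sem_t resource1, resource2;

void use_both_resources(void) { puts("using both resources"); }

void *process_A(void *arg) {
    sem_wait(&resource1);
    sem_wait(&resource2);
    use_both_resources();
    sem_post(&resource2);
    sem_post(&resource1);
    return NULL;
}

void *process_B(void *arg) {
    sem_wait(&resource2);   /* note the opposite acquisition order */
    sem_wait(&resource1);
    use_both_resources();
    sem_post(&resource1);
    sem_post(&resource2);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    sem_init(&resource1, 0, 1);   /* one instance of each resource */
    sem_init(&resource2, 0, 1);
    pthread_create(&a, NULL, process_A, NULL);
    pthread_create(&b, NULL, process_B, NULL);
    pthread_join(a, NULL);        /* may hang forever if the deadlock occurs */
    pthread_join(b, NULL);
    return 0;
}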

Deadlocks

Necessary Conditions

Four conditions are necessary for deadlock:
  1. Mutual exclusion: each resource instance is either assigned to exactly one process or is available.
  2. Hold and wait: a process may hold resources while requesting (and waiting for) additional ones.
  3. No pre-emption: a resource cannot be forcibly taken away from the process holding it.
  4. Circular wait: there is a circular chain of two or more processes, each waiting for a resource held by the next member of the chain.
If any of these four conditions is absent, deadlock will not occur. The last condition, Circular Wait, describes when a deadlock actually exists.

Resource Allocation Graphs

In this system for modeling resource allocation, a process is represented by a circle and a resource is represented by a square. A process holding a resource is indicated by a directed edge from the resource to the process. A process requesting a resource is indicated by a directed edge from the process to the resource. This version of the resource graph applies when there is only a single unit of each resource.

Resource allocation graphs:
(a) Process A holds resource R.
(b) Process B requests resource S.
(c) Processes C and D deadlock over resources T and U.
Image: Tanenbaum.

Handling Deadlock

The three basic approaches for handling deadlock are detection (and recovery), avoidance, and prevention. We will talk about each of these.

Detection

In the deadlock detection strategy, the OS does not do anything to actively avoid or prevent deadlock. Instead, it allows deadlocks to occur, tries to detect them when they do, and then takes some action to try to recover.

One Resource of Each Type

If the OS tracks resource allocations and resource requests in a Resource Allocation Graph, described above, the result is a directed graph. The problem of deadlock detection is then the same as the problem of cycle detection in a directed graph. If the graph has V vertices and E edges, there exist algorithms for cycle detection that run in O(V+E) time.


(a) A Resource Allocation Graph.
(b) A cycle extracted from (a).
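
As a sketch of how the cycle check might be implemented, the following C program runs a depth-first search over a small, hypothetical resource allocation graph (the node numbering and edge set are made up for illustration). A back edge to a node on the current DFS path means a cycle, and therefore a deadlock. With an adjacency-list representation the search runs in the O(V+E) time mentioned above; the adjacency matrix used here is O(V^2) but easier to read.

#include <stdio.h>

#define N 4                       /* graph nodes: processes and resources together */

/* adj[u][v] = 1 means an edge u -> v (assignment edge: resource -> process;
   request edge: process -> resource). This example graph contains the
   cycle 0 -> 1 -> 2 -> 3 -> 0. */
int adj[N][N] = {
    {0, 1, 0, 0},
    {0, 0, 1, 0},
    {0, 0, 0, 1},
    {1, 0, 0, 0},
};

enum { WHITE, GRAY, BLACK };      /* unvisited, on the current DFS path, finished */
int color[N];

/* Depth-first search from node u; returns 1 if a cycle is reachable. */
int has_cycle(int u) {
    color[u] = GRAY;
    for (int v = 0; v < N; v++) {
        if (!adj[u][v]) continue;
        if (color[v] == GRAY) return 1;                  /* back edge: cycle */
        if (color[v] == WHITE && has_cycle(v)) return 1;
    }
    color[u] = BLACK;
    return 0;
}

int main(void) {
    for (int u = 0; u < N; u++)
        if (color[u] == WHITE && has_cycle(u)) {
            printf("cycle found: deadlock\n");
            return 0;
        }
    printf("no cycle: no deadlock\n");
    return 0;
}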

Multiple Resources of Each Type

The algorithm for deadlock detection with multiple resources of each type, like other algorithms related to deadlock, uses several data structures:
  - E, the existing resource vector: E_j is the total number of units of resource j in the system.
  - A, the available resource vector: A_j is the number of units of resource j currently unallocated.
  - C, the current allocation matrix: C_ij is the number of units of resource j currently held by process i.
  - R, the request matrix: R_ij is the number of units of resource j that process i is currently requesting.
All resources must either be allocated or available, so that: $$\sum_{i=1}^{n}C_{ij}+A_j=E_j$$ where processes or threads are listed by row number ( i ) and resources are listed by column number ( j ). Each process is initially "unmarked". When the algorithm completes, if all processes are marked, no deadlock exists. If some processes remain unmarked at the end of the algorithm, a deadlock exists and the unmarked processes form the deadlocked set. The steps of the algorithm are:
  1. Look for an unmarked process Pi, for which the ith row of R is less than or equal to A (for all columns).
  2. If such a process is found, add the ith row of C to A, mark the process, and go back to step 1.
  3. If no such process exists, the algorithm terminates.
Step 1 is analogous to looking for a process that can run to completion. Step 2 frees that process' resources and makes them available for others. The following images illustrate an example.

Initial matrices for processes P1, P2, P3, and P4.


P3 runs to completion. A unit of R4 is freed from C and added to A.


P4 runs to completion. P4 was holding nothing, so nothing is added to A.


There are not enough resources in A for either P1 or P2 to run to completion.
P1 and P2 are deadlocked over resources R2 and R3.

Note that the algorithm does not require all possible execution sequences to be successful. For no deadlock to currently exist, it is sufficient that at least one sequence exists in which all processes can run to completion.
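
The detection algorithm itself is short. The following C sketch uses hypothetical matrix values chosen to mirror the example above (P3 and P4 can finish, while P1 and P2 end up deadlocked over R2 and R3); the values are illustrative, not the textbook's figures.

#include <stdio.h>

#define NPROC 4   /* processes P1..P4 */
#define NRES  4   /* resource types R1..R4 */

int C[NPROC][NRES] = {   /* current allocation: C[i][j] = units of Rj held by Pi */
    {0, 1, 0, 0},        /* P1 holds one unit of R2 */
    {0, 0, 1, 0},        /* P2 holds one unit of R3 */
    {0, 0, 0, 1},        /* P3 holds one unit of R4 */
    {0, 0, 0, 0},        /* P4 holds nothing        */
};
int R[NPROC][NRES] = {   /* request matrix: R[i][j] = units of Rj requested by Pi */
    {0, 0, 1, 0},        /* P1 wants R3 (held by P2) */
    {0, 1, 0, 0},        /* P2 wants R2 (held by P1) */
    {1, 0, 0, 0},        /* P3 wants R1 (available)  */
    {0, 0, 0, 1},        /* P4 wants R4 (freed when P3 finishes) */
};
int A[NRES] = {1, 0, 0, 0};   /* available vector */

int marked[NPROC];            /* marked[i] = 1 once Pi is known to be able to finish */

void detect(void) {
    int progress = 1;
    while (progress) {
        progress = 0;
        for (int i = 0; i < NPROC; i++) {
            if (marked[i]) continue;
            int can_finish = 1;                    /* step 1: is row i of R <= A? */
            for (int j = 0; j < NRES; j++)
                if (R[i][j] > A[j]) { can_finish = 0; break; }
            if (can_finish) {                      /* step 2: release Pi's resources */
                for (int j = 0; j < NRES; j++)
                    A[j] += C[i][j];
                marked[i] = 1;
                progress = 1;
            }
        }
    }
}

int main(void) {
    detect();
    int deadlocked = 0;
    for (int i = 0; i < NPROC; i++)
        if (!marked[i]) { printf("P%d is deadlocked\n", i + 1); deadlocked++; }
    if (deadlocked == 0)
        printf("no deadlock\n");
    return 0;
}

Run against these values, the program marks P3 and then P4, leaves P1 and P2 unmarked, and reports both as deadlocked.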

Recovery

There are three basic approaches to deadlock recovery: Pre-emption, Rollback, and Killing Processes.

Pre-emption

In this method, resources are temporarily taken away from some processes and allocated to others, so the latter may run to completion or at least escape a deadlock. As noted earlier, though, many resource types cannot practically be pre-empted, making this strategy very difficult or even impossible in a modern OS.

Rollback

The second strategy, rollback, requires that processes periodically be checkpointed. Checkpointing means writing the entirety of the process state (memory contents, resources, registers, PCB, etc. -- the entire "process image") to a file, so that the process can later be restarted from that point. The notion is similar to that of a "snapshot" of a virtual machine or file system. When a deadlock is detected, some processes that currently hold needed resources are reset back to a checkpointed state, so those resources may be reallocated to others. Obviously, this destroys any work done by the process since the checkpoint operation, which is itself costly.

Killing Processes

This is the crudest but simplest method of recovering the OS from a deadlock. The OS may choose a process involved in the deadlock or a process outside it, but it should kill the fewest processes necessary to free resources and eliminate the deadlock condition. Ideally, the OS would choose a process that can be restarted with little or no adverse effect on other operations.

Deadlock Avoidance

The deadlock detection algorithm examines processes and threads based on the resources they currently hold and the resources they are currently requesting (blocked on). But once a deadlock exists, it is often too late to fix it easily and practically.

What if, instead, we could ask processes and threads to identify the shared resources they will need in the future, and evaluate resource requests as they occur? It is more flexible for the OS to determine dynamically, during execution, whether granting a particular resource request could lead to deadlock later, instead of simply waiting for deadlock to occur and then detecting it.

The Banker's Algorithm attempts to solve exactly this problem. It works in a very similar fashion to the deadlock detection algorithm discussed above, but instead of using the current Request matrix, it works from each process's declared maximum future claims and grants a request only if some order still exists in which every process can run to completion. We'll skip the details, because the basic idea is so similar and also because of this key limitation: in real applications, processes are not able to predict their resource needs in advance. In addition, both processes and resources can come and go. Nonetheless, the idea of asking for information about future requests, and proactively avoiding problems, can be useful in some situations.

Deadlock Prevention

The strategy of deadlock prevention involves removing one of the four necessary conditions. However, there are challenges associated with each. In summary, no existing methods for deadlock prevention have yet been shown to be practical in a realistic operating system.

Other Issues

Two-phase Locking

In some specific situations, two-phase locking can be used to avoid deadlock. In the first phase, a process tries to lock all the records it needs. If successful, it begins phase 2, which involves making updates and releasing the records. If the process, during the first phase, finds a record already locked, it drops all its locks and starts over.
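
As an illustrative sketch (the record array, helper names, and back-off delay are assumptions, not a standard API), phase 1 can be built from try-locks. Because the process never blocks while holding a lock during this phase, a circular wait cannot form.

#include <pthread.h>
#include <unistd.h>

/* Hypothetical per-record locks; in a real system these would guard data records. */
pthread_mutex_t record[3] = {
    PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER
};

/* Phase 1: try to lock every needed record. If any is already held,
   release everything acquired so far, back off briefly, and start over. */
void lock_all(const int *needed, int n) {
    for (;;) {
        int i;
        for (i = 0; i < n; i++)
            if (pthread_mutex_trylock(&record[needed[i]]) != 0)
                break;                                /* record already locked */
        if (i == n)
            return;                                   /* phase 1 complete */
        while (--i >= 0)
            pthread_mutex_unlock(&record[needed[i]]); /* drop all locks held so far */
        usleep(1000);                                 /* back off, then retry */
    }
}

/* Phase 2: make updates (not shown), then release every record. */
void unlock_all(const int *needed, int n) {
    for (int i = 0; i < n; i++)
        pthread_mutex_unlock(&record[needed[i]]);
}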

This approach works fine in databases, but does not generalize well to all resource types. In many situations, a process dropping all resource locks and restarting its sequence may not be acceptable.

Communication Deadlocks

A communication deadlock occurs when two or more processes are blocked while waiting to receive messages, rather than waiting for resources. For example, if process A sends a message to process B and blocks while waiting for a reply, and the message gets lost, then process B remains blocked waiting for a message that will never arrive, while A remains blocked waiting for a reply that B will never send. There are no shared resources, but processes A and B are deadlocked, and neither may proceed.

Communication deadlocks can be addressed by using non-blocking send() and receive() operations, or by including retransmission timeouts, so the period of blocking is not indefinite.
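
For example, a receive timeout can be attached to a socket using the standard SO_RCVTIMEO option, so a blocked receiver eventually wakes up and can retransmit its request rather than waiting forever for a message that was lost. The helper below is a sketch; the function and parameter names are assumptions for illustration.

#include <sys/socket.h>
#include <sys/time.h>

/* After this call, a blocking recv() on sock returns -1 (with errno set to
   EAGAIN or EWOULDBLOCK) if nothing arrives within 'seconds', giving the
   caller a chance to retransmit instead of blocking indefinitely. */
int set_recv_timeout(int sock, int seconds) {
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
    return setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}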

Livelock

Livelock is different from deadlock, but related in a sense. Livelock occurs when two or more processes continuously change their states in response to changes in the other process(es), without doing any useful work. The Tanenbaum text offers the following example of two processes that are each "too polite" due to their willingness to drop resources for each other. (Note: in the text's figure 6-16, both processes erroneously have the same name.)

void Process_A (void) {
  acquire_lock(resource_1);
  while (!try_lock(resource_2)) {
    release_lock(resource_1);
    wait_fixed_time();
    acquire_lock(resource_1);
  }
  use_both_resources();
  release_lock(resource_2);
  release_lock(resource_1);
}

void Process_B (void) {
  acquire_lock(resource_2);
  while (!try_lock(resource_1)) {
    release_lock(resource_2);
    wait_fixed_time();
    acquire_lock(resource_2);
  }
  use_both_resources();
  release_lock(resource_1);
  release_lock(resource_2);
}
If both processes proceed in parallel (say, on different CPUs) in exact lockstep, neither will escape the while loop.

Summary / Key Points