Scheduling, in general, is deciding the order in which a given resource is allocated to processes when more than one wants it. Here, the resource we consider is the CPU, and scheduling means deciding the order in which processes are selected to run on the CPU when more than one process in the system is ready to run. This decision is made by the scheduler, an important component of the OS.

This is distinct from the dispatcher, which performs the actual mechanics of carrying out the process switch. The choice of scheduling algorithm is an important driver in operating system performance. Many of the same issues apply equally to process scheduling and thread scheduling, but there are some differences, discussed later.


Process Behavior

Most processes will alternate between bursts of computing and waiting on I/O. However, some processes tend to spend more time computing, so we call them CPU-bound processes. Other processes tend to spend more time waiting for I/O, so we call them I/O-bound processes. This is a subjective characterization; there is no formal definition describing the different process types. Furthermore, a process may alternate between the two types of behavior. Characterizing processes as CPU-bound or I/O-bound helps us reason about what type of scheduling algorithm is most appropriate and why.
(a) A CPU-bound process. (b) An I/O-bound process. (Tanenbaum)

When to Schedule

There are a number of situations that require a scheduling decision to be made: when a new process is created, when a process exits, when a process blocks (on I/O, a semaphore, etc.), and when an I/O interrupt occurs. Scheduling/dispatching can be done in one of two ways: non-preemptively (the running process keeps the CPU until it blocks or voluntarily yields) or preemptively (the running process may be suspended after a fixed time, using a clock interrupt, so another process can be selected).

Categories of Scheduling Algorithms

There are three different regimes in which scheduling algorithms generally operate: batch, interactive, and real-time.

Scheduler Goals or Metrics

Operating system schedulers usually have a variety of goals. Some of the most common ones are fairness, policy enforcement, throughput, turnaround time, CPU utilization, response time, and (in real-time systems) meeting deadlines. Different metrics are important for different types of systems, as shown in this figure from the book.
Scheduler goals. (Tanenbaum)
In discussing the various types of scheduling algorithms, it is important to bear in mind how each performs differently, in terms of the above criteria.

Scheduling Algorithms

The following sections describe a number of common scheduling algorithms and algorithm types. It is by no means exhaustive. The creation of new scheduling algorithms is limited only by the imagination, and each one has its pros and cons.

First Come, First Served (FCFS)

This is the simplest of scheduling algorithms. It simply maintains a FIFO queue of ready processes. It is not pre-emptive. The principal drawback is that, in a mix of CPU-bound and I/O-bound processes, the CPU-bound processes are greatly favored and the I/O-bound processes are penalized (every time they block, which is often, they go to the back of the line).
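As a rough sketch (not from the text), the FIFO behavior can be simulated in a few lines; the burst times below are made up for illustration:

```python
# A minimal FCFS sketch: jobs run to completion in arrival order, no pre-emption.
from collections import deque

def fcfs(burst_times):
    """Return per-process waiting times for a FIFO queue of CPU bursts."""
    queue = deque(burst_times)
    clock = 0
    waits = []
    while queue:
        burst = queue.popleft()
        waits.append(clock)      # time spent waiting before first (and only) run
        clock += burst           # runs to completion; no pre-emption
    return waits

print(fcfs([8, 4, 4, 4]))  # → [0, 8, 12, 16]
```

Note how every job behind a long one absorbs its full burst as waiting time, which is exactly the penalty I/O-bound processes pay each time they rejoin the queue.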

Shortest Job First (SJF)

This algorithm assumes that process run times are known (or can be estimated) in advance. The scheduler maintains a queue of ready jobs, placing at the front of the queue those jobs with the shortest expected running times to completion.

(a) Running four jobs in their original order -- 8,4,4,4.
(b) Running them in Shortest Job First (SJF) order -- 4,4,4,8. (Tanenbaum)
Shortest Job First obviously favors shorter-run jobs, and therefore is good for throughput and response time.
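The figure's arithmetic can be checked with a short illustrative sketch; `avg_turnaround` is a hypothetical helper, and all jobs are assumed to arrive at time 0:

```python
def avg_turnaround(burst_times):
    """Average turnaround time when jobs (all present at t=0) run in the given order."""
    clock, total = 0, 0
    for b in burst_times:
        clock += b          # this job finishes at time `clock`
        total += clock      # turnaround = finish time, since arrival is t=0
    return total / len(burst_times)

print(avg_turnaround([8, 4, 4, 4]))          # original order → 14.0
print(avg_turnaround(sorted([8, 4, 4, 4])))  # SJF order 4,4,4,8 → 11.0
```

Sorting the bursts shortest-first is all SJF does here, and it drops the average turnaround from 14 to 11 time units.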

Shortest Remaining Time (SRT)

SRT is the pre-emptive version of SJF. Again, run time must be known (or estimated) in advance. For each job, its remaining run time is tracked continuously, decremented each time it runs on the CPU. The scheduler chooses the process whose remaining run time is the shortest among all those in the queue. When a new job arrives, its total required execution time is compared to the currently running process' remaining required execution time. If it's shorter, the current process is pre-empted and the new process is started.
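A minimal tick-by-tick sketch of SRT, assuming jobs are given as (arrival, burst) pairs; this is illustrative, not an implementation from the text:

```python
import heapq

def srt(jobs):
    """jobs: list of (arrival, burst). Returns {arrival: completion_time},
    simulated one time unit at a time with shortest-remaining-time pre-emption."""
    jobs = sorted(jobs)                 # by arrival time
    ready = []                          # min-heap of (remaining_time, arrival)
    done = {}
    clock, i = 0, 0
    while i < len(jobs) or ready:
        while i < len(jobs) and jobs[i][0] <= clock:
            heapq.heappush(ready, (jobs[i][1], jobs[i][0]))
            i += 1
        if not ready:                   # CPU idle: jump to the next arrival
            clock = jobs[i][0]
            continue
        rem, arr = heapq.heappop(ready)
        clock += 1                      # run the shortest job for one tick
        if rem > 1:
            heapq.heappush(ready, (rem - 1, arr))
        else:
            done[arr] = clock
    return done

# The 4-unit job arriving at t=1 pre-empts the 8-unit job and finishes first.
print(srt([(0, 8), (1, 4)]))  # → {1: 5, 0: 12}
```

Re-inserting the pre-empted job with its decremented remaining time is what makes the comparison in the text (new job's total time vs. current job's remaining time) happen automatically on every tick.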

SRT also favors shorter-run jobs, so it should yield good throughput and response time, but at the expense of fairness to longer jobs.

Round Robin

Round Robin is a pre-emptive scheduling algorithm that employs a time quantum and a simple FIFO queue: each process runs for at most one quantum, then is pre-empted and moved to the back of the queue.

An important performance consideration is setting the time quantum properly. If it is set too short, there will be too many process switches, reducing CPU efficiency. If the quantum is set too long, response time to interactive requests may be poor.

If a job completes or blocks before using its entire time quantum, the next job is scheduled immediately and receives a fresh quantum; it does not inherit the remainder of the previous job's quantum.
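The quantum mechanics described above can be sketched as follows (a hypothetical helper, with all jobs arriving at t=0 and burst times made up):

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Simulate Round Robin; returns completion time of each job (all arrive at t=0)."""
    queue = deque(enumerate(burst_times))   # (pid, remaining_time)
    clock = 0
    finish = {}
    while queue:
        pid, rem = queue.popleft()
        run = min(quantum, rem)             # a fresh quantum each time it is scheduled
        clock += run
        if rem > run:
            queue.append((pid, rem - run))  # pre-empted: back of the line
        else:
            finish[pid] = clock             # finished early; next job starts at once
    return finish

print(round_robin([8, 4, 4, 4], quantum=4))  # → {1: 8, 2: 12, 3: 16, 0: 20}
```

Shrinking the quantum in this sketch adds more trips through the loop (more context switches) without changing total work, which is the efficiency trade-off noted above.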


Priority Scheduling

In Priority scheduling, each process is assigned a priority level, and the highest-priority process always runs. The priority may be set statically (never changes), or it may be modified dynamically. For example, to keep a process from running indefinitely, its priority could be gradually decreased over time while it is running. Priority scheduling may be preemptive or non-preemptive.

Multiple Priority Queues

Priority scheduling can be combined with other systems. For example, if there are multiple discrete priority levels, each priority level could have its own queue and scheduling algorithm, such as Round Robin, within that queue.

Four priority queues. (Tanenbaum)
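The pick-next-process step under multiple priority queues can be sketched like this; the levels and process names are made up for illustration:

```python
from collections import deque

# A sketch of multiple priority queues: the scheduler always serves the
# highest-priority non-empty queue; Round Robin would rotate within each level.
queues = {4: deque(), 3: deque(), 2: deque(), 1: deque()}  # 4 = highest priority

def pick_next():
    for level in sorted(queues, reverse=True):   # scan from highest priority down
        if queues[level]:
            return queues[level].popleft()       # RR: re-append after its quantum
    return None  # nothing ready; CPU idles

queues[3].append("editor")      # interactive process, higher priority
queues[1].append("batch-job")   # background process, lowest priority
print(pick_next())  # → editor
```

The batch job only runs when every higher-priority queue is empty, which is why dynamic priority adjustment is usually needed to prevent starvation.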
Niceness. Linux assigns a priority level to each process, but also adds a complementary (though not always used) metric called "niceness". The higher a process' niceness score, the more willing the process is to yield the CPU. Important OS processes actually have a negative niceness score. Check it out at the command shell as follows (output abbreviated):
$ ps -e -o pid,pri,ni,cmd | more
    1  19   0 /sbin/init
    2  19   0 [kthreadd]
    3  19   0 [ksoftirqd/0]
    5  39 -20 [kworker/0:0H]
    7  19   0 [rcu_sched]
   25 139   - [migration/0]
   26 139   - [watchdog/0]
   28 139   - [migration/1]
   29  19   0 [ksoftirqd/1]
   31  39 -20 [kworker/1:0H]
   32 139   - [watchdog/2]
   33 139   - [migration/2]
   34  19   0 [ksoftirqd/2]
   36  39 -20 [kworker/2:0H]

Shortest Process Next (SPN)

Interactive processes generally follow the pattern of: wait for a command, then execute the command, and repeat. Although there may be no advance knowledge of how long the process will run on the CPU next, it may be possible to keep a running estimate, based on past performance.

The technique begins with an initial estimate for the first run, then uses a weighted, or "aged," average of the process' most recent running times on the processor. Suppose the initial estimate is S0, and the first run is actually measured as T0. We can combine these using a weighting factor, α (0 < α < 1), and the general recurrence relation: $$S_{n+1}=\alpha T_{n}+(1-\alpha)S_n$$ where Sn+1 is the estimated runtime for cycle n+1, Tn is the actual observed running time during cycle n, and Sn was the estimated running time for cycle n.

Choosing α close to 1.0 ages out the observations quickly, giving the most weight to recent observations; choosing a small α, say 0.2, ages them out more slowly, giving older time observations more weight. The aging technique is an application of exponential smoothing.
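The recurrence can be sketched directly; the initial guess, α, and observed times below are made up for illustration:

```python
def next_estimate(observed, prev_estimate, alpha=0.5):
    """One step of the aging recurrence: S_{n+1} = alpha*T_n + (1 - alpha)*S_n."""
    return alpha * observed + (1 - alpha) * prev_estimate

s = 10.0                      # S0: initial guess for the burst length
for t in [4, 4, 4, 4]:        # the process actually keeps taking 4 units
    s = next_estimate(t, s)
print(s)                      # → 4.375, converging toward the true value of 4
```

With α = 0.5 the stale initial guess of 10 is halved in influence on every cycle (10 → 7 → 5.5 → 4.75 → 4.375), which is the exponential-smoothing behavior described above.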

SPN is not pre-emptive, but it eliminates the bias in favor of longer processes that's inherent in FCFS, and improves response time.

Lottery Scheduling

Lottery scheduling is a method for providing relative portions of the system's resources to processes, without needing to do so in a deterministic, guaranteed way. Each process has one or more lottery tickets. The next process to run is selected from the ready queue by lottery ticket. Lottery tickets are apportioned based on relative priority. For example, if one process should be doing about 20% of the work on the system, it holds about 20% of the available lottery tickets. New processes are issued lottery tickets on creation. Cooperative processes may even give lottery tickets to each other in real time.
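A sketch of a single lottery draw, assuming each process's ticket count is kept in a dictionary (the names and counts are made up):

```python
import random

def lottery_pick(tickets):
    """tickets: {process: ticket_count}. Draw one winning ticket uniformly."""
    total = sum(tickets.values())
    draw = random.randrange(total)          # winning ticket number
    for proc, count in tickets.items():
        if draw < count:                    # ticket falls in this process's block
            return proc
        draw -= count

# A process holding 20% of the tickets should win about 20% of the draws.
tickets = {"A": 20, "B": 80}
wins = sum(lottery_pick(tickets) == "A" for _ in range(10_000))
print(wins / 10_000)   # roughly 0.2
```

The proportion holds only in expectation over many draws, which is precisely why this approach suits proportional sharing but not hard real-time guarantees.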

Lottery scheduling is a simple method for apportioning resources. It avoids complex calculations, while enforcing a notion of fairness. However, it would not be appropriate for systems where deterministic guarantees are required, such as real-time systems.

Fair-Share Scheduling

Fair-share scheduling extends the idea of scheduling identities from processes (or threads) to users (or applications). If all scheduling is based on processes, an application or user can increase its share of overall CPU time simply by creating many processes. To avoid this, processes are grouped, for example by user, and processor time is counted on that basis, instead of per process.

Feedback Scheduling

Another scheduling algorithm, not included in the Tanenbaum text, is Feedback scheduling. In real systems, there is often no information available about how long a process will run during its next time on the CPU. Therefore, algorithms like SRT can't be implemented directly. Feedback scheduling is pre-emptive, based on dynamic priority. It maintains multiple priority ready queues, RQ0 (highest priority) through RQn (lowest priority). A process is initially placed in RQ0. After its first pre-emption from the CPU, it goes to RQ1. After every subsequent pre-emption, it moves to the next lower-priority queue. A short process will complete quickly, whereas a longer process will drift down through the priority queues. Hence, newer, shorter processes are favored over older, longer processes. Each individual priority queue may be treated as Round Robin or some other variation.

Feedback scheduling (Stallings).
Feedback scheduling has good response time and average throughput, but favors shorter jobs and could lead to starvation of longer jobs.
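The demotion rule can be sketched as a toy model with three queues (the queue count and process name are made up):

```python
from collections import deque

NUM_QUEUES = 3
queues = [deque() for _ in range(NUM_QUEUES)]   # queues[0] = RQ0, highest priority

def admit(pid):
    queues[0].append((pid, 0))                  # new processes enter RQ0

def on_preempt(pid, level):
    nxt = min(level + 1, NUM_QUEUES - 1)        # drift down one level per pre-emption
    queues[nxt].append((pid, nxt))              # bottom queue: stays put (e.g., RR)

admit("long-job")
pid, level = queues[0].popleft()                # scheduled from RQ0...
on_preempt(pid, level)                          # ...pre-empted once → now in RQ1
print([len(q) for q in queues])                 # → [0, 1, 0]
```

A job that keeps exhausting its quantum sinks to the bottom queue and waits behind all newer arrivals, which is how the starvation risk for long jobs arises.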

Thread Scheduling

There are some scheduling differences for threads, depending on whether user-level threads (ULTs) or kernel-level threads (KLTs) are employed.

(a) Scheduling ULTs. (b) Scheduling KLTs. (Tanenbaum)

Affinity. Scheduling a process or thread on the same CPU where it ran recently, instead of on a different CPU, is the idea of affinity. It can be hard (always assigned to the same CPU) or soft (usually assigned to the same CPU). Affinity is normally set by the OS, but some systems allow the user to manually select process or thread affinity for user programs.
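On Linux, for instance, affinity is exposed to user programs; Python wraps the underlying sched_getaffinity/sched_setaffinity system calls (this sketch is Linux-specific and will not run elsewhere):

```python
import os

# pid 0 means "the calling process".
allowed = os.sched_getaffinity(0)      # set of CPU ids we are allowed to run on
print(allowed)

# Pin ourselves to a single CPU (the lowest-numbered one, for illustration).
os.sched_setaffinity(0, {min(allowed)})
print(os.sched_getaffinity(0))         # now a single-CPU set
```

The equivalent from the shell is the `taskset` command, e.g. `taskset -c 0 ./prog` to launch a program pinned to CPU 0.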

The key performance benefit has to do with the cache. Recall that modern CPUs have a memory cache that is specific to that CPU; some have two levels, called L1 and L2. If some memory references contained in the cache lines are still present from a prior run of a process or thread on that CPU, the cache hit rate increases, reducing overall memory access time. This is especially true if the process or thread ran on the CPU not long ago. The effect may be observed when threads belonging to the same process are scheduled frequently on the same CPU, because the threads are accessing many of the same shared regions in memory.

Summary / Key Points