Scheduling, in general, means deciding the order in which a given resource is allocated to processes when more than one of them wants it. Here, the resource we consider is the CPU, and scheduling means deciding the order in which processes are selected to run on the CPU when more than one process in the system is ready to run. This decision is made by the scheduler, an important component of the OS.
This is distinct from the dispatcher, which performs the actual mechanics of carrying out the process switch. The choice of scheduling algorithm is an important driver in operating system performance. Many of the same issues apply equally to process scheduling and thread scheduling, but there are some differences, discussed later.
Most processes will alternate between bursts of computing and waiting on I/O. However, some processes tend to spend more time computing, so we call them CPU-bound processes. Other processes tend to spend more time waiting for I/O, so we call them I/O-bound processes. This is a subjective characterization; there is no formal definition describing the different process types. Furthermore, a process may alternate between the two types of behavior. Characterizing processes as CPU-bound or I/O-bound helps us reason about what type of scheduling algorithm is most appropriate and why.
(a) a CPU-bound process. (b) An I/O-bound process. (Tanenbaum)
When to Schedule
There are a number of situations that require a scheduling decision to be made:
New process creation. When a parent process forks, should the child process run, or should the parent process continue first?
Process exit. When a process terminates, some other process must be chosen.
Process block. When a process blocks for I/O, a semaphore, or some other reason, another process must be selected.
Process yield. A process may voluntarily yield the CPU, in which case it goes to the "Ready" state, but another process can be selected to run.
I/O Interrupt. When an I/O device has completed a data transfer, it usually lets the OS know by sending an interrupt (this includes processors with DMA). If the OS knows which processes are blocked awaiting this I/O, it could schedule those processes next, continue with whatever process was running before the interrupt, or choose some entirely different process.
Hardware Clock Timeout. Modern CPUs allow the OS to set a timer for the maximum interval a process is allowed to run on the CPU. This time interval is sometimes called a 'quantum'. If a process executes without blocking or otherwise being interrupted, the scheduler will choose a new process to run at the end of the current time quantum.
Scheduling/dispatching can be done in one of two ways:
Preemptive: the CPU/OS can kick off the running process at any time, such as for a timeout or a higher-priority process.
Non-preemptive: the CPU/OS allows the running process to continue until it yields or blocks.
Categories of Scheduling Algorithms
There are three different regimes in which scheduling algorithms generally operate:
Batch Systems. These are not interactive, so response time to a user is not a factor. Therefore, non-preemptive algorithms or pre-emptive algorithms with very long timeouts are usually acceptable.
Interactive Systems. This type is most familiar to us -- desktops, laptops, tablets, mobile phones, and even servers. These must generally be pre-emptive in order to meet users' response time expectations.
Real Time Systems. Real-time systems are generally not concerned with user interaction, and are often embedded systems or controllers for physical systems. Their primary concern is meeting specified real-time objectives, often repetitively. Schedulers for real-time systems, covered at the end of this section, are often pre-emptive and customized.
Scheduler Goals or Metrics
Operating system schedulers usually have a variety of goals. Some of the most common ones are:
Fairness: general equity among processes.
Throughput: the number of jobs completed per unit of time.
Turnaround Time: time from job submission to completion.
Response Time: time from request submission to when processing begins (or, for interactive systems, time to respond to the user).
Utilization: this involves keeping all parts of the system busy, when possible.
Different metrics are important for different types of systems, as shown in this figure from the book.
In discussing the various types of scheduling algorithms, it is important to bear in mind how each performs differently, in terms of the above criteria.
The following sections describe a number of common scheduling algorithms and algorithm types; the list is by no means exhaustive. The creation of new scheduling algorithms is limited only by the imagination, and each one has its pros and cons.
First Come, First Served (FCFS)
This is the simplest of scheduling algorithms. It simply maintains a FIFO queue of ready processes. It is not pre-emptive. The principal drawback is that, in a mix of CPU-bound and I/O-bound processes, the CPU-bound processes are greatly favored and the I/O-bound processes are penalized (every time they block, which is often, they go to the back of the line).
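This effect is easy to see in a minimal FCFS simulation (an illustrative Python sketch, not from the text; the job lengths are arbitrary). One long CPU-bound job at the head of the queue inflates the waiting time of every job behind it:

```python
from collections import deque

def fcfs(burst_times):
    """Run jobs in arrival order; return per-job (waiting, completion) times.
    Assumes all jobs arrive at time 0 and run to completion (non-preemptive)."""
    queue = deque(burst_times)
    clock = 0
    results = []
    while queue:
        burst = queue.popleft()
        waiting = clock            # time spent sitting in the ready queue
        clock += burst             # job runs to completion, uninterrupted
        results.append((waiting, clock))
    return results

# A long CPU-bound job ahead of two short jobs delays everyone behind it.
print(fcfs([100, 2, 2]))  # → [(0, 100), (100, 102), (102, 104)]
```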
Shortest Job First (SJF)
This algorithm assumes that process run times are known (or can be estimated) in advance. The scheduler maintains a queue of ready jobs, placing at the front of the queue those jobs with the shortest expected running times to completion.
(a) Running four jobs in their original order -- 8,4,4,4. (b) Running them in Shortest Job First (SJF) order -- 4,4,4,8. (Tanenbaum)
Shortest Job First obviously favors shorter-run jobs, and therefore is good for Throughput and Response Time.
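The turnaround improvement in Tanenbaum's 8,4,4,4 example can be verified in a few lines of Python (the helper name avg_turnaround is my own):

```python
def avg_turnaround(burst_times):
    """Mean turnaround time when jobs run back-to-back in the given order
    (all jobs arriving at time 0, non-preemptive)."""
    clock, total = 0, 0
    for burst in burst_times:
        clock += burst           # this job completes at the current clock
        total += clock
    return total / len(burst_times)

jobs = [8, 4, 4, 4]                   # original order
print(avg_turnaround(jobs))           # → 14.0, i.e., (8+12+16+20)/4
print(avg_turnaround(sorted(jobs)))   # → 11.0, i.e., (4+8+12+20)/4 with SJF
```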
Shortest Remaining Time (SRT)
SRT is the pre-emptive version of SJF. Again, run time must be known (or estimated) in advance. For each job, its remaining run time is tracked continuously, decremented each time it runs on the CPU. The scheduler chooses the process whose remaining run time is the shortest among all those in the queue. When a new job arrives, its total required execution time is compared to the currently running process' remaining required execution time. If it's shorter, the current process is pre-empted and the new process is started.
SRT also favors shorter-run jobs, so it should yield good throughput and response time, but at the expense of fairness to longer jobs.
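A small tick-by-tick SRT simulation (an illustrative sketch; the job set is invented) shows a newly arrived short job pre-empting a longer one:

```python
import heapq

def srt(jobs):
    """Shortest Remaining Time simulation.
    jobs: list of (arrival_time, burst_time) tuples.
    Returns completion times in input order. Simulated one time unit
    at a time for clarity, not efficiency."""
    pending = sorted((a, i, b) for i, (a, b) in enumerate(jobs))
    ready = []                       # min-heap of (remaining_time, index)
    done = {}
    clock = 0
    while pending or ready:
        while pending and pending[0][0] <= clock:
            a, i, b = pending.pop(0)         # admit newly arrived jobs
            heapq.heappush(ready, (b, i))
        if not ready:
            clock = pending[0][0]            # idle until next arrival
            continue
        rem, i = heapq.heappop(ready)        # shortest remaining time wins
        clock += 1                           # run it for one tick
        rem -= 1
        if rem == 0:
            done[i] = clock
        else:
            heapq.heappush(ready, (rem, i))
    return [done[i] for i in range(len(jobs))]

# A 2-unit job arriving at t=1 pre-empts the 5-unit job that started at t=0.
print(srt([(0, 5), (1, 2)]))  # → [7, 3]
```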
Round Robin
Round Robin is a pre-emptive scheduling algorithm that employs a time quantum and a simple FIFO queue.
An important performance consideration is setting the time quantum properly. If it is set too short, there will be too many process switches, reducing CPU efficiency. If the quantum is set too long, response time to interactive requests may be poor.
If a job blocks or completes before using its entire time quantum, the quantum resets when the next job is scheduled (i.e., the new job does not use up the remainder of the previous job's time quantum; it gets a fresh one).
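The mechanics above can be sketched in Python (an illustrative model reusing the 8,4,4,4 job lengths; all jobs arrive at time 0):

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Round Robin with a fixed quantum; all jobs arrive at time 0.
    Returns each job's completion time, in input order."""
    queue = deque(enumerate(burst_times))
    done = [0] * len(burst_times)
    clock = 0
    while queue:
        i, rem = queue.popleft()
        run = min(rem, quantum)      # every dispatch gets a fresh quantum
        clock += run
        rem -= run
        if rem:
            queue.append((i, rem))   # quantum exhausted: back of the line
        else:
            done[i] = clock          # finished (or would block) early
    return done

print(round_robin([8, 4, 4, 4], quantum=4))  # → [20, 8, 12, 16]
```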
Priority Scheduling
In Priority scheduling, each process is assigned a priority level, and the highest priority process always runs. The priority may be set statically (never changes), or it may be modified dynamically. For example, to keep a process from running indefinitely, its priority could be gradually decreased over time, while it's running. Priority scheduling may be preemptive or non-preemptive.
Multiple Priority Queues
Priority scheduling can be combined with other systems. For example, if there are multiple discrete priority levels, each priority level could have its own queue and scheduling algorithm, such as Round Robin, within that queue.
Four priority queues. (Tanenbaum)
Niceness. Linux assigns a priority level to each process, but also adds a complementary (though not always used) metric called "niceness". The higher a process' niceness score, the more willing the process is to yield the CPU. Important OS processes actually have a negative niceness score. Check it out at the command shell as follows (output abbreviated):
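For example (a hypothetical invocation; exact flags and column names vary slightly across systems):

```shell
# List the first few processes with their nice values (the NI column).
# System daemons and kernel helpers often show negative niceness.
ps -eo pid,ni,comm | head -n 5
```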
Shortest Process Next (SPN)
Interactive processes generally follow the pattern of: wait for a command, then execute the command, and repeat. Although there may be no advance knowledge of how long the process will run on the CPU next, it may be possible to keep a running estimate, based on past performance.
The technique begins with an initial estimate for the first run, then uses a weighted, or "aged," average of the process' most recent running times on the processor. Suppose the estimated run time for cycle n is Sn, and the actual running time observed during cycle n is Tn. We can combine these using a weighting factor, α (0 < α < 1), and the general recurrence relation:

Sn+1 = αTn + (1 − α)Sn

where Sn+1 is the estimated run time for cycle n+1, Tn is the actual observed running time during cycle n, and Sn was the estimated running time for cycle n.
Choosing α close to 1.0 ages out the observations quickly, giving the most weight to recent observations; choosing a small α, say 0.2, ages them out more slowly, giving older time observations more weight. The aging technique is an application of exponential smoothing.
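The recurrence is cheap to compute; here is a small Python sketch (the function name and sample numbers are my own):

```python
def aged_estimate(s0, observations, alpha):
    """Exponentially smoothed run-time estimate:
    S[n+1] = alpha * T[n] + (1 - alpha) * S[n],
    starting from initial estimate s0."""
    s = s0
    for t in observations:
        s = alpha * t + (1 - alpha) * s
    return s

# Initial guess of 10 time units; three observed bursts of 4 each.
# With alpha = 0.5 the estimate converges toward the recent behavior.
print(aged_estimate(10, [4, 4, 4], alpha=0.5))  # → 4.75
```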
SPN is not pre-emptive, but it eliminates the bias in favor of longer processes that's inherent in FCFS, and improves response time.
Lottery Scheduling
Lottery scheduling is a method for providing relative portions of the system's resources to processes, without needing to do so in a deterministic, guaranteed way. Each process has one or more lottery tickets. The next process to run is selected from the ready queue by lottery ticket. Lottery tickets are apportioned based on relative priority. For example, if one process should be doing about 20% of the work on the system, it holds about 20% of the available lottery tickets. New processes are issued lottery tickets on creation. Cooperative processes may even give lottery tickets to each other in real time.
Lottery scheduling is a simple method for apportioning resources. It avoids complex calculations, while enforcing a notion of fairness. However, it would not be appropriate for systems where deterministic guarantees are required, such as real-time systems.
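A lottery draw is just a weighted random choice; this Python sketch (process names and ticket counts invented) shows the idea:

```python
import random

def lottery_pick(tickets, rng=random):
    """Choose the next process to run, weighted by tickets held.
    tickets: dict mapping process name -> number of tickets."""
    names = list(tickets)
    return rng.choices(names, weights=[tickets[n] for n in names])[0]

# Over many draws, a process holding 20% of the tickets
# runs roughly 20% of the time -- probabilistic, not guaranteed.
holdings = {"A": 20, "B": 80}
wins = sum(lottery_pick(holdings) == "A" for _ in range(10_000))
print(wins / 10_000)  # typically close to 0.20
```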
Fair Share Scheduling
Fair Share scheduling extends the idea of scheduling identities from processes (or threads) to users (or applications). If all scheduling is based on processes, an application or user can increase its share of overall time simply by creating a lot of processes. To avoid this, processes are grouped, for example by user, and processor time is counted on that basis, instead of by process.
Feedback Scheduling
Another scheduling algorithm, not included in the Tanenbaum text, is Feedback scheduling. In real systems, there is often no information available about how long a process will run during its next time on the CPU, so algorithms like SRT can't be implemented directly. Feedback scheduling is pre-emptive, based on dynamic priority. It maintains multiple priority ready queues, RQ0 (highest priority) through RQn (lowest priority). A process is initially placed in RQ0. After its first pre-emption from the CPU, it goes to RQ1. After every subsequent pre-emption, it moves to the next lower-priority queue. A short process will complete quickly, whereas a longer process will drift down through the priority queues. Hence, newer, shorter processes are favored over older, longer processes. Each individual priority queue may be treated as Round Robin or some other variation.
Feedback scheduling (Stallings).
Feedback scheduling has good response time and average throughput, but favors shorter jobs and could lead to starvation of longer jobs.
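The demotion mechanism can be sketched as follows (an illustrative Python model assuming all jobs arrive at time 0, a one-tick quantum, and a bottom queue served round-robin):

```python
from collections import deque

def feedback(burst_times, quantum=1, levels=3):
    """Multilevel feedback: a job drops one priority level each time it is
    pre-empted; the scheduler always serves the highest non-empty queue.
    All jobs arrive at time 0; returns completion times in input order."""
    queues = [deque() for _ in range(levels)]
    for i, b in enumerate(burst_times):
        queues[0].append((i, b))             # every job starts in RQ0
    done = [0] * len(burst_times)
    clock = 0
    while any(queues):
        level = next(q for q in range(levels) if queues[q])
        i, rem = queues[level].popleft()
        run = min(rem, quantum)
        clock += run
        rem -= run
        if rem:
            # demoted on pre-emption; the bottom queue is plain round robin
            queues[min(level + 1, levels - 1)].append((i, rem))
        else:
            done[i] = clock
    return done

# The 1-tick job finishes quickly; the 3-tick job drifts down the queues.
print(feedback([3, 1]))  # → [4, 2]
```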
Thread Scheduling
There are some scheduling differences for threads, depending on whether user-level threads (ULTs) or kernel-level threads (KLTs) are employed.
Pure ULTs: The kernel schedules processes only, as it is not aware of threads. The user-mode runtime system must perform thread-specific scheduling. It can use round-robin, priority, or some other algorithm that's application specific. There is no hardware clock available to aid with scheduling, so the threads must cooperate by yielding at appropriate times.
Pure KLTs (or hybrid): The kernel is aware of threads, and so it chooses threads to schedule. When a thread times out, the kernel can choose to schedule another thread from the same process, or another thread from another process.
Affinity. Scheduling a process or thread on the same CPU where it ran recently, instead of on a different CPU, is the idea of affinity. It can be hard (always assigned to the same CPU) or soft (usually assigned to the same CPU). Affinity is normally set by the OS, but some systems allow the user to manually select process or thread affinity for user programs.
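On Linux, soft affinity can be adjusted from user space; Python's os module exposes the kernel's sched_setaffinity interface. A hedged sketch (it simply skips the calls on platforms that lack them):

```python
import os

# Linux-only: pin the calling process to a single CPU, then restore.
if hasattr(os, "sched_setaffinity"):
    before = os.sched_getaffinity(0)   # 0 means "this process"
    cpu = min(before)                  # pick one CPU we're allowed to use
    os.sched_setaffinity(0, {cpu})     # restrict to that CPU
    print(os.sched_getaffinity(0))     # now a single-CPU set
    os.sched_setaffinity(0, before)    # restore the original mask
```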
The key performance benefit has to do with the cache. Recall that modern CPUs have a memory cache that is specific to that CPU; some have two levels, called L1 and L2. If some memory references contained in the cache lines are still present from a prior run of a process or thread on that CPU, the cache hit rate increases, reducing overall memory access time. This is especially true if the process or thread ran on the CPU not long ago. The effect may be observed when threads belonging to the same process are scheduled frequently on the same CPU, because the threads are accessing many of the same shared regions in memory.
Summary / Key Points
Processes can be informally characterized as I/O-bound or CPU-bound.
Schedulers can be preemptive or non-preemptive.
Scheduling algorithms may be tailored toward batch systems, interactive systems, or real-time systems.
The important metrics by which scheduling algorithms are judged include fairness, throughput, turnaround time, response time, and utilization, among others.
Thread scheduling decisions differ, depending on whether the OS implements ULTs or KLTs. There can be a performance improvement from using thread CPU affinity, due to cache warming.