In a virtual memory system, page faults occur frequently and are a normal part of operation. When a page fault occurs and memory is full (as it usually is), the OS must decide which page to evict from real memory to make room for the incoming page. A page that is swapped out only needs to be written to disk if it has been modified, of course. When choosing a page to evict from RAM, the OS can choose from among all memory pages (a global strategy), or only from among the memory pages belonging to the process incurring the page fault (a local strategy). There are a variety of algorithms for selecting the page to be replaced, each with its own strengths and weaknesses.
When the operating system is constantly swapping pages to and from secondary storage, the condition is called thrashing. It means there are so many processes running that the demand for different pages in memory is very high. If a system is thrashing, page faults occur very frequently.
If a system is thrashing, some lower-priority processes can temporarily be moved entirely to disk, and all their memory pages freed. When a process' pages are swapped out to disk and the process can't run, we say the process is suspended.
Another way to avoid thrashing is to reduce the degree of multiprogramming, which is simply the number of processes running simultaneously. For example, the OS can choose to temporarily defer scheduled tasks until the load on the system is reduced. On the other hand, if the degree of multiprogramming is too low, the CPU often sits idle. In general, it may be ideal to run a balanced, heterogeneous mix of processes -- some with high memory demands and some with low, some CPU-bound and some I/O-bound.
Page Replacement Algorithms
"Optimal" Page Replacement
The "optimal" strategy is to remove a page that will be referenced in the most distant future (or never). Although it's not possible to implement in reality, because predicting future computation is not tractable, the "optimal" strategy can be identified retroactively in software simulations with specific data inputs. The utility in doing this is to use it as a benchmark to compare the "real" page replacement algorithms against. The optimal strategy incurs the minimum possible number of page faults on a particular computation sequence, so "real" algorithms are good if they approach that low number of page faults, on average, most of the time, in realistic simulations.
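Such a retroactive simulation can be sketched in a few lines of Python (the function name and the sample reference string are illustrative, not from the text): the simulator "cheats" by looking ahead in the full reference string to find the resident page whose next use is furthest away.

```python
def optimal_faults(refs, num_frames):
    """Count page faults under the "optimal" (look-ahead) strategy.

    Only possible in simulation: we peek at the future of the
    reference string, which a real OS cannot do.
    """
    frames, faults = [], 0
    for i, page in enumerate(refs):
        if page in frames:
            continue                      # hit: nothing to do
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)           # free frame available
            continue

        def next_use(p):
            # Index of the next reference to p, or infinity if never.
            try:
                return refs.index(p, i + 1)
            except ValueError:
                return float("inf")

        # Evict the page referenced furthest in the future (or never).
        victim = max(frames, key=next_use)
        frames[frames.index(victim)] = page
    return faults

# A commonly used 12-reference example string, with 3 frames:
print(optimal_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))  # 7 faults
```

The count it produces (7 faults here) is the benchmark floor that the "real" algorithms below can be measured against on the same reference string.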
Not Recently Used (NRU)
Bits. In NRU, the OS (with the help of hardware) keeps track of two bits for each memory page:
R (referenced): this bit is set if the page has been read or written.
M (modified): this bit is set only if the page has been written.
Both bits are initially 0 on page load. Every memory reference must update the referenced page's R and M bits, so this operation must be efficient. The R bit is periodically cleared, for example on any clock interrupt, so that only the pages that have truly been referenced recently have R set. The M bit, once set, remains set until the page is swapped out.
Operation. When there's a page fault, the OS inspects all pages and divides them into four classes based on the R and M bits:
Class 0 (00): not referenced, not modified
Class 1 (01): not referenced, modified
Class 2 (10): referenced, not modified
Class 3 (11): referenced, modified
NRU removes a page at random from the lowest-numbered nonempty class. This favors replacement of pages that do not need to be written to disk and are not in heavy use. NRU is relatively easy to implement and understand, but doesn't match the performance of more sophisticated algorithms.
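The classification step can be sketched in Python (a toy illustration; representing page state as a dict of (R, M) bit pairs is my assumption, not how a real OS stores it):

```python
import random

def nru_choose_victim(pages):
    """Pick a victim the NRU way.

    pages maps a page id to its (R, M) bits. The class number is
    2*R + M, so R outweighs M; we choose at random from the
    lowest-numbered nonempty class.
    """
    classes = {0: [], 1: [], 2: [], 3: []}
    for page, (r, m) in pages.items():
        classes[2 * r + m].append(page)
    for c in range(4):
        if classes[c]:
            return random.choice(classes[c])
    return None  # no resident pages at all

# "B" is the only page in class 1 (not referenced, modified), and
# class 0 is empty, so "B" is chosen.
victim = nru_choose_victim({"A": (1, 1), "B": (0, 1), "C": (1, 0)})
```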
First-In, First-Out (FIFO)
FIFO keeps track of all pages in memory in a queue, with newly arriving pages at the tail and the oldest arrivals at the head. FIFO is very simple to implement, but performs poorly. The reason is that it does not capture how heavily a page is used -- the oldest page, for example, may still be in heavy use, but is replaced anyway. Because of this limitation, ordinary FIFO is not generally used for page replacement.
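A minimal FIFO simulation (a sketch, with names of my choosing) makes the weakness concrete. FIFO also famously exhibits Belady's anomaly: on some reference strings, giving it *more* frames produces *more* faults.

```python
from collections import deque

def fifo_faults(refs, num_frames):
    """Count page faults under FIFO: always evict the page that has
    been resident longest, regardless of how heavily it is used."""
    queue, faults = deque(), 0
    for page in refs:
        if page in queue:
            continue            # hit: FIFO order is unchanged
        faults += 1
        if len(queue) == num_frames:
            queue.popleft()     # evict the oldest arrival
        queue.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # 9 faults
print(fifo_faults(refs, 4))  # 10 faults -- more frames, more faults!
```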
Second Chance
The FIFO algorithm can be modified slightly to capture whether a page has been referenced during the most recent clock interval. Second Chance uses a FIFO queue, but the OS will only evict the page at the front of the queue if its R bit is 0. If the R bit is 1, the OS clears the bit and moves the page to the back of the queue, as if it had just arrived, then examines the next page in the queue for replacement. If every page has its R bit set to 1, Second Chance operates just like pure FIFO.
Second Chance. (a) Pages sorted in FIFO order (A, at left, arrived first). (b) Page A moves to the back of the FIFO queue if it has been referenced (R=1).
Although Second Chance is simple and performs better than FIFO, it is still somewhat inefficient, because it is constantly moving pages around its queue.
The Clock Algorithm
The "Clock" algorithm arranges the memory frames in a circular buffer and simply moves a pointer around when considering pages for replacement. The clock "hand" points to the frame with the oldest page, the one currently considered for replacement. If the page being pointed to has R=0, that page is replaced, then the pointer advances to the next memory frame. If R=1, Clock sets R to 0 and advances the clock pointer to the next frame, repeating until a page is found that has R=0 that can be replaced.
The Clock algorithm. The clock pointer points to the frame whose page is being considered for replacement. If R=0, the page is replaced. If R=1, the algorithm sets R=0 and advances the pointer. This repeats until a page is found with R=0 that can be replaced. In the diagram, page C will be replaced if R=0 for that page.
Note that if all pages have R=1, the pointer will cycle through all frames, setting all the pages' R bits to 0, then end up replacing the page it was originally pointing to.
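One selection pass of the Clock algorithm can be sketched as follows (a toy model: the frame list, R-bit dict, and function name are my illustrative choices):

```python
def clock_select(frames, r_bits, hand):
    """One Clock-algorithm selection pass.

    frames is a circular list of resident pages, r_bits maps each
    page to its R bit, and hand is the pointer's current index.
    Returns (victim_index, new_hand). Spared pages lose their R bit,
    so at worst the hand makes one full sweep and then stops where
    it started -- exactly the behavior described above.
    """
    while True:
        page = frames[hand]
        if r_bits[page] == 0:
            # Found the victim; the hand advances past it.
            return hand, (hand + 1) % len(frames)
        r_bits[page] = 0                    # give a second chance
        hand = (hand + 1) % len(frames)

frames = ["A", "B", "C"]
r_bits = {"A": 1, "B": 0, "C": 1}
# "A" is spared (R cleared), then "B" (R=0) is selected.
print(clock_select(frames, r_bits, 0))  # (1, 2)
```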
Least Recently Used (LRU)
LRU chooses for replacement the page that has not been referenced for the longest amount of time. The implementation challenge is that a list of pages, ordered by time of last use, must be updated on every memory reference. To be efficient enough, these updates would need to be done in hardware, so true LRU is generally not used in real systems.
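In a simulation, though, true LRU is easy to express. Here is a sketch (names are mine) that uses Python's OrderedDict as the recency list; in a real system this bookkeeping would have to happen on every single memory reference, which is exactly the part that is impractical in software.

```python
from collections import OrderedDict

def lru_faults(refs, num_frames):
    """Count page faults under true LRU.

    The OrderedDict acts as the recency list: most recently used at
    the right end, least recently used at the left end.
    """
    recency, faults = OrderedDict(), 0
    for page in refs:
        if page in recency:
            recency.move_to_end(page)        # page is now most recent
        else:
            faults += 1
            if len(recency) == num_frames:
                recency.popitem(last=False)  # evict least recently used
            recency[page] = True
    return faults

print(lru_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))  # 10 faults
```

On this 12-reference string with 3 frames, LRU incurs 10 faults, a few more than the optimal strategy's minimum but with no knowledge of the future.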
Not Frequently Used (NFU) and Aging
There are two methods for simulating LRU that are a little easier to implement. They're called Not Frequently Used (NFU) and Aging, respectively.
NFU uses a counter for each page in memory. The counter represents how frequently the page is referenced. So that the counter doesn't need to be updated on each memory reference, the algorithm instead runs at each clock interrupt. At the clock interrupt, if the page's R bit is 1, the page's counter is incremented. When a page fault occurs, the page with the lowest count is replaced.
The limitation of NFU is that it treats recent memory references and older memory references with equal weight. Until a page is replaced, references to it are not "forgotten." Hence, pages may stay in memory well after they're useful, due to a high volume of older references in the counter.
A small modification to NFU, which has the effect of de-emphasizing older references, is called Aging. It differs from NFU in several ways:
The counter bits are shifted right one bit at each time interval.
The R bit is added in the leftmost (most significant) bit, instead of the rightmost (as with a normal counter).
The updates may occur over a preset time interval, instead of at every clock interrupt.
When a page fault occurs, once again, the page with the lowest counter value is removed. With Aging, more recent memory references take precedence over less recent ones.
Aging does a better job than NFU at emphasizing recent memory references, but it does have a fixed "memory horizon," defined by the number of bits in the counter, that NFU does not.
Aging. There are six pages (0-5). Counter values are shown for 5 clock ticks. The R (reference) bit for each time interval, per page, is shown at the top.
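One Aging update tick can be sketched directly from the two rules above (a toy model; the 8-bit counter width and the dict representation are my illustrative choices):

```python
def age_counters(counters, r_bits, width=8):
    """One Aging tick: shift each page's counter right one bit, OR the
    R bit into the most significant position, then clear R."""
    for page in counters:
        counters[page] = (counters[page] >> 1) | (r_bits[page] << (width - 1))
        r_bits[page] = 0
    return counters

counters = {"A": 0b10000000, "B": 0b01000000}
r_bits = {"A": 0, "B": 1}
age_counters(counters, r_bits)
# A was not referenced: its counter decays to 0b01000000.
# B was referenced: its new reference dominates, giving 0b10100000,
# so B now outranks A even though A was referenced more recently before.
```

This shows why a single recent reference outweighs a long history: each shift halves the weight of everything that came before, and the counter's bit width fixes the "memory horizon."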
Working Set Theory
When a process is loaded into memory, its first memory reference incurs a page fault. Its next memory reference, if not to the same page, also incurs a page fault, and so on, until the process has the pages it currently needs in memory. This strategy, where the OS will only load a page when it's referenced, is called demand paging. The alternative, loading pages before they are explicitly referenced, is called pre-paging.
Working Set Page Replacement
The principle of locality of reference (both spatial and temporal) implies that a process will refer to a small subset of all of its pages for the majority of its memory references, especially over a limited time window. The set of pages that a process is currently using is called its working set. Working set theory was first defined by Peter Denning. Many paging systems try to keep track of each process' working set and make sure it is in memory before the process runs. This is called the working set model. A process' working set will naturally change over time.
Resident Set: the set of pages the process currently has loaded in memory.
Working Set: the set of pages used over the k most recent memory references. The function w(k,t) is the size of the working set at time t. Often, a time interval called τ is more practical to use instead of counting up k memory references. When used, τ is in units of virtual time (time actually on the CPU).
As the constant k is increased, a process' working set w(k,t) gets larger, too, but it cannot exceed the total number of memory pages referenced over the process' lifetime (it approaches this value asymptotically).
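The definition of w(k,t) translates directly into code (a sketch; the function name and sample reference string are mine):

```python
def working_set_size(refs, t, k):
    """w(k, t): the number of distinct pages among the k most recent
    memory references, as of time t (refs[t] is the reference at t)."""
    window = refs[max(0, t - k + 1): t + 1]
    return len(set(window))

refs = [1, 2, 1, 3, 2, 4]
# At t=5 the last 3 references are 3, 2, 4, so w(3, 5) = 3.
# Widening the window can only grow the working set: w(6, 5) = 4,
# the total number of distinct pages referenced so far.
print(working_set_size(refs, 5, 3), working_set_size(refs, 5, 6))
```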
Ideally, the OS would try to keep the Resident Set about equal to the Working Set most of the time, if it's not able to keep all of a process' pages in memory at once. If a memory page falls out of a process' Working Set, the page would be a candidate for replacement.
Working Set vs. Resident Set. This aspect of Working Set Theory suggests a strategy for changing the number of page frames allocated to a given process at runtime. Specifically, if the size of the Working Set increases, the system allocates more frames for that process, while if the size of the Working Set decreases, the system takes frames away from that process. Note that this requires the system to maintain a global view, since frames are now moved between processes at runtime, and it raises other questions. For example, where do you get a free frame to give to a process if you need to increase its frame allocation?
Degree of Multiprogramming. Working Set Theory also explains thrashing, as described above -- having "too many faults" essentially means that one or more processes don't have their Working Sets in memory. Thus, if the sum of the sizes of all the Working Sets of processes in memory is greater than the number of page frames, the system would typically decrease the degree of multiprogramming by removing a process from memory and reallocating its frames to support the Working Sets of the processes still in memory.
Page Fault Frequency. Tracking the pages in the Working Set can be resource intensive. One alternative approximation uses the Page Fault Frequency (PFF) instead. The idea is that, if a process' PFF is too high, the OS should increase the size of the process' Resident Set. If a process' PFF is too low compared to others, the OS should decrease the Resident Set of the process.
If all of a process' pages are in memory all the time, it will never fault. Conversely, if only a few of its pages are in memory, the process will fault frequently. Somewhere in the middle lies the Working Set, representing a happy medium.
Working Set Algorithm
The idea is to find a page that's not in the Working Set and evict it. The Page Table contains the reference bit R, plus an entry for the approximate last time the page was referenced (time of last use, or TOLU). Empty PTEs are ignored.
The hardware sets (to 1) the R bit for each page, on memory accesses.
A periodic hardware clock triggers an interrupt, which clears (sets to 0) all the reference (R) bits. Hence, if a page has R=1 at the time of an interrupt then the page has been referenced since the last clock interrupt. The Tanenbaum text refers to the interval between timer interrupts as one 'tick' of the clock.
On every page fault, the Page Table is scanned for a page to evict:
As each entry is processed, the R bit is examined. If a page has R=1, the current virtual time is written to the page's TOLU field. The page is clearly in the working set, and will not be evicted. (Note: it is not necessary for hardware to update TOLU on every memory access. It is sufficient for the hardware to update the R bit on memory accesses. The algorithm updates each candidate page's TOLU as needed, whenever it scans the page table searching for an eviction candidate.)
If R=0, the page's age is computed. The page's age is the current virtual time minus the page's TOLU. If the page's age exceeds τ (the time delta that defines the Working Set), the page is no longer in the working set and is replaced. If the page's age is less than τ, it's spared from replacement but its age is noted.
If the entire page table is scanned and all pages are in the working set, the algorithm looks at all pages with R=0 but an age less than τ, and the page with the greatest age is evicted (this page was in the working set, but was the oldest page in it).
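The scan described in the three cases above can be sketched as follows (a toy model: the dict-based page table, field names, and function name are my illustrative choices, and real page tables are hardware structures):

```python
def working_set_scan(pages, current_time, tau):
    """One page-fault scan of a toy page table.

    pages maps a page id to {'r': R bit, 'tolu': time of last use}.
    Returns the page id to evict per the Working Set algorithm.
    """
    oldest, oldest_age = None, -1
    for pid, entry in pages.items():
        if entry["r"] == 1:
            # Referenced this tick: in the working set. Record the
            # time now, since hardware only maintains the R bit.
            entry["tolu"] = current_time
            continue
        age = current_time - entry["tolu"]
        if age > tau:
            return pid                # outside the working set: evict
        if age > oldest_age:          # remember the oldest in-set page
            oldest, oldest_age = pid, age
    return oldest  # everything is in the working set: evict the oldest

pages = {"A": {"r": 1, "tolu": 5},
         "B": {"r": 0, "tolu": 2},
         "C": {"r": 0, "tolu": 9}}
# With current time 10 and tau=5: A is referenced (spared, TOLU
# refreshed), B's age is 8 > tau, so B is evicted.
print(working_set_scan(pages, 10, 5))
```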
The limitation of the basic Working Set algorithm is that the entire page table has to be scanned at each page fault, until a suitable candidate is located. An improved algorithm, WSClock, is similar to the Clock algorithm, but also includes Working Set information.
WSClock Algorithm. WSClock works well in practice, and is widely adopted. It works as follows:
The clock is composed of a circular list of page frames, just as in the Clock algorithm.
As mentioned above, the Working Set is defined as the set of pages referenced within the last τ units of virtual time (time actually on the CPU).
As pages are referenced and loaded into frames, their R and M bits are updated, as well as the time of last use (TOLU) value. A page's "age" is the process' current virtual time minus the page's TOLU.
At each page fault, the frame pointed to by the clock pointer is examined first:
If R=1, the page has been referenced during the current tick. It is not replaced, but R is set to 0, the page's TOLU is updated, and the pointer advances.
If R=0 and the page's age is at most τ, the page is still in the working set; it is spared and the pointer advances.
If R=0, M=0, and the page's age > τ, the page is not in the working set and is clean, so the frame can be claimed directly.
If R=0, M=1, and the page's age > τ, the page is outside the working set but dirty. A write to disk is scheduled and the pointer advances; evicting the page immediately would mean waiting for the write to finish.
If the pointer completes a full clock cycle without finding a page to replace, there are two cases:
At least one write has been scheduled during the cycle. The pointer continues to scan until it finds a clean page (which it will, when one of the scheduled writes completes).
No writes have been scheduled during the cycle. All pages are in the working set. The pointer takes any clean page (M=0) and claims it. The pointer could have noted the last clean page it observed during its sweep, to make this faster. If there are no clean pages at all, the current page is written back to disk and replaced.
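A heavily simplified single sweep of WSClock might look like the following sketch. This is an assumption-laden toy: disk writes are merely flagged rather than performed, the fallback cases are left to the caller, and all names are mine.

```python
def wsclock_select(frames, hand, now, tau):
    """One simplified WSClock sweep.

    frames is a circular list of page entries ({'r', 'm', 'tolu'}).
    Returns (victim_index, new_hand), or (None, new_hand) if the lap
    only refreshed pages or scheduled writes; the caller would then
    retry after a write completes, or fall back to any clean page.
    """
    n = len(frames)
    for _ in range(n):                        # at most one full lap
        entry = frames[hand]
        if entry["r"] == 1:
            entry["r"], entry["tolu"] = 0, now   # in working set: spare
        elif now - entry["tolu"] > tau:
            if entry["m"] == 0:
                return hand, (hand + 1) % n      # old and clean: claim
            entry["write_scheduled"] = True      # old and dirty: flag I/O
        hand = (hand + 1) % n
    return None, hand

frames = [{"r": 0, "m": 0, "tolu": 1},   # old and clean
          {"r": 1, "m": 0, "tolu": 3}]   # recently referenced
# With now=10 and tau=5, frame 0 (age 9 > tau, clean) is claimed.
print(wsclock_select(frames, 0, 10, 5))  # (0, 1)
```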
Pages may be chosen for replacement on a global or local basis.
Page replacement algorithms help the OS to choose which memory page must be evicted.
The OS may manage the degree of multiprogramming when memory pressure exists.
Working Set Theory outlines a strategy for managing virtual memory based on page usage information.
According to the Linux kernel documentation, Linux uses both a variant of LRU and a variant of the Clock algorithm for page replacement, with some Working Set concepts mixed in.
This summary of Page Replacement algorithms is quoted from the Tanenbaum text:
The Optimal algorithm evicts a page that will be referenced furthest in the future. Unfortunately, there is no way to determine which page this is, so in practice the algorithm cannot be used. It is useful as a benchmark, however.
The NRU algorithm divides pages into four classes depending on their R and M bits. A random page from the lowest-numbered class is chosen for replacement. This algorithm is easy to implement, but crude. Better algorithms exist.
FIFO keeps track of the order in which pages are loaded into memory. Removal of the oldest page is trivial, but the page may still be in use, so FIFO is a bad choice.
Second chance is a modification of FIFO that checks if a page is in use before removing it. If it is, the page is spared. This modification greatly improves the performance.
Clock is simply a different implementation of Second Chance. It has the same performance, but takes a little less time to execute the algorithm.
LRU is an excellent algorithm, but it cannot be implemented without special hardware. NFU is a crude attempt to approximate LRU, but it is not very good. Aging is a much better approximation to LRU and can be implemented efficiently. It is a good choice.
The Working Set algorithm gives reasonable performance, but is somewhat expensive to implement.
WSClock is a variant that gives good performance and is efficient to implement.
Overall, the two best algorithms are Aging and WSClock. They both give good performance and can be implemented efficiently, so they are the most important in practice.