A computer's non-persistent memory is often called RAM (random access memory). Its primary distinctions from secondary storage are:
It is faster to access
It loses its value when power is removed (it is non-persistent or volatile)
It is more expensive, on a per-byte basis
The operating system must manage RAM, cooperatively with the CPU and the chipset. The unit within the OS that does this is the Memory Manager. Note that sometimes a memory manager is called an MMU (memory management unit), which can be confusing because the same terminology is used for the portion of the processor that supports memory operations. We will try to keep the distinction clear by only using the abbreviation MMU when we refer to hardware, and use the term Memory Manager when referring to a component of the OS.
Recall that the two primary roles of an operating system are (1) to provide useful abstractions and (2) to implement those abstractions efficiently on the hardware. For memory, an address space is the abstraction. It represents the set of addresses that a process can access, and is used to conceptually separate the memory as seen by a process from physical memory. The issue then becomes how to implement address spaces, and how to map them into physical memory. Doing so raises many questions:
What region(s) of physical memory does a Memory Manager allocate to a process?
Does the region need to be one big contiguous region or not?
Once it's allocated, can the location change?
Does the address space all need to be in memory at once?
Does ANY of the address space have to be in memory at a given time (e.g., when a process is blocked?)
How do you confine a process's accesses to its own region?
How are logical addresses translated into physical addresses?
These are all questions we'll address.
Modern operating systems support the ability to have more than one process running at a time (multiprogramming), meaning that the address spaces of multiple processes (or parts of them) will be in memory at the same time. This means that at the very least, the following two problems need to be solved:
Relocation: the ability to load a process at any physical memory address
Protection: the ability to restrict a process' access to its own memory area
Fragmentation and Compaction
The above illustrates a general problem called fragmentation, which is related to the existence of unused areas of memory.
Here, we'll talk about it in the context of memory systems, but it can occur in any storage system. It comes in two varieties:
Internal Fragmentation: unused memory inside a unit of allocation.
External Fragmentation: unused memory outside a unit of allocation.
In the previous example, external fragmentation occurs. This can be reduced through occasional memory compaction. Compaction involves moving all the allocated areas of memory into one contiguous region, eliminating areas of external fragmentation. An OS will try to avoid or minimize the need for compaction, because of its very high CPU and energy cost.
One variation that can reduce the frequency of compaction is to allocate each process a little more memory than it actually needs. Within the allocated region, the process can gradually grow to fill its memory area, as in the following figure. Note that this strategy also introduces internal fragmentation.
(a) Allocating space for a growing data segment. (b) Allocating space for a growing stack and a growing data segment. (image: Tanenbaum)
Review of Process Address Space Organization
The Executable Stack
A program's local variables will generally be allocated on the stack if they are simply declared. When the program is compiled, the compiler adds instructions that create local storage space on a function's stack frame. When the function is called, the space is allocated within the stack frame. When the function completes and returns, the space used by that function is de-allocated right off the top of the stack. This makes sense because the function's local variables go out of scope when the function returns. This type of allocation is convenient for two reasons:
It happens automatically, and need not be managed explicitly by the programmer
It is fast and efficient
There are two exceptions: global variables and static variables. Since global and static variables remain in scope for the life of the program, they cannot live on the stack; instead, they are placed in a static data region of the process's address space. A third region, the heap, holds memory that the program allocates explicitly at runtime using calls like malloc.
The heap is a region of memory or address space that is assigned to a process. The program may allocate and deallocate memory on the heap at any time, but must do so explicitly. Memory that is allocated but not explicitly deallocated may cause leaks, which are unusable regions of memory that can accumulate. Some operating systems or runtime environments use a garbage collector, which finds and reclaims unused memory regions, to mitigate this problem.
Various operating systems will have their own API calls for memory allocation and deallocation. In the C runtime, we use library calls like calloc(), malloc(), realloc(), and free() (C++ adds the new and delete operators). If you'd like to see your Linux system kill a process that has allocated too much memory, open up System Monitor and try the following simple demo:
char *c;
while (1) { c = malloc(1024 * 1024); memset(c, 1, 1024 * 1024); }
Segmentation: Variable Sized Address Spaces
The idea of splitting up a process' address space into multiple pieces that can be separately allocated and managed is called segmentation. While the pieces are still variable sized, they're smaller than a process' entire address space and hence, easier to place and manage. This is also one way of addressing the problem that, in real programs, there are often multiple allocated memory regions that change in size over time.
Multiple allocated regions of memory may need to vary in size over time. (image: Tanenbaum)
Each segment has a base and a limit that define its location and size. Because segments can be sized to fit their contents, internal fragmentation is minimized.
Segments can grow or shrink independently. (image: Tanenbaum)
Base and Limit Registers
Two of the questions from above were: (1) how are addresses translated from logical addresses to physical addresses, and (2) how do we prevent an executing process from accessing memory outside of its allotted region? A simple way to do both in segmented memory address spaces is to employ base and limit registers.
The base register holds the lowest memory address that can be used by the process, and the limit register holds the highest offset address (relative to the base) that can be used by the process. This is equal to the "segment length" - 1, and sometimes "limit" and "length" are used interchangeably when talking about segments.
Whenever the CPU needs to access memory, the hardware automatically adds the base value to the address generated by the program. The hardware also verifies that the address does not lie outside the limit value, in which case a fault is issued.
The base and limit registers are often implemented such that only the OS can modify them.
A key limitation is the additional time required to add the base value and check the limit, which must be done on every memory reference.
Base and limit registers can be used to give each process a separate address space. (image: Tanenbaum)
Segments are visible to the programmer (or at least the compiler). Executable code that references segmented memory must include the identifier of the segment being referenced, plus the offset. Each segment can also have its own protection values. For example, a data segment would be read/write, but a code segment would be read-only. There may be a kernel segment and a user segment.
Allocation of variable-sized regions leads to external fragmentation. Eventually, if there are enough small, unused external fragments, and there is a need for more memory allocation, compaction is required. Compaction is the process of copying the allocated blocks of memory into tightly packed contiguous blocks.
External fragmentation develops over time, then is corrected through memory compaction by the OS. (image: Tanenbaum)
Descriptor Tables and Address Calculation
A generalization of the base register + limit register approach is to have a table called a segment table, sometimes called a descriptor table because each entry is a descriptor. The table is supported by hardware and maintained by the OS. Such tables can exist both globally and for individual processes: the Intel architecture uses the term Global Descriptor Table for the OS-wide segments, and the term Local Descriptor Table for segments particular to a process.
The following figure illustrates how a physical address is calculated from a logical address in a system that uses segmentation; note that this is done in hardware:
The offset bits are carried forward when converting a logical address to a physical address using segmented memory. The base address of the referenced segment needs to be pulled from the segment table. Then, the base + offset equals the desired memory address. Unique to segmentation, the hardware and OS must also check that the offset bits do not exceed the segment's upper bound or limit, which would cause a segfault.
Virtual Memory: Fixed Size Memory Address Spaces
Having a fixed-size unit of memory allocation simplifies many issues related to memory management. This is the key difference between a system that uses segmentation and a system that uses virtual memory. In the latter, fixed-size units called pages are loaded into regions of memory of the same size called page frames. Pages are usually not that large; 4 KB is a commonly used size. By requiring all pages to be this size, and allowing pages to be placed non-contiguously, all external fragmentation is eliminated.
A second key issue is that often in real computing systems, one process, or many processes together, need to use more memory than is physically available. Virtual memory also solves this problem. The basic idea behind virtual memory is this:
Each program has its own address space, which is divided into pages
The pages are mapped onto physical memory, but not all of a program's pages have to be in physical memory simultaneously for the program to run.
A page may be mapped to any physical frame (or no frame at all). In particular, there are no requirements for pages to be located
consecutively amongst the page frames (in physical memory).
When the program references a part of its address space that is not currently in physical memory, the OS is alerted to go retrieve the missing page and load it from disk back to memory. This is called a page fault.
Virtual memory thus requires the "translation" of virtual addresses to physical addresses whenever a memory reference is made:
A virtual memory address consists of the virtual page number, concatenated with a "page offset". The virtual address is what is seen and used by a program.
A physical memory address consists of the frame number, concatenated with a "page offset". The physical address is what is actually used by memory to access the data.
The OS must maintain a page table that maps virtual pages to the physical frame where they currently reside. This page table is used to translate from a virtual address to a physical address, whenever a memory reference is made.
Virtual Address Translation
In a virtual memory system, the page table maps a virtual page number to a physical frame number. To calculate a physical address, the virtual page number is used to index into the page table to get the frame number where that virtual page currently resides. Once the frame number is obtained, the address of the start of the frame is combined (concatenated) with the offset bits to obtain the right physical address.
Translation of a virtual address to a physical address, using a Page Table. Note the offset bits do not change; they are carried forward.
This mapping between pages and frames is held in the Page Table. For each virtual page, the Page Table indicates which physical frame that page currently resides in, or that the page is not currently loaded into a physical memory frame. A page's presence or absence in physical memory is tracked using a present/absent bit: if this bit is 0, the page is not loaded in a frame; if the bit is 1, the page is currently loaded into the indicated frame number.
An example page table. The page size is 4K. The number in the page table entry indicates the memory frame where the page is loaded, or 'X' if the page is not currently in physical memory. The page table will change over time, as virtual memory pages are loaded into frames or swapped out to secondary storage (disk). Note the virtual memory size is 64K, while physical memory is only 32K. (image: Tanenbaum)
If a memory reference occurs on a page that's not currently in a main memory frame (e.g., the present/absent bit is 0), a page fault occurs. This signals the OS that the calling process will not be able to continue until the needed page is swapped in from disk.
Virtual addresses are mapped to physical addresses as follows:
The virtual memory address bits are divided into a page number and an offset. The offset indicates how far from the beginning of the page the memory reference occurs.
The offset bits are copied directly from the virtual (or logical) address (the input) to the physical address (the output).
The bits of the page number are used as an input to the Page Table, locating a Page Table Entry (PTE).
At the PTE, if the present/absent bit is 0, a page fault is issued and the OS must go fetch the needed page from disk.
If the present/absent bit is 1, the frame number is used to form the left side of the constructed physical memory address (the offset bits form the right side).
The process is illustrated in the following figure:
Translation of a virtual address (at bottom) to a physical address (at top), using a Page Table. (image: Tanenbaum)
A Page Table Entry (PTE) usually contains other important information, as well:
A caching bit may indicate whether caching is enabled on that page. Turning off caching is useful in systems with memory-mapped I/O devices (causing the CPU to get new updates directly from an input device).
A modified bit (sometimes called a 'dirty' bit) indicates whether the page has been written to.
A referenced bit is set whenever a page has been referenced (read from or written to).
One or more protection bits indicate the read/write/execute permissions associated with the page.
The following relationships exist between the virtual address, the physical address, the page size, and the page table:
In a traditional page table, the number of PTEs required is equal to the number of virtual memory pages.
The size of a page is always equal to the size of a frame.
The number of pages equals the size of virtual memory divided by the page size. For example, if virtual memory is 2^40 bytes and a page is 2^10 bytes, then there must be 2^40 / 2^10 = 2^30 pages.
If a page/frame is referenced using n offset bits, its size is 2^n bytes (assuming memory is byte addressable).
The size of physical memory equals the size of a frame times the number of frames. For example, if a frame is 4KB (or 2^12 bytes) and there are 2^20 frames, then physical memory is 2^12 * 2^20 = 2^32 bytes, or 4GB.
The number of bits used for each part of an address is important. Here's more on that:
The total number of virtual pages is equal to 2^(# of bits in the virtual page number). For instance, in the "Translation of a virtual address" figure shown above, there are 4 bits in the virtual page number (the part used to index into the page table), so there are 2^4 = 16 virtual pages.
The total number of physical frames is equal to 2^(# of bits in the physical frame number). In that same example, there are 3 bits in the physical frame number (the main data shown in each page table entry), so there are 2^3 = 8 physical frames.
The size of a page (and thus the size of a page frame) is equal to 2^(# of bits in the page offset) bytes. In that same example, there are 12 bits in the page offset, so the page size is 2^12 = 4096 bytes.
These calculations can also go in reverse. For instance, if you knew that there were 128 virtual pages, you could compute that there must be 7 bits in the virtual page number, since 2^7 = 128.
Recall that Relocation is the idea that a program's allocated memory may be in a different physical location each time the program is loaded, or each time a particular page is swapped in from disk. Segmentation allowed for this by varying the "base" register; virtual memory naturally provides this through the page table, which allows pages to go anywhere in memory.
Virtual memory also presents multiple advantages that are not true with segmentation:
Illusion of a large physical memory. With virtual memory, the virtual memory space of any one process can potentially be much larger than the actual physical memory, because not all of the pages must be loaded into memory at once. A program need not know (or necessarily care) how much actual memory is available, and the OS can easily take advantage (after a reboot) if more memory is added.
Flexible protection mechanisms. We'll see more about this later, but the page table provides a natural mechanism for specifying per-process read/write/execute permissions on individual pages, and can also be used to share individual pages amongst processes.
Non-contiguous allocation. With segmentation, there are not many segments to allocate, and all of the memory area in a particular segment must be physically contiguous. If a segment needs to be enlarged and there is no available memory directly following it, we may be out of luck. With virtual memory, on the other hand, each process may be allocated many pages, and the pages need not be contiguous in physical memory, which often allows more of physical memory to be allocated.
No external fragmentation. With segmentation, external fragmentation can lead to inefficiency and compaction, which hurts performance. With paging, there is no external fragmentation. Although there may be some internal fragmentation, inside the final page of a process' multi-page allocation, keeping the page size relatively small minimizes this.
For these reasons, virtual memory has become more dominant over time.
Challenges with Virtual Memory
Virtual Memory, as described so far, has two primary limitations:
The mapping from virtual address to physical address is done very often, so it must be very fast. Having the Page Table in memory means that every memory reference generates at least one additional memory reference: first a lookup into the Page Table, then a second lookup to the actual requested address. In essence, every memory access requires two (or more) accesses instead of one.
If the virtual address space is large (as it usually is), the Page Table will be very large. When this happens, the Page Table itself may consume a significant chunk (or even all!) of physical memory. Some method of reducing the Page Table size is very desirable.
In this section, we discuss several optimizations that address these two challenges.
TLB: a Cache for Page Table Lookups
The solution to the two-memory-lookups problem lies in hardware: a construct called a Translation Lookaside Buffer (TLB). A TLB is essentially a cache for page table lookups. It is built from fully associative memory; recall from computer architecture that this allows the MMU, in essence, to examine all rows of the cache (TLB) in parallel. Entries in the TLB look like PTEs, only there are far fewer of them. Thanks to spatial and temporal locality, though, it's possible to achieve a high "TLB hit rate." If the PTE for the referenced page is in the TLB, it's a TLB hit. Otherwise, it's a TLB miss, and the MMU goes to the full Page Table in RAM, as previously described, to find the right frame. Although the TLB is a hardware construct, management of the TLB is often done cooperatively between the OS and the CPU; exact details are specific to the architecture and the OS.
When we include the TLB, the overall process of translating a virtual address to a physical one now looks like this:
Address calculation, with the TLB included.
Multi-Level Page Tables
One solution to the "huge page table" problem is to split it up into multiple levels, like a tree. In this scheme, the first few bits are an index into the top-level page table, the next few bits are an index into the second-level page table, and so on. The concept can be extended to any number of levels, but typically two to four will suffice. The actual Page Table Entries (PTEs) are found at the last level (the leaves of the tree). A Multi-level Page Table is illustrated in the following figure:
The drawback of Multi-level Page Tables is that the more layers they have, the more memory lookups are required for a single memory reference. The silver lining is that a TLB hit alleviates the need for all the lookups, so a respectably sized TLB reduces this penalty.
Inverted Page Tables
The other common solution to the "huge page table" problem is the Inverted Page Table. Because virtual memory is usually much larger than physical memory, most of a traditional Page Table's entries are empty (not "present"). Instead of holding a row for each virtual memory page, the Inverted Page Table has a row for each physical memory frame.
The drawback to the Inverted Page Table is that looking up a frame by page number is no longer a direct, indexed lookup. Instead of looking directly at entry 92 of the Page Table to find which frame holds page 92, we must search the Inverted Page Table, in effect asking, "does any frame hold page 92?". This limitation is mitigated in two ways. The first is the TLB, of course. The second, should a memory lookup be needed, is a mapping function, such as a hash on the virtual page number. If the hash table has about one bucket for each physical memory frame, there will be about one entry per bucket on average, so a lookup is still fast. The only wrinkle comes when hash collisions do occur; these are handled, as usual, via chaining.
The Inverted Page Table is illustrated in the following diagram:
A traditional Page Table (left) requires 2^52 entries, in this example. The equivalent Inverted Page Table (right) requires only 2^18 entries, smaller by a factor of 2^34, or about 17 billion. Yet a lookup still takes O(1) expected time. (image: Tanenbaum)
The Buddy System: Powers of Two
Besides segmentation and virtual memory, there is another memory allocation scheme that is sometimes used, called the Buddy System, or Buddy Memory Allocation. Memory is allocated in units of different sizes according to powers of two, which helps reduce inefficiency by allocating memory regions whose size more closely matches the requirement. In a Buddy System implementation:
Memory blocks are of size 2^k, where L ≤ k ≤ U
The smallest block is of size 2^L
The largest block is of size 2^U
The entire space starts as one block of size 2^U
Memory allocation forms a binary tree
The buddy system has been used in kernel memory allocation in both Unix and Linux. Note, however, that it doesn't translate well to virtual memory, so its use is primarily relegated to physical RAM.
Memory regions are structured as a binary tree. In this example, L=16 and U=20. (Image: Stallings)
Smaller regions may be coalesced into larger ones as adjacent ones are freed. (Image: Stallings)
Multiprogramming is the ability to have more than one process in memory at a time. It can be done without a memory abstraction, like paging or virtual memory, but doing so is challenging due to the need for protection (protecting processes from each other, and protecting the OS from user processes) and relocation (the ability to load any process at any memory location).
Basic address spaces can be implemented using a base and limit register.
Internal fragmentation is unused memory inside a unit of allocation.
External fragmentation is unused memory outside a unit of allocation.
Reducing external fragmentation may be done using compaction, in which allocated memory regions are relocated to a contiguous area at the top or bottom of memory. Compaction is a costly operation, and should be avoided.
Segmentation employs variable sized regions of memory allocation. It can provide relocation and some kinds of protection.
Virtual Memory allows a system to run a process or collection of processes whose image size is larger than all of physical memory, by keeping some memory pages swapped out to secondary storage (disk).
Virtual memory paging requires a Page Table. Each page table entry (PTE) maps a virtual page to a physical memory frame.
The TLB is a hardware entity -- an associative cache for PTEs. It greatly reduces the average time to look up a PTE.
Two strategies for reducing the size of the Page Table are Multi-Level Page Tables and Inverted Page Tables.
The Buddy System allocates memory in units that are powers of two and represents them as a binary tree. It's not a general replacement for virtual memory, but can be useful in special situations, such as inside the kernel.