Processes

Reading

Objectives

The Process Model

The Tanenbaum text says, "The most central concept in any OS is the process: an abstraction of a running program". Processes support multiprogramming and concurrency. We can also refer to a process as a unit of resource allocation and protection.

In normal execution, the CPU switches rapidly from one process to another, typically running each process for a few milliseconds before a context switch lets the next process run.

The notion of processes and multiprogramming works just fine with a single CPU. The text assumes just one CPU for this discussion of processes. Of course, the concept of processes is valid in a multiprocessing system, too.

Processes may belong to the kernel, or to users. The same structure is used for both kinds of processes, because they all need to run on the same processor. The main difference is that processes belonging to the OS kernel will execute in a privileged mode (e.g., ring 0) on the CPU, whereas user processes will execute in an unprivileged mode (e.g., ring 3).
A conceptual view of multiprogramming with four processes. Each process has its own program counter.
All four processes are loaded in memory, but only one runs at a time on the processor. (Tanenbaum)

Process Creation

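There are four principal events that cause a process, or processes, to be created:

1. System initialization.
2. Execution of a process-creation system call by a running process.
3. A user request to create a new process.
4. Initiation of a batch job.

In Unix/Linux, there are usually two steps to new process creation:

1. fork() creates a child process that is a copy of the calling process.
2. The child calls one of the exec functions (e.g., execve()) to replace its program with a new executable.

The example below is from the Tanenbaum text. When a shell command is given in Linux, the shell process first creates a new copy of itself. The child process then actually runs the shell command executable.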
A conceptual view of the command ls being executed from the shell. (Tanenbaum)


Fork and Exec. Although there are usually OS API calls that provide an additional layer of abstraction, the fundamental idea of a fork() system call underlies process creation in Unix-like operating systems, including Linux, macOS, iOS, and Android. Windows takes a different approach: a single CreateProcess call both creates the new process and loads the program into it.

Here is a simple fork() example for Linux:
/* fork.c */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
    printf("PID: %i I am the parent process and am initializing\n", getpid());
    // fork() returns 0 in the child, the child's PID in the parent, and -1 on failure
    pid_t childPID = fork();

    if (childPID == 0) {
        // child process
        pid_t myPID = getpid();
        printf("\tPID: %i I am the Child Process. childPID=%i\n", myPID, childPID);
        printf("\tPID: %i Press Enter to Terminate\n", myPID);
        getchar();
    }
    else if (childPID > 0) {
        // parent process
        pid_t myPID = getpid();
        printf("PID: %i Parent Process runs after the fork(). childPID=%i\n", myPID, childPID);
        printf("PID: %i Enter to Terminate\n", myPID);  
        getchar();  
    }
    else {
        // fork() returned -1: no child process was created
        perror("fork");
        return 1;
    }

    printf("PID: %i \tThis process is terminating. Bye!\n", getpid());
    return 0;
}
And another example, in which the parent sends a signal (SIGINT) to the child to make it terminate:
/* fork-signal.c */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <signal.h>

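// Written by the signal handler, read by the child's main loop
static volatile sig_atomic_t terminateFlag = 0;

void sig_handler(int signalNumber) {
    // Only set a flag here: printf() is not async-signal-safe, so the
    // child reports its termination from the main loop instead.
    (void)signalNumber;
    terminateFlag = 1;
}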

int main(int argc, char **argv) {
    printf("PID: %i I am the parent process and am initializing\n", getpid());
    // fork() returns 0 in the child, the child's PID in the parent, and -1 on failure
    pid_t childPID = fork();
 
    if (childPID == 0) {
        // child process
        pid_t myPID = getpid();
        printf("\tPID: %i I am the Child Process. childPID=%i\n", myPID, childPID);
        printf("\tPID: %i Waiting for a signal from the parent\n", myPID);

        // Install the signal handler
        signal(SIGINT, sig_handler);
        signal(SIGTERM, sig_handler);

        int i;
        for (i=0; i<1000; i++) {
            sleep(2);
            printf("\tPID: %i I am the Child Process. I'm awake and going back to sleep now.\n", myPID);
            if (terminateFlag) {
                printf("\tPID: %i The Child Process terminates..\n", myPID);
                return 0;
            }
        }
    }

    else if (childPID > 0) {
        // parent process
        pid_t myPID = getpid();
        printf("PID: %i Parent Process runs after the fork(). childPID=%i\n", myPID, childPID);
        printf("PID: %i Enter to Terminate\n", myPID);  
        getchar();
        kill(childPID, SIGINT);
    }
    else {
        // fork() returned -1: no child process was created
        perror("fork");
        return 1;
    }
  
    printf("PID: %i \tThe parent process is terminating... Bye!\n", getpid());
    return 0;
}

Process Termination

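A process will normally terminate due to one of the following conditions:

1. Normal exit (voluntary).
2. Error exit (voluntary).
3. Fatal error (involuntary).
4. Killed by another process (involuntary).

By default, the compiler toolchain normally links in startup code that calls exit() when the program's main function returns, so a process terminates cleanly even if the programmer never calls exit() explicitly. In addition, most operating systems allow the application programmer to add exit handlers that perform cleanup actions before a process is destroyed.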

Process Hierarchies

In Unix/Linux, by default, a process and all its descendant processes form a process group. When a user sends a signal from the keyboard (e.g., by pressing Ctrl-C), the signal is delivered to all members of the process group associated with the keyboard. Beginning with the initial process, the process hierarchy naturally forms a tree. This initial process was historically init; now it is more often systemd (and there are others). The process tree can be displayed using the pstree command:

$ pstree
systemd─┬─BESClient───6*[{BESClient}]
        ├─ModemManager───2*[{ModemManager}]
        ├─NetworkManager─┬─dhclient
        │                └─2*[{NetworkManager}]
        ├─accounts-daemon───2*[{accounts-daemon}]
        ├─acpid
        ├─agetty
        ├─apache2───5*[apache2]
        ├─auditd───{auditd}
        ├─avahi-daemon───avahi-daemon
        ├─boltd───2*[{boltd}]
        ├─colord───2*[{colord}]
        ├─cron
        ├─cups-browsed───2*[{cups-browsed}]
        ├─cupsd───dbus
        ├─dbus-daemon        
        ...

Windows does not have the same concept of a process hierarchy. A parent process gets a handle to the child process, but otherwise all processes are basically equal. If a Windows process is killed, its child processes live on normally.

When a Linux parent process is killed, its child processes become orphans and are re-parented, normally to the systemd process (PID 1). An orphaned child can continue executing. When it eventually terminates, its new parent issues the wait() system call that collects the child's exit status and lets the kernel release its process table entry.

Process States

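There are three basic states a process can be in:

1. Running (actually using the CPU at that instant).
2. Ready (runnable, but temporarily stopped to let another process run).
3. Blocked (unable to run until some external event happens).

In the third case, the external event is often the completion of I/O or resolution of a page fault. However, it could also mean waiting for a synchronization primitive like a mutex or semaphore.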
A process can be in the running, blocked, or ready state. (Tanenbaum)
This representation of process states generally applies to any modern operating system. However, each OS may use modified terminology or semantics in its own specific implementation.

Some other texts and documents also refer to a process state called "suspended", which means that one or more of the virtual memory pages needed by the process are not currently in RAM (i.e., they are swapped out to disk). Once the needed memory pages are swapped back into RAM, a suspended process may go to the Ready or the Blocked state.

Process Implementation

One of the key roles of an OS is to implement key abstractions such as processes. This is done as part of the OS kernel using a data structure called a process table that has one entry per process. Each process' entry is often called a process control block (PCB). The PCB contains all the important information needed by the OS to manage the process, such as:
Some of the fields of a typical PCB entry. (image: Tanenbaum)
Process Context. Some key components of the first column -- registers, program counter, program status word (PSW), stack pointer, etc. -- are together referred to as the process' context. The context of a process must be saved (from hardware to RAM) when the process cedes control of the CPU (either to an interrupt or to another process), and later restored (from RAM to hardware) when the process resumes.

Process Control Block (PCB). The PCB contains a process' context, plus any other information needed by the OS to manage the process. This includes the PID, currently assigned resources like files and memory, as well as scheduling and performance data. When a process is being replaced by another process and will not immediately resume, many of the fields in the PCB (e.g. CPU time used, process state, and data related to process scheduling) may need to be updated. We call this whole operation a context switch (or process switch), which is usually the result of a process blocking or using up its currently scheduled time quantum on the processor.

Process Image. Some books and operating systems refer to the collection of a process' PCB, executable program, stack, and data (heap) all together as the process image. The process image components are usually scattered in various locations in RAM, including both user space and kernel space, so the construct is somewhat conceptual (it's not just one monolithic thing inside the OS).
A process 'image'.
A process image may contain other information, but should be considered at a minimum to include the following:

Linux Process Notes

Process creation in Linux proceeds as follows:

System Calls for Processes

Linux processes are managed using system calls. The following table lists some of the more common ones:
Common Linux system calls for process management. (Tanenbaum)

Shell Commands for Processes

$ ps -ef   # list all processes
$ pstree   # show all processes, in tree format
$ top      # dynamically updated list of most active processes

Process Info in /proc

There is a colloquial saying that "everything is a file in Linux." As one example, the /proc folder at the root of the Linux file system contains an abstract mapping of what is, in essence, the tree of current processes. When you view the contents of this 'directory,' you are really exploring the process tree. Open a terminal to examine some of the components (example output below trimmed for space).
$ cd /proc
proc$ ls -l
total 0
dr-xr-xr-x  9 root             root                           0 Jan 22 11:41 1
dr-xr-xr-x  9 root             root                           0 Jan 22 11:41 10
dr-xr-xr-x  9 systemd-timesync systemd-timesync               0 Jan 22 11:41 1043
dr-xr-xr-x  9 root             root                           0 Jan 22 11:41 11
dr-xr-xr-x  9 root             root                           0 Jan 22 11:41 110
dr-xr-xr-x  9 root             root                           0 Jan 22 11:41 111
...
-r--r--r--  1 root             root                           0 Jan 22 14:34 cmdline
-r--r--r--  1 root             root                           0 Jan 22 14:34 consoles
-r--r--r--  1 root             root                           0 Jan 22 14:34 cpuinfo
-r--r--r--  1 root             root                           0 Jan 22 14:34 crypto
-r--r--r--  1 root             root                           0 Jan 22 14:34 devices
-r--r--r--  1 root             root                           0 Jan 22 14:34 diskstats
-r--r--r--  1 root             root                           0 Jan 22 14:34 dma
dr-xr-xr-x  2 root             root                           0 Jan 22 14:34 driver
-r--r--r--  1 root             root                           0 Jan 22 14:34 filesystems
dr-xr-xr-x  9 root             root                           0 Jan 22 14:34 fs
-r--r--r--  1 root             root                           0 Jan 22 14:34 interrupts
-r--r--r--  1 root             root                           0 Jan 22 14:34 meminfo
-r--r--r--  1 root             root                           0 Jan 22 14:34 misc
-r--r--r--  1 root             root                           0 Jan 22 14:34 modules
lrwxrwxrwx  1 root             root                          11 Jan 22 14:34 mounts -> self/mounts
-r--r--r--  1 root             root                           0 Jan 22 14:34 partitions
...
Each process appears as a directory whose name is its PID. Try changing into one of these directories to see what info is available.

Process States in Linux

Not all terminology is standardized among operating systems. In the Linux source code and kernel vernacular, a 'task' corresponds to our notion of a process. From the Linux sched.h (some states omitted):

The following diagram shows the relationship between the normal process states (running, ready, and blocked) and the terminology used in the Linux kernel code.
Linux 'task' states.

The Task Structure

The Linux "task structure" (task_struct) contains all the info for managing processes. It is maintained in memory, by the kernel, for each process. It contains the following types of information:

Linux Kernel Organization

As noted above, processes and other key OS functionality are implemented by the OS kernel. The Tanenbaum text says, "The Linux kernel sits directly on the hardware and enables interaction with I/O devices and the memory management unit and controls CPU access to them." The hierarchical organization is roughly as follows:

Structure of the Linux kernel. (Tanenbaum)
Each of the components above is implemented using one or more kernel processes, often running as daemons, or non-interactive, background processes.

Summary