Class 8: Programs, Processes and Memory


Reading
Chapter 7 of APUE, especially 7.6

Homework
To Appear.


Memory layout of a C program
Recall that a program is just a file --- a file that has executable permissions and that the kernel recognizes as having the proper binary format for that particular system. On most modern Unix systems (Solaris and Linux, for example), properly formatted programs start with the magic bytes 127 'E' 'L' 'F' which, in hex, is 7f 45 4c 46. Following those four bytes in an "ELF formatted file" are more bytes describing the platform that the file is supposed to run on. (There are utilities --- elfdump on Solaris, readelf on Linux --- that give you information about ELF formatted files. The file utility also gives some info.) When a program is launched, the kernel takes that executable file and creates a process from it. Understanding how C programs are laid out helps in understanding how the kernel takes a program (which is a file) and creates that process.
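
To make the magic-number check concrete, here is a minimal sketch of how one might test a buffer for the four ELF bytes described above. The helper name has_elf_magic is made up for illustration; it is not part of any system library, and the kernel's real check looks at more than these four bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The four magic bytes from the text: 127 'E' 'L' 'F' = 7f 45 4c 46. */
static const unsigned char ELF_MAGIC[4] = { 0x7f, 'E', 'L', 'F' };

/* Hypothetical helper (not a library function): returns 1 if the
 * len bytes at buf begin with the ELF magic number, 0 otherwise. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < sizeof ELF_MAGIC)
        return 0;                 /* too short to hold the magic bytes */
    return memcmp(buf, ELF_MAGIC, sizeof ELF_MAGIC) == 0;
}
```

To try it on a real file, read the first four bytes with fread and pass them to has_elf_magic. A shell script, which starts with "#!", should give 0.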

Section 7.6 of APUE walks you through the memory layout of a C program and provides a nice diagram. What's really important is what the various pieces are, not where they appear in this diagram, because they may be moved around on your system. The pieces are: the text segment, which is the actual code that gets executed; initialized global data; uninitialized global data (the bss); the heap; the stack; and space for the command-line arguments and environment variables.

We can actually see the memory layout in action by printing out the addresses of various things in our program. The following program prints out a lot of stuff that we can use to deduce the layout of our program.
#include <stdio.h>
#include <stdlib.h>  // declares malloc, calloc and free

extern char** environ;

// "fun" is actually a pointer to the function's compiled code
int fun(int x);

// Global data: both initialized and uninitialized
char *p, *q, *s;
char a='Q';
char b='R';
char c, d;

int main(int argc, char **argv)
{
  /************************************************
   * Print addresses of different kinds of objects
   ************************************************/
  // Text: fun & main are pointers to instructions in text segment
  printf("main  : %p\n",(void*)main);
  printf("fun   : %p\n",(void*)fun);

  // Initialized and uninitialized global data
  printf("a     : %p\n",&a);
  printf("b     : %p\n",&b);
  printf("c     : %p\n",&c);
  printf("d     : %p\n",&d);

  // The stack: argc is on the stack, also fun's local variable t
  printf("&argc : %p\n",&argc);
  fun(3);

  // Command-line arguments and environment variables
  printf("argv  : %p\n",argv);
  printf("environ %p\n",environ);

  // Heap objects
  q = malloc(8);
  printf("malloc: %p\n",q);
  //  free(q); // UNCOMMENT TO SEE INTERESTING STUFF
  s = calloc(8,1);
  printf("calloc: %p\n",s);
  //  free(s); // UNCOMMENT TO SEE INTERESTING STUFF

  /************************************************
   * Read and write from/to various locations
   ************************************************/
  // Read/write 'q' to initialized data segment
  printf("First byte of global variable a %i\n",*(&a));
  *(&a) = 'q';
  
  // Read/write to a stack location
  printf("First byte of stack location where t used to be %i\n",
	 *(unsigned char*)p);
  *p = 'q';

  // Write to a heap location
  printf("First byte of memory returned by malloc %i\n",*(unsigned char*)q);
  *q = 'q';

  // Write to a heap location
  printf("First byte of memory returned by calloc %i\n",*(unsigned char*)s);
  *s = 'q';

  // Write 'q' to text segment (seg fault!)
  printf("First byte of compiled code for fun %i\n",*(unsigned char*)fun);
  *(char*)(fun) = 'q';

  return 0;
}


int fun(int x) 
{ 
  int t=x*x;

  // Set p to point to a stack location
  p = (char*)&t;
  printf("t     : %p\n",p); 

  if (x == 1)
    return t;
  else
    return fun(x-1); 
}
bash$ gcc layout.c
bash$ ./a.out
main  : 8050df8
fun   : 8050fd6
a     : 80613f8
b     : 80613f9
c     : 8061434
d     : 8061435
&argc : 80470a4
t     : 8047064
t     : 8047044
t     : 8047024
argv  : 80470bc
environ 80470c4
malloc: 8061458
calloc: 8061468
First byte of global variable a 81
First byte of stack location where t used to be 0
First byte of memory returned by malloc 96
First byte of memory returned by calloc 0
First byte of compiled code for fun 85
Segmentation fault
The first thing to notice here is what happens at the end ... *(char*)(fun) = 'q'; generates a segfault. What does it do? Well, "fun" is a pointer to the start of a chunk of compiled code in the text segment. We try to put a 'q' in the byte "fun" points to, and we get a segfault because that memory is read-only.

You should be able to deduce the basic layout of this program based on these pointer values ... remember that they're in hex though.
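
Since the arithmetic is in hex, a small helper can do the subtraction for you. The sketch below is only an illustration: hex_gap is a made-up name, and the addresses in the usage note are simply the ones printed in the sample run above, which will differ on your system.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse two addresses written in hex (as in the
 * program's output) and return how many bytes hi lies above lo. */
long hex_gap(const char *hi, const char *lo)
{
    assert(hi != NULL && lo != NULL);
    unsigned long a = strtoul(hi, NULL, 16);  /* base 16, no 0x needed */
    unsigned long b = strtoul(lo, NULL, 16);
    return (long)(a - b);
}
```

For the sample run, hex_gap("8047064", "8047044") is 32: each recursive call to fun placed t 32 bytes lower than the previous one, so the stack grew toward lower addresses. Likewise hex_gap("80613f9", "80613f8") is 1 (a and b are adjacent in initialized data), and hex_gap("8061468", "8061458") is 16 (the two heap blocks are 16 bytes apart).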

From program to process
The file that is your program includes the bytes that end up in the text segment, and it includes the bytes that end up as initialized data. It does not include the uninitialized data (bss). Instead, a sufficiently large chunk of memory is set aside for the bss when the process is created, and all the bytes in it are set to zero. The stack and the heap also have no representation in the program itself. That space is managed during the execution of the program. Finally, the space for command-line arguments and environment variables is not represented in the program (the compiler has no way of knowing what these values will be when the program is executed, or even how many there will be). When the kernel creates the process, it populates this area with the correct values.
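
The zeroing of uninitialized globals is easy to check for yourself. The following is a minimal sketch: the names scratch and count_nonzero are made up for illustration. The C language guarantees that a global with no initializer starts out as zero; that the array lives in the bss and takes up no room in the executable file is how typical systems implement that guarantee, not something the code itself can show.

```c
#include <assert.h>
#include <stddef.h>

/* No initializer: on typical systems this array occupies no space in
 * the program file and is set up as zeroed bss at process creation. */
unsigned char scratch[4096];

/* Hypothetical helper: count the nonzero bytes among the len bytes at buf. */
size_t count_nonzero(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            n++;
    return n;
}
```

Calling count_nonzero(scratch, sizeof scratch) before the program has written anything to scratch should give 0. Compare that with memory returned by malloc, whose first byte was nonzero in the run above.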

Kernel and processes
Processes are the most fundamental resource controlled by the kernel. Only the kernel can create a process or terminate a process. In fact, the kernel exerts a lot more control than just the birth and death of processes. On most computers, the number of processor cores is much smaller than the number of processes alive at any given moment. On a k-core machine, at most k processes can be running simultaneously. When there are more than k processes, the kernel decides which k of them get to execute at any given point in time. Thus a process may be running, or it may be ready to run (though not currently running). A process may also be blocked, waiting for some I/O operation to complete before it can once again be considered ready. The book, page ?, has a nice description of this using a finite state machine.
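
The three states just described can be written down as a tiny finite state machine. This is only a sketch under assumptions: the state names follow the paragraph above, but the event names (dispatch, preempt, I/O wait, I/O done) are invented for illustration and are not taken from the book or from any kernel's source.

```c
#include <assert.h>

/* States from the text: ready to run, running, or blocked on I/O. */
enum proc_state { PROC_READY, PROC_RUNNING, PROC_BLOCKED };

/* Hypothetical events that move a process between states. */
enum proc_event { EV_DISPATCH, EV_PREEMPT, EV_IO_WAIT, EV_IO_DONE };

/* Return the state that follows s when event e occurs.
 * An event that makes no sense in state s leaves the state unchanged. */
enum proc_state next_state(enum proc_state s, enum proc_event e)
{
    assert(s == PROC_READY || s == PROC_RUNNING || s == PROC_BLOCKED);
    switch (s) {
    case PROC_READY:    /* kernel picks this process to run */
        return e == EV_DISPATCH ? PROC_RUNNING : PROC_READY;
    case PROC_RUNNING:  /* kernel takes the core away, or I/O starts */
        if (e == EV_PREEMPT) return PROC_READY;
        if (e == EV_IO_WAIT) return PROC_BLOCKED;
        return PROC_RUNNING;
    case PROC_BLOCKED:  /* I/O finished: ready again, not yet running */
        return e == EV_IO_DONE ? PROC_READY : PROC_BLOCKED;
    }
    return s;
}
```

Note that a blocked process goes back to ready, not straight to running: when its I/O completes it still has to wait for the kernel to choose it.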