man -s4 proc
for more information. And, of course, the ps command gives
some info as well. Think of the kernel as a business and the
processes as customers. The process table is like the records
a business keeps on its clients.
open, which requests that a connection to a file be made;
close, which requests that a connection to a file be closed;
read, which requests that some bytes be read from a file via a specified connection; and
write, which requests that some bytes be written to a file via a specified connection.
The C language (and pretty much any other language is similar) has operating-system-independent functions for I/O (mostly in stdio.h), like fprintf, fscanf, etc. Their implementations (which we never see) make system calls to get the work done and, of course, those implementations differ between Unix, Windows, PalmOS, etc., because the system calls provided by each OS are different. Soon we'll learn Unix system calls that allow us to make direct requests to the kernel to perform filesystem operations.
Unix's model of files is simple: files are just sequences of bytes. User programs may impose structure or meaning on those bytes ... Unix doesn't care.
A user process and the kernel must agree on names for open
file connections. Analogous to the difference between a
program and a process, there is a difference between a file
and an open "connection" to a file: for example we may have
two open connections to the same file, but be in different
positions in the file with respect to our next read. So the
file name itself isn't really appropriate for communicating
which connection you want to read the next byte from. So
the OS and the user process refer to each open connection
by a number (type int) called a file descriptor. Standard
input, output and error default to file descriptors 0, 1 and
2 respectively.
You can actually fetch the file descriptor associated with
each open C FILE* with the function
int fileno(FILE* fp);
Remember this ↓
file descriptor 0 is a process's stdin
file descriptor 1 is a process's stdout
file descriptor 2 is a process's stderr
The process table entry (aka process control block) contains a table, the file descriptor table, that gives the mapping between the descriptor the process uses to refer to a file connection and the data structure inside the kernel that represents the actual file connection.
Several system open-file table entries may actually refer to the same vnode table entry. That vnode table entry cannot be removed from the vnode table until all of those referencing system open-file table entries have been removed.
A file may be referenced by several entries in the
filesystem (this comes from "hard links", which you can
create with the ln utility) and the file cannot
be removed from the filesystem until all of those
references to the file have been removed.
Sensing a pattern? System open-file table entries, vnode table entries, and filesystem entries (inodes actually) each contain a counter (called a reference count) that tracks the number of references to that object. Each time one of the referencing objects goes away or changes to refer to something else, that counter gets decremented. When it hits zero, the object itself can be deleted. Reference counting is an important idea and is found in lots of places in CS.
open to read from foo.txt, and the C standard library call fopen to write to bar.txt.
Note that the diagram shows another process. What can you say
about how the other process was probably created from the
shell?
Remember that I/O buffering is a C standard I/O library issue. The following two programs are instructive. It's worth compiling them and running them and trying to figure out why they do what they do (note that they require the file alphabet.txt): testCIO.c, testKernelIO.c.