Files

Files and File Systems

Reading

Files

The text identifies three requirements for long-term storage: it must be possible to store a very large amount of information, the information must survive the termination of the process using it, and multiple processes must be able to access it concurrently. A File is a logical unit of information created by a process, and the information in a file must be persistent (not affected by process creation and termination). The File System is the part of the OS dealing with files, including how they are structured, named, accessed, used, protected, implemented, and managed.

File Naming

File names are familiar to programmers, but they are an important part of the design of a file system. Some file systems are case sensitive, like those of the UNIX family (UNIX, Linux, Android, iOS, etc.), while others are not (original MS-DOS, FAT, NTFS, and ReFS of the Windows lineage, which preserve case but ignore it when matching names). MacOS is a borderline member of the UNIX family here: its default APFS volumes are case preserving but not case sensitive.

Most file systems support two part filenames, with a base name and an extension, where the extension indicates something about what type of file it is. The UNIX-style operating systems perform file identification by structure, rather than by file extension. Although the Microsoft Windows operating system has traditionally used file extensions for launching an associated program, recent versions perform more structural identification, similar to UNIX.

Some common file extensions. (Tanenbaum)

File Structure

Historically, there have been three basic ways of structuring files: as an unstructured sequence of bytes, as a sequence of fixed-length records, or as a tree of records. Most modern operating systems, including UNIX and Windows, treat a file as an unstructured byte sequence and leave any internal structure to user programs.

File Types

There are four basic types of files described in the Tanenbaum text: regular files, directories, character special files, and block special files. All OS file systems have some notion of regular files and directories. The last two categories, character special files and block special files, are unique to operating systems in the Unix/Linux family, in which many things that are not exactly files (terminals, disks, and other devices) are treated like files to simplify some aspects of the implementation.

Identifying file types. In Unix/Linux systems, we can use the file command from the terminal to have the OS infer a file type from known headers:
$ file webCal
webCal: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, not stripped
File header bytes. Some common file headers are listed in the following table:

File type   Header bytes   ASCII
dll         4D-5A-90-00    MZ
exe         4D-5A-90-00    MZ
gz          1F-8B-08-08    --
jpg/jpeg    FF-D8-FF       --
Linux-exe   7F-45-4C-46    ELF
png         89-50-4E-47    PNG
mp3         49-44-33-2E    ID3
PPT         D0-CF-11-E0    --
PDF         25-50-44-46    %PDF
tgz         1F-9D-90-70    --
Word        D0-CF-11-E0    --
wmv         30-26-B2-75    --
zip         50-4B-03-04    PK
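
As an illustration, the header bytes can also be inspected directly from a program. The following short C program is a minimal sketch (not from the text) that reads the first four bytes of a file and checks them against a few of the signatures above:
/* magic.c */
/* Minimal sketch (not from the text): print a file's first four bytes and
   guess its type from a few of the signatures in the table above. */
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {

  unsigned char hdr[4];
  FILE *f;

  if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

  f = fopen(argv[1], "rb");
  if (f == NULL) { perror(argv[1]); return 2; }
  if (fread(hdr, 1, 4, f) != 4) { fprintf(stderr, "%s: too short\n", argv[1]); return 3; }
  fclose(f);

  printf("%02X-%02X-%02X-%02X: ", hdr[0], hdr[1], hdr[2], hdr[3]);
  if (memcmp(hdr, "\x7F" "ELF", 4) == 0)      printf("ELF executable\n");
  else if (memcmp(hdr, "\x89PNG", 4) == 0)    printf("PNG image\n");
  else if (memcmp(hdr, "PK\x03\x04", 4) == 0) printf("ZIP archive\n");
  else if (memcmp(hdr, "%PDF", 4) == 0)       printf("PDF document\n");
  else                                        printf("unknown\n");
  return 0;
}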

File Access

Files may be accessed in one of two modes: sequential access, in which the bytes of a file are read in order starting at the beginning, and random access, in which bytes or records may be read in any order, typically by seeking to the desired position first.
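
Random access is achieved with the lseek call, which repositions the file offset before the next read or write. A minimal sketch (not from the text; the offset 1024 is arbitrary):
/* randread.c */
/* Minimal sketch of random access (not from the text): seek to byte 1024
   of a file and read 16 bytes from there. Error checking is minimal. */
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char *argv[]) {

  char buf[16];
  int fd;
  ssize_t n;

  if (argc != 2) return 1;
  fd = open(argv[1], O_RDONLY);
  if (fd < 0) return 2;

  if (lseek(fd, 1024, SEEK_SET) < 0) return 3;  /* jump to byte offset 1024 */
  n = read(fd, buf, sizeof(buf));               /* read up to 16 bytes from there */
  printf("read %zd bytes at offset 1024\n", n);

  close(fd);
  return 0;
}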

File Attributes

Files have file attributes, or metadata, associated with them, such as the protection bits, owner, times of creation, last access, and last modification, and current size. A few examples are in the following listing:

Some possible file attributes.
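
On Unix/Linux systems, most of these attributes can be read with the stat system call. A minimal sketch (not from the text) that prints a few of them:
/* attrs.c */
/* Minimal sketch (not from the text): print a few file attributes using stat(). */
#include <sys/stat.h>
#include <stdio.h>
#include <time.h>

int main(int argc, char *argv[]) {

  struct stat sb;

  if (argc != 2) return 1;
  if (stat(argv[1], &sb) < 0) { perror(argv[1]); return 2; }

  printf("size:       %lld bytes\n", (long long)sb.st_size);
  printf("owner uid:  %d\n", (int)sb.st_uid);
  printf("protection: %o\n", (unsigned)(sb.st_mode & 0777));
  printf("links:      %ld\n", (long)sb.st_nlink);
  printf("modified:   %s", ctime(&sb.st_mtime));
  return 0;
}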

File Operations

Here are some of the more common operations the OS designer might want to implement on a file system: create, delete, open, close, read, write, append, seek, get attributes, set attributes, and rename. Files are normally accessed by a file descriptor, an integer returned by the OS to a process when a file is opened. The following example program from the text shows how to manipulate files using common library calls (open, creat, read, write) and file descriptors.
/* filecopy.c */
/* File copy program. Error checking is minimal. */
#include <sys/types.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]);   /* ANSI prototype */

#define BUF_SIZE 4096        /* use a buffer size of 4096 bytes */
#define OUTPUT_MODE 0700       /* protection bits for output file */

int main(int argc, char *argv[])  {

  int in_fd, out_fd, rd_count, wt_count;
  char buffer[BUF_SIZE];

  if (argc != 3) exit(1);

  /* Open input file,  create output file */
  in_fd = open(argv[1], O_RDONLY);
  if (in_fd < 0) exit(2);
  out_fd = creat(argv[2], OUTPUT_MODE);
  if (out_fd < 0) exit(3);

  /* Copy loop */
  while (1) {
    rd_count = read(in_fd, buffer, BUF_SIZE);     /* read a block of data */
    if (rd_count <= 0) break;
    wt_count = write(out_fd, buffer, rd_count);   /* write data */
    if (wt_count <= 0) exit(4);
  }

  /* Close the files */
  close(in_fd);
  close(out_fd);

  if (rd_count == 0) exit(0);     /* no error on last read */
  else exit(5);             /* error on last read */
}

Directories

Single Level

The simplest directory system has only a single root folder, and no sub-folders. Some early systems used these for ease of implementation.
A single level file system.

Hierarchical Directory Systems

Systems with many users or a large number of files require a hierarchical directory structure, or tree. Typically, each user is assigned a private subtree.
A hierarchical file system.

Path Names

A pathname is a unique name for any object in the file system, derived from its path from the root of the tree. Most file systems support two types of path names: absolute path names, which start at the root directory, and relative path names, which are interpreted relative to the process's current working directory. Most common file systems also support two special directory entries: "." for the current directory and ".." for its parent.

Directory Operations

The following are commonly implemented directory operations: create, delete, opendir, closedir, readdir, rename, link, and unlink. The sketch below uses a few of these library calls to list a directory.
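
This minimal sketch (not from the text) uses opendir, readdir, and closedir:
/* listdir.c */
/* Minimal sketch (not from the text): list a directory with opendir/readdir/closedir. */
#include <dirent.h>
#include <stdio.h>

int main(int argc, char *argv[]) {

  DIR *dp;
  struct dirent *entry;

  dp = opendir(argc > 1 ? argv[1] : ".");   /* default to the current directory */
  if (dp == NULL) { perror("opendir"); return 1; }

  while ((entry = readdir(dp)) != NULL)     /* one entry per call; NULL at the end */
    printf("%s\n", entry->d_name);

  closedir(dp);
  return 0;
}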

File System Implementation

Now we shift from the user point of view to some typical implementation strategies.

File System Layout

Disks on BIOS-based computers begin with a structure called the Master Boot Record, or MBR, which is used to boot the computer. The end of the MBR contains the Partition Table for the disk, which describes the defined areas of the disk where file systems may reside. The program that reads the MBR looks for a bootable partition, which has an executable Boot Block at its beginning; the Boot Block program then loads the operating system.

Disks on UEFI-based computers (Unified Extensible Firmware Interface), and on some BIOS-based systems, instead use a structure called the GUID Partition Table (GPT). The GPT overcomes some of the earlier limitations of the MBR, most notably its 32-bit block addresses, which usually meant no partitions greater than 2TB.

Beyond the MBR or GPT, file systems vary quite a bit in their implementation. One other commonly-used feature, though, is some sort of superblock, which contains all the file system's key parameters, and gets loaded into RAM by the OS.
A possible file system layout. (Tanenbaum)
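
For reference, each of the four entries in a classic MBR partition table occupies 16 bytes. The following C struct is a sketch of that layout (field names are illustrative; endianness and packing concerns are ignored):
/* Sketch of one classic 16-byte MBR partition table entry (four of these
   sit at the end of the MBR). Field names are illustrative. */
#include <stdint.h>

struct mbr_partition_entry {
  uint8_t  status;         /* 0x80 = bootable ("active"), 0x00 = not bootable   */
  uint8_t  chs_first[3];   /* legacy CHS address of the first sector            */
  uint8_t  type;           /* partition type, e.g., 0x83 = Linux, 0x07 = NTFS   */
  uint8_t  chs_last[3];    /* legacy CHS address of the last sector             */
  uint32_t lba_first;      /* 32-bit LBA of the first sector                    */
  uint32_t sector_count;   /* 32-bit sector count: the source of the 2TB limit  */
};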

Implementing Files

Contiguous Allocation

One way to allocate disk blocks for files is in contiguous blocks. Each block on a disk is usually relatively small, somewhere between 512 bytes and 4KB. Contiguous allocation is illustrated in the following diagram:

(a) Contiguous disk allocation for seven files. (b) The state of the disk after files D and F have been removed. (Tanenbaum)
Contiguous allocation has two main strengths: it is simple to implement (tracking a file requires only its starting block and length), and read performance is excellent, since an entire file can be read with a single seek. However, over time the disk becomes fragmented (external fragmentation), a very serious limitation, since compaction of a disk is very slow and cumbersome.
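
As a small illustration (hypothetical names, not from the text), locating the block that holds any byte of a contiguously allocated file is simple arithmetic:
/* Sketch of contiguous allocation bookkeeping (hypothetical names, not from the text). */
#include <stdint.h>

#define BLOCK_SIZE 4096

struct contig_file {
  uint64_t start_block;    /* first disk block of the file            */
  uint64_t num_blocks;     /* number of consecutive blocks allocated  */
};

/* The disk block holding byte 'offset' of the file is a simple calculation. */
uint64_t block_for_offset(const struct contig_file *f, uint64_t offset) {
  return f->start_block + offset / BLOCK_SIZE;
}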

Linked List Allocation

The second method is to allocate disk blocks wherever they are free and chain them together to form a linked list. Every disk block can be used, but random access is much slower: without any auxiliary data structures, the OS may have to walk through all n blocks of a file to reach data near its end. Also, embedding the next-block pointer inside each disk block means the amount of data stored in a block is no longer a power of two, which complicates block-sized reads and writes.
Storing a file as a linked list of disk blocks. (Tanenbaum)

Linked List Allocation with File Allocation Table

Both of the limitations of a pure Linked-List strategy are alleviated by taking the pointer word from each disk block and creating a table in memory. This table is called a File Allocation Table. The table for the previous example would look like this:
Linked list allocation using a file allocation table in main memory. (Tanenbaum)
Although the chain must still be followed to find a disk block, the table is held in memory, so walking the list is very fast. The primary limitation of this method is that it does not scale well, since the table's size is proportional to the total number of disk blocks. If the File Allocation Table for a 1TB disk were held in memory, for example, it could take up several gigabytes of RAM. This method is used by the FAT file system, which is still widely supported.
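
Once the table is in memory, following a file's chain is just a table walk. A minimal sketch (hypothetical representation, not the actual on-disk FAT format):
/* Sketch of walking a file's chain in an in-memory allocation table.
   Hypothetical representation: fat[b] holds the block that follows block b,
   and -1 marks the end of the chain. */
#include <stdio.h>

#define END_OF_CHAIN (-1)

void print_chain(const int *fat, int first_block) {
  int b = first_block;
  while (b != END_OF_CHAIN) {
    printf("%d ", b);       /* visit this block                     */
    b = fat[b];             /* follow the pointer kept in the table */
  }
  printf("\n");
}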

I-nodes

A better approach to managing files on disk is to use a structure called an i-node (index node), which lists the attributes and the disk block addresses of a file. The big advantage is that an i-node needs to be in memory only when the corresponding file is open, so the memory used is proportional to the number of open files rather than to the total number of disk blocks. In essence, the FAT scheme requires the entire table to be in memory all the time, while the i-node scheme requires only the information about currently open files to be in RAM. If an i-node fills up with disk block addresses, the last entry is used as a pointer to an additional block of addresses.
An example i-node. (Tanenbaum)
File systems used on UNIX, Linux, MacOS, and Windows (e.g., ext3, ext4, NTFS, ReFS, APFS) use the i-node or a similar structure.
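
A simplified i-node can be pictured as the following C structure, a sketch in the spirit of the classic UNIX design (field names and the number of direct entries are illustrative):
/* Sketch of a simplified i-node, in the spirit of the classic UNIX design.
   Field names and the number of direct entries are illustrative. */
#include <stdint.h>
#include <time.h>

#define NUM_DIRECT 10

struct inode {
  uint32_t mode;                 /* file type and protection bits                */
  uint32_t uid, gid;             /* owner and group                              */
  uint32_t link_count;           /* number of directory entries pointing here    */
  uint64_t size;                 /* file size in bytes                           */
  time_t   atime, mtime, ctime;  /* access / modification / change times         */
  uint32_t direct[NUM_DIRECT];   /* addresses of the first data blocks           */
  uint32_t indirect;             /* address of a block holding further addresses */
};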

Directory Entries and I-Nodes

Directory Entries (File Names and Attributes). Most modern systems support long file names, and there are three basic approaches to handling them: reserve a fixed, maximum-length name field in every entry, store variable-size entries with the name in line, or keep fixed-size entries that point to names stored in a heap at the end of the directory. The second and third strategies are illustrated in the figure.
(a) A directory containing in-line, variable-size entries, with file attributes.
(b) A directory with fixed-size entries (top) pointing to the variable-length file names kept in a heap (bottom). (Tanenbaum)
In each of these strategies, the directory structure must be searched linearly for a file. There are two common options for speeding things up: keeping a hash table in each directory, or caching the results of recent lookups. Storing Attributes in I-Nodes. When the OS needs to locate a file by its path name, it ultimately needs to look up the file's location on disk. For linked-list allocation, that is the number of the file's first block; for i-node allocation, the OS needs the i-node number.

Each file has attributes, and the attribute info has to be stored somewhere. It could be stored in the directory entry for that file or, in i-node systems, stored in the file's i-node.
(a) A directory containing fixed-size entries, with file attributes.
(b) A directory in which each entry just refers to an i-node. (Tanenbaum)
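
In the second scheme, a directory entry can be very small. The classic UNIX V7 entry, for example, was just 16 bytes (a sketch; modern file systems use more elaborate layouts):
/* Sketch of the classic UNIX V7 directory entry: an i-node number plus a
   14-character name. All other attributes live in the i-node itself. */
#include <stdint.h>

#define V7_NAME_LEN 14

struct v7_dir_entry {
  uint16_t inode_number;       /* 0 means the entry is unused    */
  char     name[V7_NAME_LEN];  /* not necessarily NUL-terminated */
};                             /* 16 bytes total                 */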

Shared Files

Allowing files to be shared, rather than duplicated, is desirable for convenience and efficiency. When a file is shared between two users, each has a pointer to it.

Links. File and directory links come in two varieties: hard links, which are additional directory entries referring to the same underlying file (the same i-node), and symbolic (soft) links, which are small files containing the path name of the target. An important advantage of soft links is that they can cross disk boundaries, whereas hard links cannot.

You can experiment with creating hard and soft links at the terminal:
$ echo hello > hello.txt
$ ln hello.txt hardlink1
$ ln hello.txt hardlink2
$ ls -l
total 12
-rw-rw-r-- 3 bilzor bilzor 6 Apr  6 10:38 hardlink1
-rw-rw-r-- 3 bilzor bilzor 6 Apr  6 10:38 hardlink2
-rw-rw-r-- 3 bilzor bilzor 6 Apr  6 10:38 hello.txt
$ ln -s hello.txt softlink
$ ls -l
total 12
-rw-rw-r-- 3 bilzor bilzor 6 Apr  6 10:38 hardlink1
-rw-rw-r-- 3 bilzor bilzor 6 Apr  6 10:38 hardlink2
-rw-rw-r-- 3 bilzor bilzor 6 Apr  6 10:38 hello.txt
lrwxrwxrwx 1 bilzor bilzor 9 Apr  6 10:39 softlink -> hello.txt
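
The same operations are available programmatically through the link and symlink system calls, and stat reports the resulting link count in st_nlink. A minimal sketch (not from the text), assuming the hello.txt file created above; the names hardlink3 and softlink2 are hypothetical:
/* links.c */
/* Minimal sketch (not from the text): create a hard link and a symbolic link
   to hello.txt, then print its link count. Error checking is omitted. */
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>

int main(void) {

  struct stat sb;

  link("hello.txt", "hardlink3");      /* a new directory entry for the same i-node */
  symlink("hello.txt", "softlink2");   /* a new small file containing the path      */

  if (stat("hello.txt", &sb) == 0)
    printf("hello.txt link count: %ld\n", (long)sb.st_nlink);
  return 0;
}
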
Directories and Links. If the addresses of a file's disk blocks are kept in the directory entries themselves, a problem can occur with sharing. If users B and C share a file, each has an independent directory entry for it; if C appends to the file and new blocks are allocated, B's directory entry becomes out of date. There are two solutions to the problem: keep the block list in a small data structure associated with the file itself (an i-node) that both directory entries reference, or have B link to the file symbolically, by path name.
File system containing a file shared by User B and User C. (Tanenbaum)


Link Counts. Even when using a flexible structure like the i-node, there is still a management challenge with files shared by hard link. The i-node contains an entry called the link count, which indicates how many directory entries point to the file. In the diagram below, if C creates a file and B shares it, the link count is 2. However, if C deletes the file, the OS must retain the i-node and decrement the link count, so the file is still usable by B.

(a) C creates a file. (b) B shares the file.
(c) C "deletes" the file, but only C's directory entry is actually removed. (Tanenbaum)

Log-Structured File Systems

Log structured file systems (LFS) are not widely used, but are an interesting variation on traditional systems. Log file systems are based on the idea that frequent, small, write operations do not efficiently use the disk, particularly rotational disks that spend a lot of time waiting through seek and rotational (latency) delays.

An LFS, recognizing the heavy use of RAM to cache files, tries to maximize disk utilization by batching a large number of these small write operations into segments. Segments are written to the disk in the manner of a log, instead of using a traditional structure. If the segments are about 1MB, disk utilization becomes much more efficient. However, the i-nodes are now scattered all over the disk; the Tanenbaum text describes in detail how they are tracked and how the segments on disk are kept up to date. Because LFS is largely incompatible with existing file systems, it is not widely used.

Journaling File Systems

However, one key idea from LFS has carried forward, into what are called journaling file systems. The idea is that, by keeping a record of disk transactions, the disk can be more resilient against an interruption in operations. The file system creates a log of what it's going to do before it does it. So, if the system crashes before the operations can be completed, the file system can recover by completing the outstanding operations in the log.

Example. Consider the three actions needed when removing a file: remove the file from its directory, release its i-node to the pool of free i-nodes, and return its disk blocks to the pool of free blocks. If a crash occurs when only one or two of these operations has been completed, the file system is left in an inconsistent state. A journaling file system writes all three operations to the log, completes them, then erases the log entries. In the event of a system crash, all the actions still recorded in the log are re-run during recovery.

For a journaling system to work properly, all the actions to be taken must be idempotent, which means they can be repeated as often as necessary without harm. The data structures and loggable operations of a journaling file system all have to be idempotent.
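
For instance, marking a block free in a free-space bitmap is idempotent (setting a bit that is already set changes nothing), while appending the block to a free list is not, since replaying the append would insert the block twice. A sketch (hypothetical names):
/* Idempotent vs. non-idempotent ways of freeing disk block n (hypothetical names).
   Replaying the first after a crash is harmless; replaying the second would
   insert the same block into the free list twice. */
#include <stdint.h>

void free_block_idempotent(uint8_t *free_bitmap, unsigned n) {
  free_bitmap[n / 8] |= (uint8_t)(1u << (n % 8));   /* set the bit; safe to repeat */
}

struct free_node { unsigned block; struct free_node *next; };

void free_block_not_idempotent(struct free_node **free_list, struct free_node *node) {
  node->next = *free_list;     /* repeating this append duplicates the block */
  *free_list = node;
}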

Journaling file systems are in wide use today, including NTFS/ReFS (Windows), APFS (MacOS/iOS), and ext4 (Linux).

Virtual File Systems

An important step in the evolution of file systems is the virtual file system, or VFS. Most operating systems support a wide variety of file system types. The idea behind a VFS is to take the code and operations that are common to all file systems and place them in their own layer, while operations specific to a particular file system are pushed down into lower layers. The VFS is thus an abstraction over the concrete file systems beneath it.
The virtual file system (VFS) abstraction. (Tanenbaum)
All system calls relating to files (open, read, write, lseek) are passed to the VFS for initial processing. Designers of particular file systems (FS1, FS2, FS3 in the diagram) need to supply function calls that can be called by the VFS.
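
Conceptually, the interface a concrete file system supplies to the VFS is a table of function pointers, one per operation. The following is a rough sketch of the idea (names are illustrative, not an actual kernel API):
/* Rough sketch of the VFS-to-file-system interface: each concrete file system
   registers a table of function pointers that the VFS dispatches through.
   Names are illustrative, not an actual kernel API. */
#include <stddef.h>
#include <sys/types.h>

struct vfs_operations {
  int     (*open) (const char *path, int flags);
  ssize_t (*read) (int fd, void *buf, size_t count);
  ssize_t (*write)(int fd, const void *buf, size_t count);
  off_t   (*lseek)(int fd, off_t offset, int whence);
  int     (*close)(int fd);
};

/* The VFS keeps one table per mounted file system and calls through it. */
ssize_t vfs_read(const struct vfs_operations *ops, int fd, void *buf, size_t count) {
  return ops->read(fd, buf, count);
}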

The file systems beneath a VFS can be mounted from local disks or over the network; the Network File System (NFS) was actually part of the original motivation for the VFS idea.

A further extension of the VFS idea, available in some UNIX systems, is Filesystem in Userspace (FUSE). FUSE is a software interface that allows ordinary users to create their own file systems without modifying kernel code. The filesystem code runs in userspace, while the FUSE module acts as a bridge to the VFS.

Summary