IC221: Systems Programming (SP18)


Lab 06: Makefiles and File Statistic System Calls

Preliminaries

In this lab you will complete C programs that use O.S. system calls, focusing on file system and device management aspects of the operating system. There are five tasks, and you will likely complete the first 3 tasks in lab, finishing the remaining tasks outside of lab.

Lab Learning Goals

In this lab, you will learn the following topics and practice C programming skills.

  1. Writing simple Makefiles
  2. Learning about file statistics
  3. File statistics: fstat() system call
  4. Altering file modes: chmod() system call
  5. Printing file information: getpwuid(), getgrgid(), strmode() library functions
  6. Altering file modification times: utimes(), gettimeofday() system calls
  7. Printing System Call Errors: perror() and errno

Lab Setup

Run the following command

~aviv/bin/ic221-up

Change into the lab directory

cd ~/ic221/lab/06

All the material you need to complete the lab can be found in the lab directory. All material you will submit should be placed within the lab directory. Throughout this lab, we refer to the lab directory, which you should interpret as the above path.

Submission Folder

For this lab, all scripts for submission should be placed in the following folder:

~/ic221/lab/06

This directory contains 5 sub-directories: examples, makefile, mycp, myls, and mytouch. In the examples directory you will find the source code shown in this lab document. All lab work should be done in the remaining directories.

  • Only source files found in the folder will be graded.
  • Do not change the names of any source files.

Finally, in the top level of the lab directory, you will find a README file. You must complete the README file, and include any additional details that might be needed to complete this lab.

Compiling your programs with gcc and make

You are required to provide your own Makefiles for this lab. Each of the source folders, mycp, myls, and mytouch, must have a Makefile. We should be able to compile your programs by typing make in each source directory.

README

In the top level of the lab directory, you will find a README file. You must fill out the README file with your name and alpha. Please include a short summary of each of the tasks and any other information you want to provide to the instructor.

Testing

You are provided a test script to test your submission. It is found in the base of the lab directory: test.sh.


Part 1: Makefiles

In the last lab, you were provided with a Makefile, but for this lab you are required to submit your own Makefiles. All subfolders for this submission must have their own Makefile. The only thing required to compile your programs is for the user (and grader, i.e., your instructor) to simply type make in that directory.

A Makefile is a small program that describes a compilation process. There are three main elements of a Makefile:

  • Targets: This is the goal of a compilation process, such as an executable or object file
  • Dependencies: Files which the target depends on, such as the source files
  • Commands: What should be run to actually compile a file to produce a target.

Once the Makefile is in place, in that directory, you run the make command which looks for a Makefile and attempts to make a specific target.

Let's look at a very simple example: Here's how we would use a makefile to compile a helloworld.c program.

#helloworld/Makefile
  all: helloworld

  helloworld: helloworld.c
          gcc helloworld.c -o helloworld

  clean:
          rm -f helloworld

All targets are set to the left and designated with the ":", so the targets in this Makefile are all, helloworld, and clean. Dependencies are found to the right of the targets. For example, the all target depends on generating the helloworld target, which, in turn, depends on the helloworld.c source file. Finally, commands are on the lines below a target and must be tabbed in using the Tab key (very important).

Reading the Makefile, the key thing is to follow the targets through their dependencies to the commands needed to do the execution. For example, when we type make, the first target in the file, here all, is executed by default. The all target depends on producing the helloworld target, which depends on helloworld.c. Now, the file helloworld.c is not a target, it's a file, and by listing it as a dependency, we are saying "this target is not met whenever the file changes," like when we edit the source code. Assuming the helloworld.c source has changed, the helloworld target is not met, so the command is executed, which (re)compiles helloworld.c to produce the helloworld executable.
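
To make the "target is not met" idea concrete, here is a rough sketch of the check in C. It assumes the decision comes down to comparing modification times, using the stat() call covered later in this lab. The function name needs_rebuild is made up for this illustration, and real make applies more rules than this.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if `target` should be rebuilt from `dep`,
 * 0 if it is up to date, and -1 if `dep` itself cannot be examined. */
int needs_rebuild(const char *target, const char *dep){
  struct stat t, d;

  if( stat(dep, &d) < 0 ){
    return -1;                     //dependency missing: nothing to build from
  }
  if( stat(target, &t) < 0 ){
    return 1;                      //target does not exist yet
  }
  return d.st_mtime > t.st_mtime;  //dependency edited after the last build
}
```

Calling needs_rebuild("helloworld", "helloworld.c") right after a successful build should give 0. After you edit and save helloworld.c, it should give 1.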

The last target, clean, does not have any dependencies. Instead, it just has the shell command to remove the executable. It is good practice to have a clean target in your Makefiles. You will often need to clean up the source by removing extraneous files, and the Makefile is a fast and convenient way to do this.

When we use the Makefile, we can just type make, which will compile all the targets associated with the all target. Or, we can type make target, which will just execute the commands to reach the given target. For example, to execute the clean target, we type make clean.

Task 1

Change into the makefile/simple directory. In there you will find a program called compileme.c.

  1. Write a Makefile that will compile compileme.c by typing make and also will clean up any stray executables by typing make clean.
  2. Test your makefile by typing make, and then executing the program. What is the output? Type make again after executing the program. What happened?
  3. To test your makefile dependencies, add an additional format print to the compileme.c source, and type make again. If compileme.c recompiles, you've done this right.
  4. Finally, add some options to the compilation so that you compile compileme.c with the debug flag (-g) and the warning all flag (-Wall), which is always a good thing to do.

You will submit your Makefile for grading.

Multipart Compilation

One of the advantages of C is that you can stage your compilation process. You did this already in the last lab when we had to compile simplefs into an object file filesystem.o and then compile that object file with other source, like the shell or testfile.

Let's review the compilation process. When we have source broken across multiple files, we first have to compile those files to object code, an intermediate compilation stage.

gcc -c source.c -o source.o

Next we can compile multiple object files to assemble an executable.

gcc source.o main.o -o executable

If you look at the above compilation command, you can see the target and dependencies. The target is the executable, executable, and the two dependencies are the object files, source.o and main.o, which each have their own dependencies, the associated source (and header) files. Let's translate that into a Makefile.

all: executable

executable: source.o main.o
        gcc source.o main.o -o executable

source.o: source.c source.h
        gcc -c source.c -o source.o

main.o: main.c
        gcc -c main.c -o main.o

Tracing the dependencies and the commands starting with all, we can see that to reach the compilation command for executable, first source.c and main.c must be compiled to object files, source.o and main.o. You will also notice the header file, source.h, is listed as a dependency for source.o, which is common so that recompilation will occur whenever the header file changes.

Task 2

Change into the makefile/multi directory. In there you will find four source files and two header files. Two of the source files have a main() function.

  1. Write a Makefile to compile the binary executable called runme. You will also need to compile the dependencies; inspect the source files and the associated headers to determine what those might be.
  2. Add another target to the all target so that it now compiles two executables, runme and runme_too.
  3. Include a clean target to remove all object files, those that end in .o, and executables, e.g., runme and runme_too.

You will submit this Makefile for grading.

Part 2: Retrieving and Altering File Statistics

In this part of the lab, we will use system calls to read and write files, retrieve file stats, and modify file properties. This will be done in three tasks. First you will reimplement your copy command line tool, mycp, but this time you will use system calls to copy the file with buffered I/O and preserve the permission mode. Next, you will implement an ls-like command line tool, myls, which will list the contents of the current directory. And finally, you will implement a touch-like tool, mytouch, that will update the modification time of a file.

None of these tasks requires a lot of code individually, but together they will expose you to the variety of system calls that support both device I/O (read(), write(), open(), close()) and file management (fstat(), chmod(), utimes(), gettimeofday()). You will also learn about some library tools that allow you to interpret file properties in a human readable way (strmode(), getpwuid(), getgrgid()). Finally, you will learn how to easily check and report errors for system calls via the error number reporting interface (errno, perror()).

stat()

The operating system maintains file information for each file on the system. You can retrieve this information with the stat() and fstat() system calls, as follows:

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int stat(const char *path, struct stat *buf);
int fstat(int fd, struct stat *buf);

The stat() system call takes a path to a file and a pointer to a struct stat. It will then set the value of the struct stat pointed to by buf with the file statistics. The fstat() system call does the same, but takes an open file descriptor rather than a path.
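
As a small illustration of the fstat() form, here is a sketch that opens a file and reads its size from the st_size field listed in the struct below. The helper name file_size is invented for this example and is not part of the lab skeleton code.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns the size in bytes of the file at `path`,
 * or -1 if the file cannot be opened or examined. */
off_t file_size(const char *path){
  struct stat st;
  off_t size = -1;

  int fd = open(path, O_RDONLY);
  if( fd < 0 ){
    return -1;                 //cannot open: no descriptor to pass to fstat()
  }
  if( fstat(fd, &st) == 0 ){   //fstat() takes the descriptor, not the path
    size = st.st_size;
  }
  close(fd);
  return size;
}
```

If you already have the file open, as you will in mycp, fstat() saves you from looking the path up a second time.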

The struct stat, which is defined in the man pages, has the following fields, and a longer description of each is provided in the man pages.

struct stat {
  dev_t     st_dev;     /* ID of device containing file */
  ino_t     st_ino;     /* inode number */
  mode_t    st_mode;    /* protection */
  nlink_t   st_nlink;   /* number of hard links */
  uid_t     st_uid;     /* user ID of owner */
  gid_t     st_gid;     /* group ID of owner */
  dev_t     st_rdev;    /* device ID (if special file) */
  off_t     st_size;    /* total size, in bytes */
  blksize_t st_blksize; /* blocksize for file system I/O */
  blkcnt_t  st_blocks;  /* number of 512B blocks allocated */
  time_t    st_atime;   /* time of last access */
  time_t    st_mtime;   /* time of last modification */
  time_t    st_ctime;   /* time of last status change */
};

Of particular relevance to this lab are the st_mode and st_mtime fields. The former, st_mode, is the mode of the file, which defines the permissions of the file (who can read/write/exec it) as well as the disposition of the file, such as whether it is a directory or a regular file. There are a number of macros defined in the sys/stat.h header file for interpreting the mode of the file. For example, here is a small program to test whether the provided path is a directory.

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char * argv[]){

  struct stat st;

  if( argc < 2){
    fprintf(stderr, "ERROR: Require a path\n");
    return 2;   //return status error
  }


  if( stat(argv[1], &st) < 0){     //error, cannot stat file
    perror(argv[0]);               //report error with perror
    return 2;                      //return status error
  }

  if ( S_ISDIR(st.st_mode) ){
    printf("It's a directory!\n");
    return 0;  //return status true
  }

  printf("Not a directory :(\n");
  return 1; //return status false

}

chmod()

The mode of a file, since it is maintained by the operating system, can only be changed via a system call. The system call to do that is chmod(), which should look familiar from the command line tool chmod you've already used. To view the man page, be sure to look in section 2 of the manual:

#> man 2 chmod

Here are the two forms, one taking a file path and the other taking a file descriptor, just like stat() and fstat():

#include <sys/stat.h>

int chmod(const char *path, mode_t mode);
int fchmod(int fd, mode_t mode);

The mode_t mode argument is the same type as st_mode from the stat() output, and is built by ORing constants together, as with file creation. Here are the relevant constants:

S_IRUSR  (00400)  read by owner

S_IWUSR  (00200)  write by owner

S_IXUSR  (00100)  execute/search by owner ("search" applies for directories, and means that entries within the directory can be accessed)

S_IRGRP  (00040)  read by group

S_IWGRP  (00020)  write by group

S_IXGRP  (00010)  execute/search by group

S_IROTH  (00004)  read by others

S_IWOTH  (00002)  write by others

S_IXOTH  (00001)  execute/search by others

So to set the mode of a file to read/write for the owner and read for the group:

chmod( "path/to/file", S_IRUSR | S_IWUSR | S_IRGRP);

But, as you can see from the constants, they are also defined as octal values, like the ones we use with chmod on the command line, so the following is equivalent to the above:

chmod( "path/to/file", 0640);

It's an octal value, so the leading 0 is important: it tells C that this number should be interpreted in octal.

Error checking system calls

Most system calls follow the same convention for their return value. They return an integer: on success, 0 (or another non-negative value, such as a file descriptor or a byte count) is returned, and on failure, a negative value, -1, is returned. This means we can check for system call errors using the same pattern:

if( stat(argv[1], &st) < 0){     //error, cannot stat file
  perror(argv[0]);               //report error with perror
  return 2;                      //return status error
}

Simply place the system call in an if statement and check whether the return value is less than 0. If so, we want to report that error. We can then exit the program, if that is the appropriate action to take; it isn't always, depending on the task.

Because the return value is so simple, it does not report the actual cause of the error. Instead, there exists a global variable errno which is set to the value of the error. For example, here are the possible values of errno for a failure of stat() (note that these are #define'd constants):

EACCES Search permission is denied for one of the directories in the path prefix of path.  (See also path_resolution(7).)

EBADF  fd is bad.

EFAULT Bad address.

ELOOP  Too many symbolic links encountered while traversing the path.

ENAMETOOLONG
       path is too long.

ENOENT A component of path does not exist, or path is an empty string.

ENOMEM Out of memory (i.e., kernel memory).

ENOTDIR
       A component of the path prefix of path is not a directory.

EOVERFLOW
       (stat()) path refers to a file whose size cannot be represented in the type off_t. This can occur when an application compiled on a 32-bit platform without -D_FILE_OFFSET_BITS=64
       calls stat() on a file whose size exceeds (1<<31)-1 bytes.

The most likely error to occur is EACCES, cannot access the file at the path, or ENOENT, the file or directory doesn't exist. It's good practice to report the precise error to the user so that the error can be corrected, but it's a huge pain to have to type these error messages yourself into every program. Instead, the C standard library has a built in error reporting tool: perror() or print error. It will automatically check the value of the errno and print an appropriate message. Here is a sample output of isdir with a bad path.

#> ./isdir bad/path
./isdir: No such file or directory

Notice that it prints useful information. I passed to perror() the name of the program, argv[0], so that perror() will output the name of the program in addition to printing the error. This way it looks like a real command line tool.
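
If you want more control over the message than perror() gives you, the standard strerror() function turns an errno value into the same text. Here is a sketch that reports both the program name and the offending path. The helper name stat_or_report is made up for this example.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: calls stat() on `path`. On failure, prints
 * "prog: path: reason" to stderr and returns the errno value;
 * on success, fills in *st and returns 0. */
int stat_or_report(const char *prog, const char *path, struct stat *st){
  if( stat(path, st) < 0 ){
    int err = errno;           //save errno before any other library call
    fprintf(stderr, "%s: %s: %s\n", prog, path, strerror(err));
    return err;
  }
  return 0;
}
```

Saving errno into a local variable first is a defensive habit, since later library calls are allowed to overwrite it.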

Task 3: mycp

Change into the mycp directory. In there you will find skeleton code for the start of the mycp command. The usage of the mycp command is as follows:

#> mycp source dest

Your mycp must be able to complete the following tasks.

  1. You must use buffered I/O to complete the copy. That means using the read() and write() system calls, and opening and closing the source and destination files with open() and close(). The buffer size should be 4096, which is the optimal buffer size for fast writes. Check out APUE for some sample code.
  2. It should be able to copy a source file to a destination file, preserving the mode of the file. You can directly use the st_mode from the stat() of the source file and use that as the argument to chmod() of the destination file:

    fchmod(dest_fd, src_stat.st_mode);
    

    Here is some sample output:

    #> ls -l
    total 12
    drwx--x--x 2 aviv scs 4096 Feb  4 11:17 sub
    -rw-r--r-- 1 aviv scs 9022 Feb  1 17:07 test.txt
    #> ../mycp test.txt test_cp.txt 
    #> ls -l
    total 28
    drwx--x--x 2 aviv scs 4096 Feb  4 11:17 sub
    -rw-r--r-- 1 aviv scs 9022 Feb  4 11:15 test_cp.txt
    -rw-r--r-- 1 aviv scs 9022 Feb  1 17:07 test.txt
    #> diff test_cp.txt test.txt
    #> ../mycp sub/ sub_cp
    ../mycp: sub/: Is a directory
    
  3. If the destination file already exists, mycp should truncate the file and overwrite it with the source file, like cp does.
  4. If the source file is a directory, you should exit with an error message built from the executable name and the source path. Use the macro S_ISDIR() on the st_mode of the source file for that check. For example:

    if( S_ISDIR(fs.st_mode) ){
      fprintf(stderr, "%s: %s: Is a directory\n", argv[0], argv[1]);
      return 1;
    }
    
  5. The error conditions of all system calls should be checked and will greatly help your debugging. Any errors should be reported. Use perror() liberally.
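
One detail worth knowing when you check write(): it is allowed to write fewer bytes than you asked for. The sketch below shows one way to handle that, assuming you want to keep calling write() until the whole buffer is out. The helper name write_all is made up for this example and is not part of the skeleton code.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: writes all n bytes of buf to fd, retrying after
 * short writes. Returns 0 on success, -1 on error (errno set by write()). */
int write_all(int fd, const char *buf, size_t n){
  size_t done = 0;

  while( done < n ){
    ssize_t w = write(fd, buf + done, n - done);
    if( w < 0 ){
      if( errno == EINTR ){ continue; }   //interrupted by a signal: retry
      return -1;                          //real error: let the caller report it
    }
    done += (size_t) w;                   //advance past the bytes written
  }
  return 0;
}
```

In a copy loop you would pass the number of bytes that read() actually returned, not the full buffer size.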

Part 3: Human readable forms of the stat fields

We want to continue exploring the output of the stat() system call, investigating the other fields of the struct stat data type. Unfortunately, we aren't computers, and we'd like to view these fields in a human readable way. Of particular relevance are the following fields:

mode_t    st_mode;    /* protection */
uid_t     st_uid;     /* user ID of owner */
gid_t     st_gid;     /* group ID of owner */

strmode()

We've already discussed the st_mode field, which stores the various dispositions of the file. It's just a number really, but we'd like to view that information in a human readable form. This can be done with the library function strmode(), which will convert a mode_t into a -rwxrwxrwx string, just like in ls. Here is an example program:

/*examples/printmode.c*/
#include <stdio.h>
#include <stdlib.h>
#include <bsd/string.h>

int main(int argc, char * argv[]){

  char smode[12]; //mode strings are always 11 chars long
                  // +1 for the NUL terminator, makes 12!

  strmode(0644, smode);
  printf("0644 : %s\n", smode);

  strmode(0742, smode);
  printf("0742 : %s\n", smode);

}

To compile a program that calls strmode() on a Linux system, which is what you will be using, you need the bsd library. Add the -lbsd option to gcc to link the library to your executable. Here is the sample compilation from the Makefile for printmode:

printmode: printmode.c
        gcc printmode.c -o printmode -lbsd

getpwuid()

Each file has an owner, a user, and a group. As humans, we like to refer to these values as strings and not numbers. The owner is aviv and the group is scs, for example. The operating system doesn't think like a human, and instead stores these values as numbers. Each user has a uid and can be a member of any number of groups, each identified by a number, the gid. Similarly, files also have an associated user and group for ownership purposes, st_uid and st_gid in the struct stat.

To convert these numbers to human readable formats, we could look in the /etc/passwd and /etc/group files like we did when programming bash, but that's way, way too much work. Fortunately, C provides two library functions to do that conversion for us.

Let's start with retrieving the username. We first need to look up the password file entry for that user. We use getpwuid() for that.

#include <sys/types.h>
#include <pwd.h>

struct passwd *getpwuid(uid_t uid);

Provided the uid, which we have from the stat() output, getpwuid() returns a pointer to a struct passwd data type. It has the following fields:

struct passwd {
  char   *pw_name;       /* username */
  char   *pw_passwd;     /* user password */
  uid_t   pw_uid;        /* user ID */
  gid_t   pw_gid;        /* group ID */
  char   *pw_gecos;      /* user information */
  char   *pw_dir;        /* home directory */
  char   *pw_shell;      /* shell program */
};

The relevant field for this lab is pw_name, which is a string referencing the user name. Here is a sample program to print a username. My uid on the Linux lab machines is 35001. (BTW, I know what you're thinking, "oohh, password!" But the pw_passwd field is the encrypted password, not the plain text password. Nice try, though, and even then, the actual encrypted password is stored in a different file called /etc/shadow.)

/*examples/printusername.c*/
#include <sys/types.h>
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>


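int main(int argc, char * argv[]){

  uid_t my_uid = 35001;
  struct passwd * pwd;

  pwd = getpwuid(my_uid);

  if( pwd == NULL ){   //no password entry for this uid
    fprintf(stderr, "%s: no user with uid %lu\n", argv[0], (unsigned long) my_uid);
    return 1;
  }

  printf("My username is: %s\n", pwd->pw_name);

  return 0;
}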

getgrgid()

Retrieving the string version of the gid uses a similar process to that of retrieving the username. We use the getgrgid() library function:

#include <sys/types.h>
#include <grp.h>

struct group *getgrgid(gid_t gid);

Given a gid, the getgrgid() function returns a pointer to a struct group, which has the following fields:

struct group {
    char   *gr_name;       /* group name */
    char   *gr_passwd;     /* group password */
    gid_t   gr_gid;        /* group ID */
    char  **gr_mem;        /* group members */
};

The relevant field is gr_name, which is the human readable name of the group. Here is a sample program to print the group name for the SCS group, which is group id 10120:

#include <sys/types.h>
#include <grp.h>
#include <stdio.h>
#include <stdlib.h>


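int main(int argc, char * argv[]){

  gid_t my_gid = 10120;
  struct group * grp;

  grp = getgrgid(my_gid);

  if( grp == NULL ){   //no group entry for this gid
    fprintf(stderr, "%s: no group with gid %lu\n", argv[0], (unsigned long) my_gid);
    return 1;
  }

  printf("My groupname is: %s\n", grp->gr_name);

  return 0;
}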
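
Both getpwuid() and getgrgid() return NULL when there is no matching entry, so it is worth deciding what to print in that case. Here is a sketch that falls back to the raw number, which is also what ls -l does. The helper names owner_name and group_name are made up for this example.

```c
#include <sys/types.h>
#include <pwd.h>
#include <grp.h>
#include <stdio.h>

/* Hypothetical helper: writes the username for `uid` into `out`, or the
 * number itself if there is no such user. Returns 1 if a name was found. */
int owner_name(uid_t uid, char *out, size_t n){
  struct passwd *pwd = getpwuid(uid);
  if( pwd == NULL ){
    snprintf(out, n, "%lu", (unsigned long) uid);   //numeric fallback
    return 0;
  }
  snprintf(out, n, "%s", pwd->pw_name);
  return 1;
}

/* Same idea for groups, using getgrgid() and gr_name. */
int group_name(gid_t gid, char *out, size_t n){
  struct group *grp = getgrgid(gid);
  if( grp == NULL ){
    snprintf(out, n, "%lu", (unsigned long) gid);   //numeric fallback
    return 0;
  }
  snprintf(out, n, "%s", grp->gr_name);
  return 1;
}
```

You could call these with st.st_uid and st.st_gid from a stat() result and a small char buffer for each.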

Task 4: myls

Change into the myls directory. In there you will find skeleton code for the start of the myls command. The usage of the myls command is as follows:

#> myls

It takes no arguments, and will only list the contents of the current directory. Actually iterating through the contents of a directory is beyond the scope of this lab, and code for doing that is provided for you. Upon each iteration of the while loop, the entry structure will reference a different file/dir. You can retrieve the name of that file/dir with entry->d_name. Read the comments for more details.

Your myls must be able to complete the following tasks.

  1. It should list all the contents of the current working directory, from which myls is run. The code for iterating through the current directory is provided for you, so the task is parsing the stat() structures when called on each of the files/directories therein.
  2. The myls program should do a long list, like ls -l, which outputs the permission modes, name of file, username of the owner, groupname of the file, the size of the file, and the last modification time (st_mtime). You can use ctime() to print the time. Each item must be separated with tabs, i.e., "\t". Sample output below, run from the test_dir in the myls directory.

    #> ../myls 
    -rw-------  rand	aviv	scs	7331	Tue Feb  4 09:32:34 2015
    -rw-r--r--  a.txt	aviv	scs	38	Tue Feb  4 09:34:26 2015
    drwx--x--x  .	aviv	scs	4096	Tue Feb  4 09:34:45 2015
    drwx--x--x  ..	aviv	scs	4096	Tue Feb  4 09:33:21 2015
    drwx--x--x  subdir	aviv	scs	4096	Tue Feb  4 09:32:59 2015
    -rw-rw----  empty.txt	aviv	scs	0	Tue Feb  4 09:32:07 2015
    

    Your output may look different, e.g., different user names, group names, and time values, but that is fine and to be expected. Also, don't worry about misalignment. As long as the output is tab separated, you're good. Here's how to use ctime() again:

    ctime(&(st->st_mtime)); //returns a reference to a string like "Tue Feb  4 09:34:45 2015\n"
                            //Note it has a newline for free.
    
  3. You should check all error conditions from system calls and the like, and exit on error, reporting useful information. Use perror() liberally.

Part 4: Modifying File Access Times

The last part of the stat() output that we want to interpret and manipulate is the status change, access, and modification times. The relevant fields of the struct stat are below:

time_t    st_atime;   /* time of last access */
time_t    st_mtime;   /* time of last modification */
time_t    st_ctime;   /* time of last status change */

As usual, these time values are just large numbers, longs, which count the number of seconds since the epoch, Jan. 1st 1970. We are not allowed to set the status change time ourselves; it is managed automatically by the operating system. But we know that we can alter the modification and access times. We've done this already with the touch command, and there is an associated system call that can alter the modification time, like touch.
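
To convince yourself that these values really count seconds from 1970, here is a tiny sketch using the standard gmtime() function, which splits a time_t into calendar fields in UTC. The helper name epoch_year is invented for this example.

```c
#include <time.h>

/* Hypothetical helper: returns the calendar year (in UTC) that a
 * seconds-since-epoch value falls in, or -1 if it cannot be converted. */
int epoch_year(time_t t){
  struct tm *utc = gmtime(&t);       //break the count into calendar fields
  if( utc == NULL ){
    return -1;
  }
  return utc->tm_year + 1900;        //tm_year counts years since 1900
}
```

A value of 0 lands in 1970, and adding 365 days' worth of seconds (1970 was not a leap year) lands in 1971.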

utimes()

The utimes() system call changes a file's last access and modification times. Here is the prototype from the man page:

#include <sys/types.h>
#include <sys/time.h>

int utimes(const char *filename, const struct timeval times[2]);

It takes an argument filename, which is the path to the file, and an array of struct timeval. The size of the array is 2, where times[0] is the new access time and times[1] is the new modification time. Note that struct timeval is a different type than the time_t data type we've been using for managing time stamps.
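
Here is a sketch of filling in that two-element array by hand, assuming you already have the two times you want as plain seconds-since-epoch values. The tv_sec and tv_usec fields are described in the next section. The helper name set_file_times is made up for this example.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <stdio.h>

/* Hypothetical helper: sets the access and modification times of `path`
 * to the given seconds-since-epoch values. Returns 0 on success, -1 on error. */
int set_file_times(const char *path, time_t atime, time_t mtime){
  struct timeval times[2];

  times[0].tv_sec  = atime;    //times[0]: new access time
  times[0].tv_usec = 0;
  times[1].tv_sec  = mtime;    //times[1]: new modification time
  times[1].tv_usec = 0;

  if( utimes(path, times) < 0 ){
    perror("utimes");          //report why the update failed
    return -1;
  }
  return 0;
}
```

To leave the access time alone, one option is to stat() the file first and pass its current st_atime back in.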

gettimeofday()

A struct timeval has the following fields:

struct timeval {
    long tv_sec;        /* seconds */
    long tv_usec;       /* microseconds */
};

The tv_sec field is like a time_t, seconds since the epoch, and the timeval offers even finer precision: the tv_usec field holds the additional microseconds past tv_sec. To get the current timeval from the system clock, you use the gettimeofday() system call.

#include <sys/time.h>

int gettimeofday(struct timeval *tv, struct timezone *tz);

The gettimeofday() system call takes a pointer to a timeval and a timezone. It will set the current time at the memory referenced by those pointers. We don't really care about the timezone, so we'll call gettimeofday() like this, generally:

struct timeval tv;
gettimeofday(&tv, NULL);

Putting it all together, here is a sample program to print the current time using ctime() and gettimeofday():

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>

int main(int argc, char * argv[]){

  struct timeval tv;

  gettimeofday(&tv,NULL);

  //ctime takes a pointer to the seconds since epoch
  printf("%s", ctime(&(tv.tv_sec)));

}
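
The microsecond field becomes useful when you compare two timevals. As a sketch, the helper below (the name elapsed_usec is invented for this example) combines both fields to compute the difference between two readings in microseconds.

```c
#include <sys/time.h>

/* Hypothetical helper: returns end - start in microseconds.
 * The result is negative if `end` is earlier than `start`. */
long long elapsed_usec(struct timeval start, struct timeval end){
  long long sec  = (long long) end.tv_sec  - (long long) start.tv_sec;
  long long usec = (long long) end.tv_usec - (long long) start.tv_usec;

  return sec * 1000000LL + usec;     //1 second = 1,000,000 microseconds
}
```

You could call gettimeofday() before and after a copy loop and pass both readings in to get a rough timing.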

Task 5: mytouch

Change into the mytouch directory. In there you will find skeleton code in mytouch.c. The usage of mytouch is as follows:

#> mytouch path

Where path is the file path to the file to be touched. Your mytouch must be able to complete the following tasks:

  1. If a file exists, it should update the modification time of the file using utimes() and gettimeofday().
  2. It should output the modification time both before and after the call to utimes(), using ctime(). Sample output is below:

    #> ./mytouch mytouch.c 
    Last Modified: Tue Feb  4 08:59:44 2014
    New Modified: Tue Feb  4 12:41:11 2014
    
  3. If the file does not exist, an error should be reported rather than creating a new file.
  4. All errors for system calls should be checked.
  5. (5 pts EXTRA CREDIT): If the file does not exist, create it if possible, and report an error if not possible.