more, which prints the contents of the
files you list on the command-line when calling more, one
after the other, a screenful at a time, so you can page
through everything. In fact, more is a program
like any other, and if you enter "which more"
in the shell, you'll find the exact location of the executable
file that is the program more.
A program is simply an executable file.
Now, if we open three terminal windows on our Unix machine, and enter "
more /etc/passwd" into the first, and
"more /usr/include/math.h" into the second,
you'll see that you have two more's executing
simultaneously on the machine. Since there's only one
program more, we need a name for these
executing instances of programs, of which there may be several
of the same program at the same time.
An executing instance of a program is called a process.
There is another standard Unix utility
ps that
lists the processes currently executing on a machine. If in the
third terminal window you type
ps -u yourusername
you'll see all the processes belonging to you, and included in that list ought to be the two
more processes we still have executing in the first and second windows.
We'll talk a lot about processes in this course, but for now
it suffices to know that a process is an executing instance of
a program, and that each process has its own
argv.
These four things that every process has -- argv, stdin, stdout, and stderr -- are really important. You'll start to see why today, but as the semester rolls along you'll see it more and more.
argv and standard in/out/error are all set up
correctly. When you give the command
$ rm tmp.txt junk1 junk2
the shell starts a new process (usually we say it spawns or forks a process) that is an executing instance of
/bin/rm, setting argv like this:
argv[0] = rm
argv[1] = tmp.txt
argv[2] = junk1
argv[3] = junk2
... tying standard in to the keyboard (which is irrelevant here), standard out to the screen (which is irrelevant here), and standard error to the screen -- which means that if there are errors, like file
tmp.txt not existing,
the error messages get printed on the screen.
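Inside a C++ program, that argv array is exactly what main receives. Here is a minimal sketch of how a program could display it (the helper name show_argv is ours, purely for illustration):

```cpp
#include <string>

// Render the argv array a process receives; the shell filled it in from
// the command line.  argv[0] is the program's name, argv[1..] the arguments.
std::string show_argv(int argc, char* argv[]) {
    std::string out;
    for (int i = 0; i < argc; ++i)
        out += "argv[" + std::to_string(i) + "] = " + argv[i] + "\n";
    return out;
}
```

A real program would simply call show_argv(argc, argv) from main and write the result to stdout.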
The shell waits for that process to finish executing
before prompting the user for another command. When you give
the command
$ emacs &
The shell spawns a new process that is an executing instance of
emacs and, because we followed the command
with an "&", immediately prompts us for another command,
without waiting for the emacs process to
terminate. So, you see, the shell is really just a program
that makes it easy to create new processes.
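In fact, the core of that job fits in a few lines. Here is a sketch of what a shell does for each command line, using the POSIX fork/exec/wait calls we'll meet properly later in the course (the function name spawn_and_wait is ours):

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Roughly what the shell does for one command line: fork a child process,
// turn the child into the requested program with execvp (which searches
// PATH), and wait for it to terminate.  Returns the child's exit status.
int spawn_and_wait(char* const argv[]) {
    pid_t pid = fork();
    if (pid == 0) {               // child: become the program
        execvp(argv[0], argv);
        _exit(127);               // reached only if exec failed
    }
    int status = 0;
    waitpid(pid, &status, 0);     // parent: wait before prompting again
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For "emacs &" the shell would simply skip the waitpid and prompt immediately.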
The shell is just another program and, in fact, though we speak of "the shell", there are many different shells from which to choose:
$ cat /etc/shells ← This file may not exist on your machine
/bin/sh Bourne shell
/bin/csh C shell
/bin/ksh Korn shell
/bin/jsh job control shell
/bin/rsh remote shell
/bin/tcsh TENEX C shell
/bin/bash Bourne Again shell
Each one of these is a different shell (command interpreter)
offering different twists on the basic shell functionality.
We're using
bash. Others may use a different shell. Many of the
faculty use tcsh.
Input comes in via stdin, output goes to stdout, and error messages go to stderr. We can see this quite clearly by looking at a simple C++ program -- prog1.cpp -- where we can see which parts of the program correspond to standard in, standard out, standard error, return codes, etc.
Assuming this program is compiled into something called prog1:
bear[~]> prog1
3 2 7 9
ctrl-d ← this simulates "end of file" from the keyboard
5.25
Hopefully you see that this program simply returns the average of the numbers it reads in. Notice that it follows the "Unix Philosophy" of a small program that does one thing: no prompting for input, no formatting of output, no cute messages.
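The actual prog1.cpp source isn't reproduced in these notes, but its heart is plausibly just this (a sketch, assuming it averages numbers read from a stream):

```cpp
#include <istream>
#include <sstream>

// The essence of prog1: read numbers from an input stream until end of
// file, return their average.  No prompts, no formatting -- Unix style.
double average(std::istream& in) {
    double x, sum = 0.0;
    int n = 0;
    while (in >> x) { sum += x; ++n; }
    return n > 0 ? sum / n : 0.0;   // a real prog1 would treat n == 0 as an error
}
```

A full prog1.cpp would call average(std::cin) from main, print the result to stdout, and return a non-zero code when no data was entered (matching the "Error! No data entered!" message seen later).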
The Unix Philosophy then continues like this: small tools
are tied together by the user to do whatever job needs
doing. Let's see how this can work. Suppose I want to know
the average size (in bytes) of all the files in the
current directory. A simple ls -l writes (to stdout) all the file names and their sizes
... and a whole bunch of extra information! We only want
the file sizes. The file sizes all appear within the 30th
through 40th characters on each line of the ls
-l output. So we want to cut out characters 30-40 of
each line. There's an app for that, or rather there's a
Unix utility for that, it's called cut. In
particular, cut -c30-40 reads from stdin and
for each line writes (to stdout) only the characters in
positions 30-40. So, to get just the file sizes, we'd like
to run ls -l and have its stdout go
into cut's stdin rather than to the screen.
Unix provides a nice mechanism for doing exactly that:
piping the stdout from one program to the stdin for
another. It's called a "pipe", and the "|" symbol on the
command line denotes a pipe. So
ls -l | cut -c30-40
is like issuing two separate command-lines, but they are tied together by piping the stdout of the first to the stdin of the second. This one idea lets users easily combine utilities in an infinite number of ways to solve new problems. Even better, our super-simple program plays nice in this world!
bear[~]> ls -l
total 606
-rw------- 1 wcbrown scs 21233 Jan 13 2010 Class.html
-rwx------ 1 wcbrown scs 5820 Jan 8 11:43 ev*
-rw------- 1 wcbrown scs 281 Jan 8 11:43 ev.c
-rw------- 1 wcbrown scs 1830 Jan 8 11:43 ev.c.html
-rw------- 1 wcbrown scs 3144 Jan 8 11:43 nflqb2008.txt
-rwx------ 1 wcbrown scs 11052 Jan 12 09:14 prog1*
-rwx------ 1 wcbrown scs 249168 Jan 13 09:07 prog1annotated.png*
-rw------- 1 wcbrown scs 400 Jan 12 09:13 prog1.cpp
-rw------- 1 wcbrown scs 30 Jan 8 11:43 wwww
bear[~]> ls -l | cut -c30-40
21233
5820
281
1830
3144
11052
249168
400
30
bear[~]> ls -l | cut -c30-40 | prog1
32550.9
So, now we know what the average file size is in my directory! It's
important to understand this command line. Normally, what you enter
in to the command line is a request to execute a program
(i.e. create a process). This is actually three such commands all
issued on the same line, with the "|" separating them. Since each
process (as we know) has argv, stdin, stdout, stderr, we can really
describe what the command line "ls -l | cut -c30-40 | prog1"
means like this: ls's stdout is tied to cut's stdin, cut's stdout is tied to prog1's stdin, and each process's stderr still goes to the screen.
Notice how the regular output of ls is never seen (why?) but error messages are (why?).
bear[~]> ls -z | cut -c30-40 | ./prog1
ls: illegal option -- z
usage: ls -aAbcCdeEfFghHilLmnopqrRsStuxvV1@/[c | v]%[atime | crtime | ctime | mtime | all] [files]
Error! No data entered!
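Under the hood, "|" really is just file-descriptor plumbing. Here is a sketch of how a shell might wire "prog1 | prog2" together using the POSIX pipe, dup2, fork, and exec calls (the function name pipeline is ours). Note that only stdout (descriptor 1) is rerouted, which is exactly why stderr still reaches the screen in the transcript above:

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Wire the stdout of one program into the stdin of another, as the shell
// does for "prog1 | prog2".  Returns the second program's exit status.
int pipeline(char* const argv1[], char* const argv2[]) {
    int fd[2];
    if (pipe(fd) == -1) return -1;      // fd[0]: read end, fd[1]: write end
    pid_t p1 = fork();
    if (p1 == 0) {                      // first child: stdout -> pipe
        dup2(fd[1], 1);                 // descriptor 1 is stdout
        close(fd[0]); close(fd[1]);     // stderr (2) is left untouched!
        execvp(argv1[0], argv1);
        _exit(127);
    }
    pid_t p2 = fork();
    if (p2 == 0) {                      // second child: stdin <- pipe
        dup2(fd[0], 0);                 // descriptor 0 is stdin
        close(fd[0]); close(fd[1]);
        execvp(argv2[0], argv2);
        _exit(127);
    }
    close(fd[0]); close(fd[1]);         // parent keeps neither end
    int status = 0;
    waitpid(p1, &status, 0);
    waitpid(p2, &status, 0);            // status of the last stage
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```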
Another crucial part of the Unix Philosophy is that modifying the
behavior of utilities is done with command-line options (the argv
elements) rather than
via stdin. Without this separation -- data comes in via stdin,
behavior modification is done via command-line options -- the whole
concept of combining tools via pipes breaks down. For instance, we
used the command-line option -c30-40
with cut.
If that modification had to be done via stdin we'd be stuck, because
stdin is coming from ls, not the keyboard, and ls isn't
going to write out -c30-40. Moreover, what if it did write out stuff like that, but we didn't want it interpreted as a request to change cut's behavior?
(In the examples below we'll use the wc utility which, with the -l option, returns the number of lines in a file):
; separates a sequence of commands
> cd ; ls
| "pipe" - connects stdout of one program to stdin of another
> cat /etc/passwd | wc -l
< > file redirection. > foo sends stdout to the file foo; if file foo already exists, > foo overwrites it. If file foo already exists, >> foo appends new text to the end of the existing file.
> cat /etc/passwd > foo.txt
> ls foo.txt
> wc -l < foo.txt
218
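The > and < notations are implemented with the same descriptor plumbing as pipes. Here is a sketch of what a shell might do for "prog > outfile", using open and dup2 (the function name run_redirected is ours):

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Roughly what the shell does for "prog > outfile": in the child, open the
// file and make it stdout before exec'ing.  Returns prog's exit status.
// For ">>" the shell would open with O_APPEND instead of O_TRUNC; for
// "< infile" it would dup2 an opened file onto descriptor 0 instead.
int run_redirected(char* const argv[], const char* outfile) {
    pid_t pid = fork();
    if (pid == 0) {
        int fd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1) _exit(126);
        dup2(fd, 1);              // stdout now goes to the file
        close(fd);
        execvp(argv[0], argv);
        _exit(127);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```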
$? gives the value "returned" by the last program run. Every program returns an integer value when it terminates (the value of the return statement in main for C/C++ programs), and $? lets you see what it was for the previously executed program. Typically, a non-zero return value is an error indicator.
& run program in the background
echo built in to the shell, copies its arguments to stdout
$ echo foo bar
foo bar
$ echo *
foo foo.c
$ echo "The rain in Spain stays maaaainly on the plain" > eliza
$ ls
foo foo.c eliza
cat concatenate and display
cat [options] [file1 ...]
$ cat -n foo.c
1 // foo.c
2 int main()
3 {
4 return 80;
5 }
$ cat foo.c eliza
// foo.c
int main()
{
return 80;
}
The rain in Spain stays maaaainly on the plain
$ cat eliza foo.c eliza foo.c eliza > junk
$ ls
foo foo.c eliza junk
head display the first few lines of a file
head [-number | -n number] [file1 ...]
$ head -3 foo.c junk
==> foo.c <==
// foo.c
int main()
{
==> junk <==
The rain in Spain stays maaaainly on the plain
// foo.c
int main()
tail display the last few lines of the file
This is like the reverse of head. So
tail -3 foo.txt ← writes the last 3 lines of foo.txt
tail +3 foo.txt ← writes every line from the 3rd line to the last
tail -r foo.txt ← writes the lines of foo.txt in reverse order
grep search a file for a pattern
('horizontal' cuts through a file)
(many options!)
$ grep -n Spain junk
1:The rain in Spain stays maaaainly on the plain
7:The rain in Spain stays maaaainly on the plain
13:The rain in Spain stays maaaainly on the plain
grep -v pattern file ← lines that DO NOT match pattern
tr translate characters
tr [options] str1 str2 (loosely: see the man page)
-d delete all occurrences of characters in str1
-s replace repeated occurrences of characters in str1 with
a single character
$ cat eliza | tr -d a
The rin in Spin stys minly on the plin
$ cat eliza | tr -s a
The rain in Spain stays mainly on the plain
$ cat eliza | tr -s a | tr [a-z] [A-Z] > caps
$ cat caps
THE RAIN IN SPAIN STAYS MAINLY ON THE PLAIN
$ cat caps | tr " " ":" > delim
$ cat delim
THE:RAIN:IN:SPAIN:STAYS:MAINLY:ON:THE:PLAIN
cut cut out selected fields of each line of a file
('vertical' cuts through a file)
cut [options] [file ...]
$ head -1 nflqb2008.txt
Name Team G QBRat ... etc
$ grep -v Name nflqb2008.txt | cut -f1 > names
$ grep -v Name nflqb2008.txt | cut -f2 > teams
$ grep -v Name nflqb2008.txt | cut -f4 > ratings
paste merge lines of files
paste [-s] [-d list] file ...
$ paste names teams ratings > stats
sort sort files
$ sort stats
$ sort -k2 stats
$ sort -r -k2 stats
wc counts lines, characters and words in a file or stdin.
$ wc -l nflqb2008.txt
34
$history from the previous section is simply a
shell variable. All variables are referenced this way: the $
prefixed to the variable name gets the variable's value:
e.g. $foo.
Optionally, you can wrap the variable's name in ${
}, e.g. ${foo}. Note: in
programming language circles, the "$" is referred to as a
sigil.
Variables are given values with =, with no spaces allowed on either side of the =.
bear[~]> foo=twain
bear[~]> echo foo ← Oops, forgot the $
foo
bear[~]> echo $foo
twain
bear[~]> echo $foobar ← Undefined variable

bear[~]>
Every process has a list of environment variables,
the values of which can be read and written by the process.
Normal shell variables are "local" by default, meaning
they're defined in
the current shell session, but will not exist as environment
variables in processes
that are spawned by the shell. However, if you export
a variable, it becomes global --- meaning that the variable
will be an environment variable in the shell and in all
processes spawned by the shell.
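From inside a C++ program, the environment is reachable with the standard getenv and the POSIX setenv calls. A small sketch (the wrapper name env_or and the variable names are ours, for illustration):

```cpp
#include <cstdlib>
#include <string>

// Read an environment variable inherited from the parent process;
// fall back to a default if it isn't set.
std::string env_or(const char* name, const std::string& fallback) {
    const char* v = std::getenv(name);
    return v ? std::string(v) : fallback;
}

// setenv("foo", "twain", 1) is the in-process analogue of "export foo=twain":
// the variable will be inherited by any process we spawn afterwards.
```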
bear[~]> foo=twain
bear[~]> export foo ← Can also combine as export foo=twain
There are several very
important environment variables that are defined when you
login, including HOSTNAME, the name of the computer you're on,
and USER, your user name and, most importantly, PATH.
bear[~]> echo $HOSTNAME
mich301csdbrownu
bear[~]> echo $USER
wcbrown
bear[~]> echo $PATH
/opt/csw/bin:/usr/local/bin:/usr/bin:/bin:.
The PATH variable is the key to how the shell determines
what program to execute when you enter a command line and
don't give a full or relative path to the program you want, but
rather just the program's name. For example, when you
enter emacs at the command prompt (assuming the
PATH above) the shell looks for a file emacs in
/opt/csw/bin, doesn't find one, then tries
/usr/local/bin, finds a file
called emacs, and executes it. If I have a
program called foo in the current working
directory (and it appears nowhere else) and I
enter foo at the command prompt, the shell will
execute that foo file, because "." is in
the PATH and it means "current directory". If "."
were not in the path, I'd have to give a path
to foo, so I'd write ./foo
instead. If you want to know where in PATH the shell finds
a certain command, use the which command:
bear[1] [~/]> which emacs
/usr/local/bin/emacs
Putting "." in the PATH is considered a security concern, since it could trick you into running a malicious program that someone has dropped into the current directory.
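The lookup the shell performs is easy to imitate: split PATH on ':' and test each directory for an executable file. Here is a sketch of which in C++ (the function name and the treatment of empty PATH entries are our assumptions):

```cpp
#include <sstream>
#include <string>
#include <unistd.h>   // access()

// Mimic the shell's PATH search (and the which command): return the first
// directory in `path` containing an executable file `name`, or "" if none.
std::string which(const std::string& name, const std::string& path) {
    std::stringstream dirs(path);
    std::string dir;
    while (std::getline(dirs, dir, ':')) {
        if (dir.empty()) dir = ".";        // an empty entry means "here"
        std::string candidate = dir + "/" + name;
        if (access(candidate.c_str(), X_OK) == 0)
            return candidate;              // found it: stop searching
    }
    return "";                             // "command not found"
}
```

For example, which("emacs", "/opt/csw/bin:/usr/local/bin:...") would walk the directories in order, just as described above.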