egrep utility,
but keep in mind that regular expressions are used everywhere:
they're the backbone of Perl programming, for example, and a
principal component of any tool that helps build interpreters or
compilers.
The egrep utility takes a regular expression and a
file and prints out all lines in the file that contain a string
that can be generated by the regular expression. So, from the
command line you type:
egrep 'regexp' filename
The regular expression regexp, which we'll discuss in a
second, is usually surrounded by ' ' to keep
your shell from interpreting and modifying anything. If you are
interested in a more complete (though possibly incomprehensible)
description of regular expressions in Unix, you can type
man -s5 regexp at the command prompt.
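To see what "interpreting and modifying" can mean, here is a small sketch that uses echo to show what the shell hands to a command when a pattern is left unquoted. The scratch directory and the file names ab and ac are made up for the demonstration.

```shell
# Scratch directory holding two hypothetical files, ab and ac.
dir=$(mktemp -d)
touch "$dir/ab" "$dir/ac"

# Unquoted: the shell expands a* into matching file names before the command runs.
unquoted=$(cd "$dir" && echo a*)
# Quoted: the pattern reaches the command unchanged.
quoted=$(cd "$dir" && echo 'a*')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
rm -rf "$dir"
```

With the quotes, egrep would receive the pattern exactly as typed; without them, it would receive whatever file names happened to match.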
FILE: data
therainin
spainfallsmainly
ontheplain

COMMAND AND RESULT
[~/]> egrep 'fall' data
spainfallsmainly
[~/]>
What's going on here? Well, the regular expression is
fall (i.e. f concatenated with a concatenated with
...) and egrep printed out the only line that
contained a string matching that regular expression. Of course,
matching a literal string like this is not too interesting,
though often quite powerful.
What if we want to print the lines that contain fall or
on? In regular expressions, the | represents choice:
fall|on
Be careful not to put in whitespace, because it matters.
I'll say it again: whitespace matters!
FILE: data
therainin
spainfallsmainly
ontheplain

COMMAND AND RESULT
[~/]> egrep 'fall|on' data
spainfallsmainly
ontheplain
[~/]>

WHITESPACE MISTAKE
[~/]> egrep 'fall | on' data
[~/]>
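The whitespace mistake is easier to see on a file where the spaces can actually match. In this sketch the file contents are invented for illustration, and grep -E is used as the standard equivalent of egrep. The pattern 'fall | on' means "fall followed by a space" or "a space followed by on".

```shell
# Hypothetical data file: two lines have spaces next to the words, two do not.
data=$(mktemp)
printf '%s\n' 'spainfallsmainly' 'rain fall today' 'turn on' 'ontheplain' > "$data"

# Without spaces: any line containing fall or on.
tight=$(grep -E 'fall|on' "$data")
# With spaces: only lines containing "fall " or " on".
spaced=$(grep -E 'fall | on' "$data")

printf 'fall|on:\n%s\n\n' "$tight"
printf 'fall | on:\n%s\n' "$spaced"
rm -f "$data"
```

The first pattern selects all four lines, while the second selects only the two lines where a space sits next to the word.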
FILE: data
{ x in Y | x + 3 < 12 }
{ all even x's }
(Y x Y)
))){{{

COMMAND AND RESULT
[~/]> egrep '\|' data
{ x in Y | x + 3 < 12 }
[~/]>

COMMAND AND RESULT
[~/]> egrep '\||\(' data
{ x in Y | x + 3 < 12 }
(Y x Y)
[~/]>
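The difference between | as choice and \| as a literal bar shows up when both forms are run on the same input. A minimal sketch, with made-up file contents and grep -E standing in for egrep:

```shell
# Hypothetical data file: one line contains a literal bar.
data=$(mktemp)
printf '%s\n' 'a|b' 'a' 'b' 'c' > "$data"

# Unescaped: choice, so any line containing a or b matches.
choice=$(grep -E 'a|b' "$data")
# Escaped: the three-character string a|b must appear literally.
literal=$(grep -E 'a\|b' "$data")

printf 'a|b as choice:\n%s\n\n' "$choice"
printf 'a\\|b as literal:\n%s\n' "$literal"
rm -f "$data"
```

Only the first line survives the escaped form, since it is the only one that contains an actual | character.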
FILE: data
001100110011
0101010

COMMAND AND RESULT
[~/]> egrep '(00|11)*' data
001100110011
0101010
[~/]>

COMMAND AND RESULT
[~/]> egrep '(00|11)+' data
001100110011
[~/]>
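The first command prints every line because * allows zero repetitions, so (00|11)* also matches the empty string, and the empty string occurs in any line. A short sketch using grep -c to count matching lines (the file contents are invented, and grep -E is the standard equivalent of egrep):

```shell
# Hypothetical data file; the last line has no 0 or 1 at all.
data=$(mktemp)
printf '%s\n' '001100110011' '0101010' 'abc' > "$data"

# Zero or more repetitions: the empty match succeeds on every line.
star=$(grep -E -c '(00|11)*' "$data")
# One or more repetitions: a line needs at least one 00 or 11.
plus=$(grep -E -c '(00|11)+' "$data")

echo "lines matching (00|11)*: $star"
echo "lines matching (00|11)+: $plus"
rm -f "$data"
```

Even the line abc is counted by the * version, which is why * on its own is rarely useful without something else in the pattern to pin it down.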
FILE: data
prevent a10
affix bx
postorder b10
postal a
prefix a00

COMMAND AND RESULT
[~/]> egrep '(pre|post)(order|fix)' data
postorder b10
prefix a00
[~/]>

COMMAND AND RESULT
[~/]> egrep '(a|b)(0|1)+' data
prevent a10
postorder b10
prefix a00
[~/]>
Often we want egrep to print lines that
completely match a given regular expression rather than lines that
contain some substring that matches the expression. For example,
suppose we want to print out the lines that contain only 0's. We
might type egrep '0+'. Unfortunately, this would
print out a line like 11001101, since it has "00" as
a substring, which matches 0+. However, Unix regular expressions
include ^, which matches the beginning of a line, and $, which
matches the end of a line. Thus, egrep '^0+$'
matches only a line consisting of one or more zeros.
FILE: data
0x000000
(1100110)
0000000
1010000
(00)
0011111

COMMAND AND RESULT
[~/]> egrep '^0+' data
0x000000
0000000
0011111
[~/]>

COMMAND AND RESULT
[~/]> egrep '0+$' data
0x000000
0000000
1010000
[~/]>

COMMAND AND RESULT
[~/]> egrep '^0+$' data
0000000
[~/]>
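Anchors combine with the grouping and repetition operators. As a sketch (the file contents are made up, and grep -E stands in for egrep), this prints only the lines that consist entirely of 00 and 11 pairs:

```shell
# Hypothetical data file.
data=$(mktemp)
printf '%s\n' '001100' '0011001' 'x0011' '1111' > "$data"

# Unanchored: any line containing a 00 or 11 pair somewhere.
anywhere=$(grep -E '(00|11)+' "$data")
# Anchored: the whole line must be built from 00 and 11 pairs.
whole=$(grep -E '^(00|11)+$' "$data")

printf 'unanchored:\n%s\n\n' "$anywhere"
printf 'anchored:\n%s\n' "$whole"
rm -f "$data"
```

The unanchored pattern accepts all four lines; the anchored one rejects 0011001 (a stray trailing 1) and x0011 (a leading x).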
egrep's regular expressions.
To match any digit you put [0-9]. To match all
uppercase letters you put [A-Z]. To match a
letter (either case) or a digit, you'd use
[0-9]|[A-Z]|[a-z].
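The three ranges can also be written inside a single pair of brackets. The sketch below checks that both spellings select the same lines; the sample lines are invented and grep -E stands in for egrep. Ranges such as A-Z can behave differently in some locales, so the sketch runs grep with LC_ALL=C as an assumption.

```shell
# Hypothetical data file: three lines with letters or digits, two without.
data=$(mktemp)
printf '%s\n' 'abc' 'XYZ' '42' '---' '+ +' > "$data"

# Three bracket expressions joined by choice.
alternation=$(LC_ALL=C grep -E '[0-9]|[A-Z]|[a-z]' "$data")
# The same three ranges listed inside one bracket expression.
single=$(LC_ALL=C grep -E '[0-9A-Za-z]' "$data")

printf 'alternation:\n%s\n\n' "$alternation"
printf 'single bracket:\n%s\n' "$single"
rm -f "$data"
```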
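An optional item is marked with the ? character. For example, if you want to
specify that a + or - sign may or may not be present,
you'd write (\+|-)?. The + is escaped because an unescaped + is the
"one or more" operator.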
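As a sketch of the optional sign in use, the pattern below matches lines that are whole integers with at most one leading sign. The plus is written as \+ because an unescaped + at the start of a group is not portable; the file contents are invented, and grep -E stands in for egrep.

```shell
# Hypothetical data file: three valid integers and two lines that should fail.
data=$(mktemp)
printf '%s\n' '42' '+42' '-42' '+-42' '4+2' > "$data"

# Optional sign, then one or more digits, anchored to the whole line.
signed=$(grep -E '^(\+|-)?[0-9]+$' "$data")

printf 'signed integers:\n%s\n' "$signed"
rm -f "$data"
```

The bracket expression [+-]? would be another way to write the same optional sign.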