Regular Expressions and egrep

Go through the regular expression tutorial before you attempt this homework! Also, remember to check the common mistakes if things aren't working out.
BACKGROUND
I want to show you why egrep is such a powerful tool in unix. You'll need to be familiar with two important unix concepts besides regular expressions: the wc program and pipes.

wc is a program that does character, word, and line counts on files. The -l option asks for the line count, and that's the most useful one, I think. For instance: how many lines does the tmpdata file from the last homework have?

> wc -l ~wcbrown/tmpdata 
     195 /users/faculty/wcbrown/tmpdata
	
This tells me it has 195 lines.
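
To see the three kinds of count side by side, here is a minimal sketch that runs wc on a small made-up file (the file contents are an assumption for illustration, not the real tmpdata). It reads the file on standard input so that wc prints only the number, and uses tr to strip the padding some versions of wc add.

```shell
# Build a small three-line sample file (hypothetical data, not the real tmpdata).
sample=$(mktemp)
printf 'alpha beta\ngamma\ndelta epsilon zeta\n' > "$sample"

# -l counts lines, -w counts words, -c counts bytes.
# Reading from stdin (<) makes wc print the number without a file name;
# tr removes any leading padding so only the digits remain.
lines=$(wc -l < "$sample" | tr -d '[:space:]')
words=$(wc -w < "$sample" | tr -d '[:space:]')
chars=$(wc -c < "$sample" | tr -d '[:space:]')

echo "lines=$lines words=$words chars=$chars"
rm -f "$sample"
```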

Piped commands are a powerful part of unix. You "pipe" the output of one command into the input of another command. For example, ls -l prints out all the files in the current directory - one per line.

valiant[202] [~/courses/SI472/classes/C07/]> ls -l
total 8
-rw-r--r--   1 wcbrown  faculty     1679 Sep 12 17:15 Class.html
-rw-r--r--   1 wcbrown  faculty      936 Sep 12 15:52 Class.html~
drwxr-x--x   2 wcbrown  faculty      512 Sep 14 07:46 HW/
	
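If I want to know how many files are in a directory, I simply pipe the output of ls -l to the input of wc -l. Keep in mind that the count includes the "total" line that ls -l prints first, so the 4 here means 3 directory entries: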
valiant[203] [~/courses/SI472/classes/C07/]> ls -l | wc -l
       4
	
A pipe is indicated with the | character.
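
A pipeline can have more than two stages. The sketch below is one way to count directory entries: once with plain ls, and once with ls -l after dropping its leading "total" line. The scratch directory and file names are made up for illustration, and LC_ALL=C is set only so that the summary line is spelled "total" regardless of the system's language settings.

```shell
# Make a scratch directory with three entries (hypothetical names).
dir=$(mktemp -d)
touch "$dir/Class.html" "$dir/Notes.txt"
mkdir "$dir/HW"

# When its output goes to a pipe, ls prints one name per line,
# so wc -l counts the entries directly.
# The trailing tr only strips padding that some wc versions print.
plain_count=$(ls "$dir" | wc -l | tr -d '[:space:]')

# ls -l starts with a "total" summary line; grep -v removes it
# before the remaining lines are counted.
long_count=$(LC_ALL=C ls -l "$dir" | grep -v '^total' | wc -l | tr -d '[:space:]')

echo "ls: $plain_count entries, ls -l: $long_count entries"
rm -rf "$dir"
```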

Webserver access logs: Log on to csb and type:

	  cd /usr/netscape/server4/https-csb.mathsci.usna.edu/logs
	
to change to the directory in which the departmental access log is kept. The file access is the access log. Type
more access
	
to move through the file (hitting space brings up the next page in the file). This file is HUGE, so do not copy it or try to open it in a text editor. How huge is it?
csb[123] [/usr/netscape/server4/https-csb.mathsci.usna.edu/logs/]> wc -l access
  184312 access
	
It's 184,312 lines long! Here's a line from this file:
valiant.mathsci.usna.edu - - [11/Jul/2000:09:13:10 -0400] "GET /~wcbrown/blank.html HTTP/1.0" 200 514
	
What does this tell you? Well, first of all, the hit came from the machine valiant.mathsci.usna.edu ... that's me. The date for this access was 11/Jul/2000 (the time's in there too), and the page that was accessed was ~wcbrown/blank.html, which is part of my homepage. So on 11 July, I accessed my homepage from the machine valiant.

I might want to know how many accesses there have been to my homepage. Well, any access with a ~wcbrown in it is to my homepage, so there have been

csb[125] [/usr/netscape/server4/https-csb.mathsci.usna.edu/logs/]> egrep '~wcbrown' access | wc -l
   42038
	
... 42,038 accesses of pages belonging to me.

The first thing on a line is the address (symbolic if possible, otherwise numeric) of the machine accessing the site. That address is followed by the string " - - ". Suppose I wanted to find all accesses to our site from Russia. Well, their addresses end in .ru. Unfortunately, a "." matches "any character" in a regexp, so we have to put in "\." (i.e. use an escape sequence) to match a literal . character:

csb[135] [/usr/netscape/server4/https-csb.mathsci.usna.edu/logs/]> egrep '\.ru - -' access
ppp96-201.dialup.mtu-net.ru - - [18/Jul/2000:20:49:23 -0400] "GET /~coleman/classes/si221/labs/lab2/ HTTP/1.1" 404 319
uxse118.jinr.ru - - [26/Jul/2000:06:16:15 -0400] "GET /~wcbrown/courses/SI420/mylisp/Tutorial.html HTTP/1.0" 200 586
uxse118.jinr.ru - - [26/Jul/2000:06:16:19 -0400] "GET /~wcbrown/courses/SI420/mylisp/Lisp1.html HTTP/1.0" 200 4836
uxse118.jinr.ru - - [26/Jul/2000:06:17:27 -0400] "GET /~wcbrown/courses/SI420/mylisp/Lisp2.html HTTP/1.0" 200 3521
uxse118.jinr.ru - - [26/Jul/2000:06:17:51 -0400] "GET /~wcbrown/courses/SI420/mylisp/Lisp3.html HTTP/1.0" 200 6789
uxse118.jinr.ru - - [26/Jul/2000:06:18:33 -0400] "GET /~wcbrown/courses/SI420/mylisp/Lisp4.html HTTP/1.0" 200 6999
mcc2-pool-239.cell.ru - - [18/Aug/2000:13:47:06 -0400] "GET /~needham/courses/si434/fall99/slides/chap3.htm HTTP/1.0" 200 6337
dima.stu.neva.ru - - [28/Aug/2000:12:11:20 -0400] "GET /~wcbrown/courses/SI433/classes/C12/Class.html HTTP/1.1" 200 3138
dima.stu.neva.ru - - [28/Aug/2000:12:11:48 -0400] "GET /~wcbrown/courses/SI433/classes/C12/LCS.cpp HTTP/1.1" 200 2556
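
To see what the backslash buys you, here is a small sketch that runs the pattern both ways on a three-line log. The log lines and host names are invented for illustration; they are not from the real access file. It uses grep -E, which is the standard spelling of egrep and accepts the same patterns (some newer systems print a warning when you type egrep).

```shell
# A tiny made-up access log (hypothetical hosts and pages).
log=$(mktemp)
cat > "$log" <<'EOF'
node1.example.ru - - [01/Jan/2000:00:00:00 -0400] "GET /a.html HTTP/1.0" 200 10
guru - - [01/Jan/2000:00:00:01 -0400] "GET /b.html HTTP/1.0" 200 20
node2.example.com - - [01/Jan/2000:00:00:02 -0400] "GET /c.html HTTP/1.0" 200 30
EOF

# Unescaped: "." matches any character, so the host "guru" matches too
# (the "." stands in for the first "u" of "uru").
loose=$(grep -E '.ru - -' "$log" | wc -l | tr -d '[:space:]')

# Escaped: "\." matches only a literal dot, so only the .ru address matches.
strict=$(grep -E '\.ru - -' "$log" | wc -l | tr -d '[:space:]')

echo "unescaped: $loose, escaped: $strict"
rm -f "$log"
```

The unescaped pattern counts one line too many, which is exactly the kind of mistake that is easy to miss in a file with 184,312 lines.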

YOUR ASSIGNMENT
Use the access log, egrep, and wc to answer each of the following questions with a single egrep command (possibly piped to wc):

  1. How many accesses from Japan has the departmental website had? (Japanese addresses end in .jp.)

  2. How many accesses has Dr. Needham's page had from Japan?

  3. How many accesses were there to my SI472 homepage on the 27th and 28th of August? Hint: here's an example of such an access:
    M014560.mid4.usna.edu - - [28/Aug/2000:23:34:46 -0400] "GET /~wcbrown/courses/SI472/classes/C02/HW/HW.css HTTP/1.0" 200 892
    
    	
Turn in a printout of the shell interactions that answered these questions!


W C Brown
2000-09-07