This lesson marks a change. In previous lessons, we have talked about a single computer. Now we turn our attention to the World Wide Web, which is a system made up of many computers.

There are many different web browsers. In this course you'll need to have Microsoft's Internet Explorer, Mozilla Firefox and Google Chrome at a minimum. Other browsers include Apple's Safari, the Opera browser, and browser variants that are optimized for phones and tablets.
                   

Public webpages for navy ships



Many ships in the U.S. Navy have their own public websites. The URL has the form http://www.shipname.navy.mil. For example, the USS BAINBRIDGE's website is at http://www.bainbridge.navy.mil, and the USS NIMITZ's website is at http://www.nimitz.navy.mil. When a client requests a web page from one of the URLs above, that request is directed to the web server that hosts (serves) the website. For ships and submarines, hosting a website on board is not practical because of limited availability, bandwidth constraints, risk of detection, and increased vulnerability: an onboard server would give attackers a penetration point into the ship's or submarine's internal network. Instead, shore-based commands provide and manage the webservers that host ship and submarine websites, much like the Computer Science Department provides and manages the rona webserver that hosts your websites.

Who is in charge of a ship's website, i.e. who is responsible for the images, HTML files, etc. that make up the site? That depends on the command: any officer on board could be placed in charge of the ship's web page. So you may end up being responsible for a site like this.

Browsers and Servers

The Web is an example of a client-server system. It consists of web servers, which are programs or computers with information to provide, and web clients (another name for browsers), which are consumers of information. Many systems follow this client-server model; the web is just the most familiar one.

A browser's primary control is the address bar. You enter a URL (Uniform Resource Locator) that describes to the browser where to find the item you want (roughly by specifying a web server and a file on that webserver), and the browser contacts the web server and requests the item which, hopefully, the server then sends back. A URL typically specifies three things:

  1. the protocol to use (basically what language the browser and server should use to carry out their transaction),
  2. the name of the webserver to contact, and
  3. a path specifying a file on that webserver.
http://www.usna.edu/Users/cs/wcbrown/index.html
\__/   \__________/\__________________________/
 |          |                 |
protocol    |         path on server's filesystem
         server
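
The same three-way split can be done in code. Here is a minimal sketch using Python's standard urllib.parse module; the function name split_url is made up for this example and is not part of the course materials.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Return the (protocol, server, path) pieces of a URL."""
    parts = urlsplit(url)
    # scheme = protocol, netloc = server name, path = path on the server
    return parts.scheme, parts.netloc, parts.path

protocol, server, path = split_url("http://www.usna.edu/Users/cs/wcbrown/index.html")
print(protocol)  # http
print(server)    # www.usna.edu
print(path)      # /Users/cs/wcbrown/index.html
```

The three printed values line up with the three labels in the diagram above.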
	
The "Web" and the "Internet" are not the same thing. The internet is the infrastructure through which browsers and webservers communicate. Many other kinds of communication run over the internet: e-mail, voice-over-IP telephone calls, remote logins to computers, etc.
The server is specified by a domain name, something like www.cnn.com or en.wikipedia.org. We'll talk a bit more about domain names in the networking section of the course. The path is a relative path from some point in the server's filesystem. The gotcha on the path is that it uses Unix path conventions, which means forward slashes (/) instead of backslashes (\), regardless of whether the server is a Windows server or a Unix server. Finally we get to the protocol. Most browsers support several protocols, including http, https, file, mailto and ftp. Essentially, the World Wide Web consists of browsers and webservers communicating via HTTP (the Hyper-Text Transfer Protocol). The https protocol is just a "secure" version of http; more on that later.

When you put a URL like http://intranet.usna.edu/1stCo/index.html in your browser's address bar, it initiates the following sequence of actions:

  1. The browser contacts the server intranet.usna.edu and asks it to get the file 1stCo/index.html.
  2. The server retrieves the file 1stCo/index.html and sends it (serves it) to the browser.
  3. The browser receives the file from the server and renders it on screen in your browser window.

Browsers used to (meaning until about 2010) have a status bar at the bottom of the window that gave you important information about the status of the browser. That's gone on all major browsers, but there's still a little popup for the status in some circumstances, and it's important. Hover your mouse over this link and look for the popup window with the text http://www.usma.edu. This status popup is telling you the address the browser will go to if you click on this link. Knowing about the status popup can help you avoid a simple misdirection trick: a link whose visible text says one thing while its actual destination is something else. Don't click on the following link, but check out where the browser will actually send you if you click on it:

http://www.trustworthy.org
A classic use of the misleading link is Rick Rolling. You Rick Roll someone by tricking them into watching the Rick Astley "Never Gonna Give You Up" video (watch if you dare).

If this kind of misdirection doesn't seem like a big deal, check out the Wired article "Anonymous Tricks Bystanders Into Attacking Justice Department", about a January 2012 use of exactly this technique.
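
To make the trick concrete, here is a rough sketch of a checker that does in code what you do by eye with the status popup: it compares the text a link displays with the address it really points to. It uses Python's standard html.parser module. The names LinkCollector and misleading_links, and the destination www.example.com/surprise, are invented for this illustration; the comparison is deliberately crude (it only flags links whose visible text looks like a URL but differs from the real destination).

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects (visible text, real destination) for every <a> tag."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # destination of the link we are inside, if any
        self._text = []     # pieces of visible text seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

def misleading_links(html_text):
    """Return links whose visible text looks like a URL but isn't the real one."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return [(text, href) for text, href in collector.links
            if text.startswith("http") and text != href]

# Hypothetical example: the destination below is made up for illustration.
sample = '<a href="http://www.example.com/surprise">http://www.trustworthy.org</a>'
print(misleading_links(sample))
# [('http://www.trustworthy.org', 'http://www.example.com/surprise')]
```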

Another important visual cue from the browser is that a little lock icon is displayed (by most browsers) when your connection to a server is using the https protocol, which is the secure version of http.

The file protocol

You can open up a file on your computer in the browser using the file protocol. Note that this is not the web! It's not client-server and it doesn't use http/https. Suppose you were user m169999 and you had a file on your Desktop called vacation.jpg. Putting the following URL in the browser's address bar would result in the browser showing you that image:
file:///C:/Users/m169999/Desktop/vacation.jpg
Note that the "server" portion of the URL has collapsed to nothing, which is why there are three /'s in a row, indicating that we're accessing the file on our local machine. Using ctrl+o, you can browse the filesystem to open a file, which may be more convenient than entering a URL. The file protocol is really useful when building websites, since you can get a quick look at a page even before you put it on a webserver.
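
As a small illustration of where the three slashes come from, here is a sketch that builds a file URL from a Windows path. The helper name file_url is made up for this example, and it only handles simple drive-letter paths like the one above.

```python
from urllib.parse import quote

def file_url(windows_path):
    """Build a file URL from a Windows path such as C:\\Users\\...\\x.jpg."""
    # URLs always use forward slashes, even for a Windows path.
    path = windows_path.replace("\\", "/")
    server = ""  # the "server" part is empty for a local file
    # "file://" + "" + "/" gives three slashes in a row.
    # quote() escapes characters like spaces that can't appear raw in a URL.
    return "file://" + server + "/" + quote(path, safe="/:")

print(file_url(r"C:\Users\m169999\Desktop\vacation.jpg"))
# file:///C:/Users/m169999/Desktop/vacation.jpg
```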

HTTP (Hyper-Text Transfer Protocol)

At its simplest, HTTP is just a language of requests and responses to requests that allows files to be fetched from webservers all over the internet. Your browser uses this language to get the file named in the URL from the server (also named in the URL). In fact, the HTTP command it uses is "GET". The key point is YOUR browser makes a request to a remote server ON YOUR BEHALF ... usually to have a given file sent to it. You can, in fact, send requests to a webserver on your own, i.e. without going through a browser. As with so many things, however, be careful what you ask for! We'll use a tool called netcat (nc) which allows you to send network requests at a low level. Let's compare what you see when you browse to http://intranet.usna.edu/1stCo/index.html with what the browser sees and goes through to bring you that pretty page. In the transcript below, the first two lines (the nc command and the GET request) are what we type; everything from "HTTP/1.1 200 OK" onward is what the server sends back.
$ nc intranet.usna.edu 80         ← have netcat connect to the webserver intranet.usna.edu
  GET /1stCo/index.html HTTP/1.0  ← HTTP request to get the file /1stCo/index.html from the server
                                  ← An extra newline (enter key) is required!
HTTP/1.1 200 OK
Date: Tue, 29 Jan 2013 15:40:38 GMT
Server: Apache
X-Powered-By: PHP/5.3.15
Content-Length: 4870
Connection: close
Content-Type: text/html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
	<title>First Company - Semper Primo</title>
	<link rel="stylesheet" media="screen" type="text/css" href="style.css" />
    <meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
  </head>

  <body>
	<div id="background">
    <div id="header">
      <p id="logo">FIRST <span class="white">COMPANY</span></p>
	  <p id="slogan">SHIPMATES</p>
	</div>
	<div id="page">  <!-- wraps and defines overall page width, centers it -->
	  <div id="content"> <!-- content area of page -->
          
<!-- code below will make a new grey box for content, cut and paste as needed -->		  
		<div class="box-top"></div>
<div class="box">
		    <div class="box-padding">
              <h1>.:  Welcome to First</h1>
			  <div class="image-text-right"><!-- image floated left with imagefloat class, text will align to right -->
			    <img src="images/steamboatwillie.jpg" width="45%" height="45%" class="imagefloat" alt="" />
				<p><font color="white">...brought to you by the Class of '12 '13 '14 '15.</font></p>
				<br>
				<p>Of all the companies you could have been in, you wound up in <font color="#104E8B"><b>FIRST</b></font>.  Fate brought you here into the first of thirty in the Brigade of Midshipmen.  That's something special, isn't it?<br>
				<br>Now the ball's in your court.  What are you going to do with this opportunity?</p>
				<p>	
				</p>
...
I cut out most of the response to save you the pain of looking at it all. If you really want to see it, check out the full transcript. The response from the server also follows the HTTP protocol, and we can make some sense of it. "HTTP/1.1 200 OK" means the server was able to respond successfully to the request. "Content-Type: text/html" is especially important: with the Content-Type line, the server is telling the browser what kind of file it is serving up. In this case the server is telling the browser that what follows is a plain text file in HTML format. This provides an excellent segue to ...
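
To see that there is nothing magic in the transcript, here is a small sketch that builds the text of the GET request and picks apart the first few lines of a response. It does not touch the network: the sample response is just a few header lines copied from the transcript above. The function names are made up for this example. One detail the transcript hides: HTTP officially ends each line with a carriage return plus newline ("\r\n"), which is what the code uses.

```python
def build_get_request(path):
    """Return the text of a bare-bones HTTP/1.0 GET request."""
    # The request line, then an empty line: that empty line is the
    # "extra newline" that had to be typed in the transcript.
    return f"GET {path} HTTP/1.0\r\n\r\n"

def parse_response_head(head):
    """Split a response's status line and headers into usable pieces."""
    lines = head.split("\r\n")
    version, code, reason = lines[0].split(" ", 2)   # e.g. HTTP/1.1 200 OK
    headers = {}
    for line in lines[1:]:
        if not line:          # empty line marks the end of the headers
            break
        name, value = line.split(":", 1)
        headers[name.strip()] = value.strip()
    return int(code), reason, headers

# A few header lines copied from the transcript above.
sample_head = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache\r\n"
    "Content-Length: 4870\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
)

code, reason, headers = parse_response_head(sample_head)
print(code, reason)             # 200 OK
print(headers["Content-Type"])  # text/html
```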

HTML (Hyper-Text Markup Language)

What the webserver sends the browser and what the browser shows us are usually very different things. Most webpages are plain text files in a language called HTML (Hyper-Text Markup Language). The browser doesn't show you the HTML it receives, rather the HTML instructs the browser as to what to put on the page. When the browser follows the HTML instructions and draws something pretty on the screen, we say that the browser is rendering the HTML. So in the example HTTP transaction from the previous section, what you were seeing from the server was the raw HTML, not the rendered page. To understand how websites work, and certainly to create your own, you need to know the basics of HTML.

First and foremost, HTML is just text. So you create HTML files with text editors like Notepad. Second, the structure of HTML is provided by tags. A tag is a name in angle brackets (< >). Most tags come in begin/end pairs, where the end tag just has a / before the name, e.g. <foo> ... </foo>. So, for instance, to display "I said hello out there!" with the word "hello" in bold, you'd have in your HTML file:

I said <b>hello</b> out there!
Some tags are structural (for instance, every HTML file is wrapped up in <html> ... </html> tags) while others, like <b> ... </b>, are pure formatting. Next lesson you'll learn to create webpages in HTML, but for now, let's take a look at the basic structure of a page:
HTML code, followed by the page as rendered in the browser:
<html>
  <head>
  </head>
  <body>
    <h1>A Simple Web Page</h1>
    <p>
      This page has <b>two</b> paragraphs.
      The first has an image
      <img src="SleepyFace.JPG"> and
      <a href="http://www.usna.edu">a link</a>.      
    </p>
    <p>
      The second has
      <span style="color: #ff0000">different colors</span>,
      which is cool.  It also has some funky characters:
      &#0931; &#8680; &#9650; 
    </p>
  </body>
</html>
A Simple Web Page

This page has two paragraphs. The first has an image and a link.

The second has different colors, which is cool. It also has some funky characters: Σ ⇨ ▲

Obviously there's a lot to talk about here. We needn't cover it all, since next lesson will. A few quick points:

  1. Every HTML file has the format:
    <html>
      <head>
            ← stuff goes here
      </head>
      <body>
            ← stuff goes here
      </body>
    </html>
    ... meaning that every HTML file has a head and a body (hence the tattoo). The body is what actually gets printed on the page. The head is used for other purposes, which we'll discuss later.
  2. A paragraph consists of anything inside <p> ... </p> tags. Line breaks and blank lines in the HTML source code are irrelevant: if you want paragraphs in the rendered output, you need <p> ... </p> tags! Otherwise, text just stays on a single line, automatically wrapping to the next line according to the width of the browser window.
  3. Colors in HTML are defined by RGB triples, which describe the amount of red, green and blue in a color, each as a two-digit hex value (one byte per color). Thus, the color #ff0000 has maximum 'r' intensity, and minimum 'g' and 'b' intensities. In other words, it's red! As you see, there's no escaping hex!
  4. Speaking of escaping ... what if you wanted to put a < character in your HTML code? You'd have trouble because < has a special meaning in HTML: it starts a tag. The ASCII value of < is 60, and you can specify a character by ASCII value like this: &#60; is <. So ASCII's not going away either!
In fact, there's a HUGE set of characters that browsers understand: a superset of ASCII called Unicode. You enter Unicode characters the same way as ASCII, the numbers just get bigger. Here's a nice reference.
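
Points 3 and 4 are easy to check for yourself. The sketch below uses Python to turn a color like #ff0000 into its three byte values, and uses the standard html module to decode the numeric character codes from the example page. The function name rgb_from_hex is made up for this example.

```python
import html

def rgb_from_hex(color):
    """Turn a color like '#ff0000' into a (red, green, blue) triple of bytes."""
    digits = color.lstrip("#")
    # Two hex digits per color: positions 0-1 red, 2-3 green, 4-5 blue.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(rgb_from_hex("#ff0000"))   # (255, 0, 0)

print(ord("<"))                  # 60, the ASCII value of <
print(html.unescape("&#60;"))    # <
print(html.unescape("&#0931; &#8680; &#9650;"))   # Σ ⇨ ▲
```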

HTTP client-server interactions revisited

Consider the HTML file http://rona.cs.usna.edu/~si110/lec/l10/ex2.html shown below:
HTML Code: ex2.html
<html>
  <head>
  </head>
  <body>
    
    <h1>A Simple Webpage With a Few Links</h1>

    <p>
      First we have a cat:
      <img src="SleepyFace.JPG">
    </p>
    
    <p>
      Then a comic:
      <img src="http://www.foxtrot.com/comics/2011-10-02-5a620ce6.gif">
    </p>
    
    <p>
      Then a link:  
      The above cartoon comes from the
      <a href="http://www.foxtrot.com/2011/10/10022011/">FoxTrot Website</a>
    </p>

  </body>
</html>
We're going to take a look at what happens "under the hood" from the time you enter the URL in your browser's URL bar until you actually see the page rendered. (The FoxTrot cartoon is worth a close look.)
  1. You enter http://rona.cs.usna.edu/~si110/lec/l10/ex2.html into the URL bar and press Enter.
  2. The browser sends rona.cs.usna.edu a GET request for the file /~si110/lec/l10/ex2.html
  3. The server finds /~si110/lec/l10/ex2.html on its hard drive and sends it back to the browser.
  4. The browser receives ex2.html and looks through it, noticing that the images SleepyFace.JPG and http://www.foxtrot.com/comics/2011-10-02-5a620ce6.gif will be needed in order to render the page.
  5. The browser issues a GET request to rona.cs.usna.edu for /~si110/lec/l10/SleepyFace.JPG, and a GET request to www.foxtrot.com for /comics/2011-10-02-5a620ce6.gif. These will actually go out more or less simultaneously.
  6. rona.cs.usna.edu receives the request for /~si110/lec/l10/SleepyFace.JPG, finds that file on its hard drive and sends it back to the browser.
  7. www.foxtrot.com receives the request for /comics/2011-10-02-5a620ce6.gif, finds that file on its hard drive and sends it back to the browser.
  8. Eventually, the browser receives both image files, and it now has all the data it needs to render the page on the screen ... so it does.
Notice that there's another URL in the document, from the line:
<a href="http://www.foxtrot.com/2011/10/10022011/">FoxTrot Website</a>
This does not result in any further HTTP traffic, i.e. in any further GET's, because no information about that file is required to render the page ex2.html. Of course, if the user clicks on that link, the browser will then issue a GET request for it.
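
Here is a rough sketch of the "looks through it" step: a few lines of Python that scan a page for img tags and work out the full URL of each image the browser would have to GET. It uses the standard html.parser and urllib.parse modules. The names ImageFinder and images_needed are made up for this example, the page text is a trimmed copy of ex2.html, and a real browser looks for many more kinds of resources than just images.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ImageFinder(HTMLParser):
    """Collects the full URL of every image a page needs."""
    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.to_fetch = []

    def handle_starttag(self, tag, attrs):
        # Only <img> needs a fetch to render; <a href=...> does not.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # A bare filename is relative to the page's own URL.
                self.to_fetch.append(urljoin(self.page_url, src))

def images_needed(page_url, html_text):
    finder = ImageFinder(page_url)
    finder.feed(html_text)
    finder.close()
    return finder.to_fetch

page = """<html><body>
<p>First we have a cat: <img src="SleepyFace.JPG"></p>
<p>Then a comic: <img src="http://www.foxtrot.com/comics/2011-10-02-5a620ce6.gif"></p>
<p><a href="http://www.foxtrot.com/2011/10/10022011/">FoxTrot Website</a></p>
</body></html>"""

for url in images_needed("http://rona.cs.usna.edu/~si110/lec/l10/ex2.html", page):
    print("GET", url)
# GET http://rona.cs.usna.edu/~si110/lec/l10/SleepyFace.JPG
# GET http://www.foxtrot.com/comics/2011-10-02-5a620ce6.gif
```

Notice that the link to the FoxTrot website never shows up in the output, which matches the point above: a link costs nothing until somebody clicks it.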

Browsers often allow you to listen in on the HTTP traffic that goes on under the hood. In Chrome, if you open up the Developer Tools (wrench button / Tools / Developer Tools) and click on the Network tab, you can see all the GET's that Chrome sends when it renders a page. Try opening it up, and entering a common URL like http://www.amazon.com. It's astounding how many GET's are required to render a page like that!

Logs: the paper trail ... electron trail?

Finally, it is good to be mindful of the fact that you do leave footprints when you navigate around the web. Recall the simple transaction steps for a URL like http://intranet.cs.usna.edu/~si110/index.html:
  1. The browser contacts the server intranet.cs.usna.edu and asks it to get the file ~si110/index.html.
  2. The server retrieves the file ~si110/index.html and sends it (serves it) to the browser.
  3. The browser receives the file from the server and renders it on screen in your browser window.
In step two, the server makes a record in a file called its access log that you requested that file and that it served it up to you. In class, we checked out the access logs and found where it was recorded that we'd visited that page. In step three, the browser receives and renders the file, but wait ... there's more. The browser records that it visited the site (that's where your browser's history comes from), and it keeps a copy of the page in what's called its cache. That way, if you turn around and ask for the same page again, it can just display the copy it's already cached, and avoid the delay and effort of fetching it from the server again. The server logs, browser history and browser cache are all traces you leave behind as you navigate the web. Think about that!
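
The three kinds of traces can be seen in a toy model. The sketch below is not how any real browser or server is written: TinyBrowser, fake_server and access_log are all invented for this illustration, and the "server" is just a function that notes each request in a list. Real browsers also check whether a cached copy is still fresh, which this toy ignores.

```python
class TinyBrowser:
    """A toy browser that keeps a history and a cache."""
    def __init__(self, fetch):
        self.fetch = fetch    # function that asks a server for a URL
        self.history = []     # every URL visited, in order
        self.cache = {}       # URL -> saved copy of the page

    def visit(self, url):
        self.history.append(url)
        if url not in self.cache:
            # Not cached yet: go to the server and keep a copy.
            self.cache[url] = self.fetch(url)
        return self.cache[url]

access_log = []   # stands in for the server's access log

def fake_server(url):
    """Pretend server: records the request, then serves a page."""
    access_log.append(url)
    return f"<html>contents of {url}</html>"

browser = TinyBrowser(fake_server)
browser.visit("http://intranet.cs.usna.edu/~si110/index.html")
browser.visit("http://intranet.cs.usna.edu/~si110/index.html")

print(len(access_log))       # 1  (second visit was served from the cache)
print(len(browser.history))  # 2  (the history records both visits)
print(len(browser.cache))    # 1  (one saved copy of the page)
```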

What is the "World Wide Web"?

The World Wide Web is the vast global collection of webservers and web clients (aka browsers) communicating over the Internet using HTTP (or HTTPS).


xkcd.com/1144/