This lesson marks a change. In previous lessons, we have talked about a single computer. Now we turn our attention to the World Wide Web, which is a system comprised of many computers.
.navy.mil. For example, the USS BAINBRIDGE's website is at http://www.bainbridge.navy.mil, and the USS NIMITZ's website is at http://www.nimitz.navy.mil. When a client requests the web page from one of the URLs above, that request is directed to the web server that hosts (serves) the website. For ships and submarines, hosting a website on board is not practical because of limited availability, bandwidth considerations, risk of detection, and the increased vulnerability it creates (a penetration point into the ship's or submarine's internal network). Instead, shore-based commands provide and manage the webserver that hosts ship and submarine websites, much like the Computer Science Department provides and manages the rona webserver that hosts your websites.
As for who is in charge of a ship's website (i.e. who is responsible for the images, HTML files, etc. that make up the site): depending on the command, any officer on board could be placed in charge of the ship's web page. So you may end up being responsible for a site like this.
A browser's primary control is the address bar. You enter a URL (Uniform Resource Locator) that describes to the browser where to find the item you want (roughly by specifying a web server and a file on that webserver), and the browser contacts the web server and requests the item which, hopefully, the server then sends back. A URL typically specifies three things:
http://www.usna.edu/Users/cs/wcbrown/index.html
\__/   \__________/\__________________________/
 |          |                   |
protocol    |      path on server's filesystem
          server
en.wikipedia.org. We'll talk a bit more about domain names in the networking section of the course. The path is a relative path from some point in the server's filesystem. The gotcha on the path is that it uses Unix path conventions, which means forward slashes (/) instead of backslashes (\), regardless of whether the server is a Windows server or a Unix server. Finally we get to the protocol. Most browsers support several protocols, including http, https, file, mailto and ftp. Essentially, the World Wide Web consists of browsers and webservers communicating via http (the HyperText Transfer Protocol). The https protocol is just a "secure" version of http; more on that later.
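To make the three parts of a URL concrete, here is a minimal sketch that pulls them apart with Python's standard urllib.parse module. The function name describe_url is just a label chosen for this illustration, not something from the course tools.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split a URL into the three parts discussed above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # e.g. "http"
        "server": parts.hostname,   # the web server's name
        "path": parts.path,         # path on the server's filesystem
    }

if __name__ == "__main__":
    info = describe_url("http://www.usna.edu/Users/cs/wcbrown/index.html")
    for name, value in info.items():
        print(f"{name:9}{value}")
```

Note that the path comes back with forward slashes, exactly as it appears in the URL, no matter what operating system the server runs.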
When you put a URL like
in your browser's address bar, it initiates the following sequence of actions:
intranet.usna.edu and asks it to get the file
1stCo/index.html and sends it (serves it) to the browser.
Browsers used to (meaning 'til 2010) have a status bar
at the bottom of the screen that gave you important
information about the status of the browser. That's gone on
all major browsers, but there's still a little popup for the
status in some circumstances, and it's important. Hover your
mouse over this link and look for
the popup window with the text
This status popup is telling you the address the browser will
go to if you click on this link. Knowing about the status popup
can help you spot a little bit of a misdirection trick.
Don't click on the following link, but check out where the
browser will actually send you if you click on it:
If this kind of misdirection doesn't seem like a big deal, check out the Wired article "Anonymous Tricks Bystanders Into Attacking Justice Department" about a January 2012 use of exactly this technique.
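The misdirection trick works because the text you see for a link and the address it points to are two separate things. As a rough sketch (not a real security tool), the Python below scans some HTML and reports links whose visible text looks like a URL for one server while the link actually points to another. The class and function names, and the example.org address, are made up for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class LinkCollector(HTMLParser):
    """Collect (visible text, href) pairs for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

def misleading_links(html):
    """Return links whose visible text names a different server than the href."""
    collector = LinkCollector()
    collector.feed(html)
    suspicious = []
    for text, href in collector.links:
        shown = urlsplit(text).hostname    # None unless the text looks like a URL
        actual = urlsplit(href).hostname
        if shown and actual and shown != actual:
            suspicious.append((text, href))
    return suspicious

if __name__ == "__main__":
    # Hypothetical page: the first link claims one address but goes to another.
    sample = ('<a href="http://www.example.org/">http://www.usna.edu</a> '
              '<a href="http://www.usna.edu">a link</a>')
    print(misleading_links(sample))
```

The status popup does the same check for you by eye: it shows the href, whatever the link text claims.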
file protocol. Note that this is not the web! It's not client-server and it doesn't use http/https. Suppose you were user m169999 and you had a file on your Desktop called
vacation.jpg. Putting the following URL in the browser's address bar would result in the browser showing you that image:
file:///C:/Users/m169999/Desktop/vacation.jpg
Note that the "server" portion of the URL has collapsed to nothing, which is why there are three /'s in a row, indicating that we're accessing the file on our local machine. Using ctrl+o, you can browse the filesystem to open a file, which may be more convenient than entering a URL. The file protocol is really useful when building websites, since you can get a quick look at a page even before you put it on a webserver.
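You can see the "collapsed" server portion for yourself. This short sketch, using Python's standard urllib.parse module, splits the file URL from above; the variable name parts is just for illustration.

```python
from urllib.parse import urlsplit

parts = urlsplit("file:///C:/Users/m169999/Desktop/vacation.jpg")

print(repr(parts.scheme))   # the protocol: 'file'
print(repr(parts.netloc))   # the "server" portion: '' (empty)
print(repr(parts.path))     # the path on the local machine
```

The empty string between the second and third slash is the missing server name.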
GET". The key point is YOUR browser makes a request to a remote server ON YOUR BEHALF ... usually to have a given file sent to it. You can, in fact, send requests to a webserver on your own, i.e. without going through a browser. As with so many things, however, be careful what you ask for! We'll use a tool called netcat (nc), which allows you to send network requests at a low level. Let's compare what you see when you browse to http://intranet.usna.edu/1stCo/index.html with what the browser sees and goes through to bring you that pretty page. What's in red is what we type, what's in green is what the server sends back.
$ nc intranet.usna.edu 80         ← have netcat connect to the webserver intranet.usna.edu
GET /1stCo/index.html HTTP/1.0    ← HTTP request to get the file /1stCo/index.html from the server
                                  ← An extra newline (enter key) is required!
HTTP/1.1 200 OK
Date: Tue, 29 Jan 2013 15:40:38 GMT
Server: Apache
X-Powered-By: PHP/5.3.15
Content-Length: 4870
Connection: close
Content-Type: text/html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>First Company - Semper Primo</title>
<link rel="stylesheet" media="screen" type="text/css" href="style.css" />
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
</head>
<body>
<div id="background">
<div id="header">
<p id="logo">FIRST <span class="white">COMPANY</span></p>
<p id="slogan">SHIPMATES</p>
</div>
<div id="page"> <!-- wraps and defines overall page width, centers it -->
<div id="content"> <!-- content area of page -->
<!-- code below will make a new grey box for content, cut and paste as needed -->
<div class="box-top"></div>
<div class="box">
<div class="box-padding">
<h1>.: Welcome to First</h1>
<div class="image-text-right"><!-- image floated left with imagefloat class, text will align to right -->
<img src="images/steamboatwillie.jpg" width="45%" height="45%" class="imagefloat" alt="" />
<p><font color="white">...brought to you by the Class of '12 '13 '14 '15.</font></p>
<br>
<p>Of all the companies you could have been in, you wound up in <font color="#104E8B"><b>FIRST</b></font>. Fate brought you here into the first of thirty in the Brigade of Midshipmen. That's something special, isn't it?<br>
<br>Now the ball's in your court. What are you going to do with this opportunity?</p>
<p> </p>
...
I cut out most of the response to save you the pain of looking at it all. If you really want to see it, check out the full transcript.
The response from the server also follows the HTTP protocol, and we can make some sense of it. "HTTP/1.1 200 OK" means the server was able to respond successfully to the request. "Content-Type: text/html" is especially important: with the Content-Type line, the server is telling the browser what kind of file it is serving up. In this case the server is telling the browser that what follows is a plain text file following the html format. This provides an excellent segue to ...
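The netcat session can also be imitated in a few lines of Python. What follows is a minimal sketch under some assumptions: the function names are invented for this illustration, the request is the same bare HTTP/1.0 request typed into netcat (many present-day servers also expect a Host header), and fetch needs network access to a server that still answers plain http on port 80.

```python
import socket

def build_get_request(path):
    """Build roughly what we typed into netcat: a request line, then a blank line."""
    return f"GET {path} HTTP/1.0\r\n\r\n"

def parse_response(raw):
    """Split a raw HTTP response into (status code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")   # a blank line ends the headers
    lines = head.split("\r\n")
    status_parts = lines[0].split(" ", 2)        # e.g. ["HTTP/1.1", "200", "OK"]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status_parts[1]), headers, body

def fetch(host, path, port=80):
    """Send the request over a plain TCP connection, as nc does."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_get_request(path).encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:          # server closed the connection: response is done
                break
            chunks.append(data)
    return parse_response(b"".join(chunks).decode("utf-8", errors="replace"))

if __name__ == "__main__":
    # A shortened copy of the server's reply from the transcript above.
    reply = ("HTTP/1.1 200 OK\r\nServer: Apache\r\n"
             "Content-Type: text/html\r\n\r\n<html> ... </html>")
    code, headers, body = parse_response(reply)
    print(code, headers["content-type"])
```

A browser does the same bookkeeping: it reads the status code to see whether the request succeeded, then reads Content-Type to decide how to treat the body.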
First and foremost, HTML is just text. So you create HTML files with text editors like Notepad. Second, the structure of HTML is provided by tags. A tag is a name in angle brackets (< >). Most tags come in begin/end pairs, where the end tag just has a / before the name, e.g. <foo> ... </foo>. So, for instance, to format text like "I said hello out there!" with the word "hello" in bold, you'd have in your HTML file:
I said <b>hello</b> out there!
Some tags are structural (for instance, every HTML file is wrapped up in <html> ... </html> tags), while others (like <b> ... </b>) are pure formatting. Next lesson you'll learn to create webpages in HTML, but for now, let's take a look at the basic structure of a page:
|HTML Code||As Rendered in the Browser|
<html>
  <head>
  </head>
  <body>
    <h1>A Simple Web Page</h1>
    <p>
      This page has <b>two</b> paragraphs. The first has
      an image <img src="SleepyFace.JPG"> and
      <a href="http://www.usna.edu">a link</a>.
    </p>
    <p>
      The second has <span style="color: #ff0000">different colors</span>,
      which is cool. It also has some funky characters: Σ ⇨ ▲
    </p>
  </body>
</html>
A Simple Web Page
This page has two paragraphs. The first has an image and a link.
The second has different colors, which is cool. It also has some funky characters: Σ ⇨ ▲
Obviously there's a lot to talk about here. We needn't cover it all, since next lesson will. A few quick points:
<html>
  <head>
    ← stuff goes here
  </head>
  <body>
    ← stuff goes here
  </body>
</html>
... meaning that every HTML file has a head and a body (hence the tattoo). The body is what actually gets printed on the page. The head is used for other purposes, which we'll discuss later.
<p> ... </p> tags. Line breaks and blank lines in the HTML source code are irrelevant: if you want paragraphs in the rendered output, you need
<p> ... </p> tags! Otherwise, text just stays on a single line, automatically wrapping to the next line according to the width of the browser window.
#ff0000 has maximum 'r' intensity, and minimum 'g' and 'b' intensities. In other words, it's red! As you see, there's no escaping hex!
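Since a color like #ff0000 is just three two-digit hex numbers in a row, decoding it is a short exercise. This is a minimal sketch; the function name hex_to_rgb is chosen for illustration.

```python
def hex_to_rgb(color):
    """Convert a color like "#ff0000" into (red, green, blue) intensities, each 0-255."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected six hex digits, got {color!r}")
    # Two hex digits per channel: characters 0-1 are red, 2-3 green, 4-5 blue.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

if __name__ == "__main__":
    print(hex_to_rgb("#ff0000"))   # (255, 0, 0): all red, no green, no blue
```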
|HTML Code: ||As Rendered in the Browser|
<html>
  <head>
  </head>
  <body>
    <h1>A Simple Webpage With a Few Links</h1>
    <p>
      First we have a cat:
      <img src="SleepyFace.JPG">
    </p>
    <p>
      Then a comic:
      <img src="http://www.foxtrot.com/comics/2011-10-02-5a620ce6.gif">
    </p>
    <p>
      Then a link:
      The above cartoon comes from the
      <a href="http://www.foxtrot.com/2011/10/10022011/">FoxTrot Website</a>
    </p>
  </body>
</html>
http://rona.cs.usna.edu/~si110/lec/l10/ex2.html into the URL bar and press Enter.
rona.cs.usna.edu a GET request for the file
/~si110/lec/l10/ex2.html on its hard drive and sends it back to the browser.
ex2.html and looks through it, noticing that images
http://www.foxtrot.com/comics/2011-10-02-5a620ce6.gif will be needed in order to render the page.
/~si110/lec/l10/SleepyFace.JPG, and a GET request to
/comics/2011-10-02-5a620ce6.gif. These will actually go out more or less simultaneously.
rona.cs.usna.edu receives the request for
/~si110/lec/l10/SleepyFace.JPG, finds that file on its hard drive and sends it back to the browser.
www.foxtrot.com receives the request for
/comics/2011-10-02-5a620ce6.gif, finds that file on its hard drive and sends it back to the browser.
<a href="http://www.foxtrot.com/2011/10/10022011/">FoxTrot Website</a>
This does not result in any further HTTP traffic, i.e. in any further GETs, because no information about that file is required to render the page ex2.html. Of course, if the user clicks on that link, the browser will then issue a GET request for it.
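The sequence above boils down to a rule: images named in the page are fetched right away, while links are only fetched when clicked. As a rough sketch of the browser's "look through the page" step, the Python below lists which addresses would get an immediate GET and which would wait. The class name ResourceFinder is invented for this illustration, and a real browser handles many more tag types than these two.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ResourceFinder(HTMLParser):
    """Sort the addresses in a page into 'fetch now' and 'fetch on click'."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.fetched = []    # images: needed to render the page
        self.deferred = []   # links: requested only if the user clicks

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            # A bare file name is resolved relative to the page's own URL.
            self.fetched.append(urljoin(self.page_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.deferred.append(urljoin(self.page_url, attrs["href"]))

if __name__ == "__main__":
    page = """
    <p><img src="SleepyFace.JPG"></p>
    <p><img src="http://www.foxtrot.com/comics/2011-10-02-5a620ce6.gif"></p>
    <p><a href="http://www.foxtrot.com/2011/10/10022011/">FoxTrot Website</a></p>
    """
    finder = ResourceFinder("http://rona.cs.usna.edu/~si110/lec/l10/ex2.html")
    finder.feed(page)
    print("GET now:     ", finder.fetched)
    print("GET on click:", finder.deferred)
```

Notice that the relative name SleepyFace.JPG turns into a full address on rona.cs.usna.edu, which is why that request goes to a different server than the comic.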
Browsers often allow you to listen in on the HTTP traffic that goes on under the hood. In Chrome, if you open up the Developer Tools (wrench button / Tools / Developer Tools) and click on the Network tab, you can see all the GET's that Chrome sends when it renders a page. Try opening it up, and entering a common URL like http://www.amazon.com. It's astounding how many GET's are required to render a page like that!
intranet.cs.usna.edu and asks it to get the file
~si110/index.html and sends it (serves it) to the browser.