This lesson marks a change. In previous lessons, we have talked about a single computer. Now we turn our attention to the World Wide Web, which is a system comprised of many computers.
A ship's website typically has a URL of the form
http://www.shipname.navy.mil.
For example, the USS BAINBRIDGE's website is at
http://www.bainbridge.navy.mil,
and the USS NIMITZ's website is at
http://www.nimitz.navy.mil.
When a client requests the web page from one of the URLs above, that
request is directed to the web server that hosts (serves) the website.
For ships and submarines, hosting a website on board is not practical because of
limited availability, bandwidth considerations, risk of detection, and
the increased vulnerability of the ship's or submarine's internal network
(a web server is a potential penetration point into it).
Instead, shore-based commands provide and
manage the webserver that hosts ship and submarine websites,
much like the Computer Science Department provides and manages
the rona webserver that hosts your websites.
As for who is in charge of a ship's website (i.e. who is responsible for the images, HTML files, etc. that comprise the site), that depends on the command: any officer on board could be placed in charge of the ship's web page. So you may end up being responsible for a site like this.
A browser's primary control is the address bar. You enter a URL (Uniform Resource Locator) that describes to the browser where to find the item you want (roughly by specifying a web server and a file on that webserver), and the browser contacts the web server and requests the item which, hopefully, the server then sends back. A URL typically specifies three things:
http://www.usna.edu/Users/cs/wcbrown/index.html
\__/   \__________/\__________________________/
 |          |                   |
protocol  server    path on server's filesystem
The server is usually specified by a domain name, like
www.cnn.com or en.wikipedia.org.
We'll talk a bit more about domain names in the networking
section of the course. The path is a relative path from some
point in the server's filesystem. The gotcha on the path is
that it uses Unix path conventions, which means forward
slashes (/) instead of back slashes (\), regardless of whether
the server is a Windows server or a Unix server.
Finally we get to the protocol. Most browsers support several
protocols, including http, https, file, mailto and ftp.
Essentially, the World Wide Web consists of browsers and
webservers communicating via HTTP (the HyperText Transfer
Protocol). The https protocol is just a "secure"
version of http; more on that later.
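To make the three parts of a URL concrete, here is a minimal sketch that pulls them apart using Python's standard urllib.parse module. The function name describe_url is just an illustrative choice, not something from the course materials.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the three parts discussed above.

    describe_url is a hypothetical helper name chosen for this sketch.
    """
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # e.g. "http"
        "server": parts.hostname,   # e.g. "www.usna.edu"
        "path": parts.path,         # e.g. "/Users/cs/wcbrown/index.html"
    }


if __name__ == "__main__":
    info = describe_url("http://www.usna.edu/Users/cs/wcbrown/index.html")
    for name, value in info.items():
        print(f"{name:9}{value}")
```

Note that the path always comes back with forward slashes, whatever operating system the server happens to run.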
When you put a URL like http://intranet.usna.edu/1stCo/index.html
in your browser's address bar, it initiates the following
sequence of actions:
1. The browser contacts the web server intranet.usna.edu
   and asks it to get the file 1stCo/index.html.
2. The web server finds the file 1stCo/index.html
   and sends it (serves it) to the browser.
Browsers used to (meaning 'til 2010) have a status bar
at the bottom of the screen that gave you important
information about the status of the browser. That's gone on
all major browsers, but there's still a little popup for the
status in some circumstances, and it's important. Hover your
mouse over this link and look for
the popup window with the text http://www.usma.edu.
This status popup is telling you the address the browser will
go to if you click on this link. There's a little bit of a
misdirection trick that knowing about the status popup can help you
avoid.
Don't click on the following link, but check out where the
browser will actually send you if you click on it:
If this kind of misdirection doesn't seem like a big deal, check out the Wired article "Anonymous Tricks Bystanders Into Attacking Justice Department," about a January 2012 use of exactly this technique.
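The misdirection trick works because the text a link displays and the address it actually points to are two separate things in the page's HTML. As a rough sketch (not a real security tool), the following Python uses the standard html.parser module to flag links whose visible text looks like a URL for one server while the href points at a different one. The class and function names, and the example.com address, are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkChecker(HTMLParser):
    """Collect (visible text, href) pairs for every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag we are currently inside
        self._text = []     # pieces of visible text seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def host_of(text):
    """Return the server named in a URL-like string, or None if there is none."""
    try:
        return urlsplit(text.strip()).hostname
    except ValueError:
        return None


def misleading_links(html):
    """Return links whose visible text names a different server than the href."""
    checker = LinkChecker()
    checker.feed(html)
    suspicious = []
    for text, href in checker.links:
        shown, actual = host_of(text), host_of(href)
        if shown and actual and shown != actual:
            suspicious.append((text, href))
    return suspicious


if __name__ == "__main__":
    # Hypothetical page: the first link displays one address but goes elsewhere.
    page = ('<a href="http://www.example.com/">http://www.usna.edu</a> '
            '<a href="http://www.usna.edu/">USNA</a>')
    for text, href in misleading_links(page):
        print(f"link shows {text!r} but goes to {href!r}")
```

This is exactly the comparison you do by eye when you hover over a link and read the status popup.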
A padlock icon is displayed (by most browsers) when your connection to a
server is using the https protocol, which is the secure
version of http.
Browsers also support the file protocol. Note that this is not
the web! It's not client-server and it doesn't use http/https.
Suppose you were user m169999 and you had a file on your Desktop
called vacation.jpg. Putting the following URL in
the browser's address bar would result in the browser showing
you that image:
file:///C:/Users/m169999/Desktop/vacation.jpg

Note that the "server" portion of the URL has collapsed to nothing, which is why there are three /'s in a row, indicating that we're accessing the file on our local machine. Using ctrl+o, you can browse the filesystem to open a file, which may be more convenient than entering a URL. The
file protocol is really useful when building
websites, since you can get a quick look at a page even before
you put it on a webserver.
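You can see the "collapsed" server portion directly by splitting a file URL into its parts. This is a small sketch using Python's standard urllib.parse module; split_file_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def split_file_url(url):
    """Return (protocol, server, path) for a URL; the server may be empty."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


if __name__ == "__main__":
    protocol, server, path = split_file_url(
        "file:///C:/Users/m169999/Desktop/vacation.jpg")
    print("protocol:", repr(protocol))
    print("server:  ", repr(server))   # empty string: there is no server
    print("path:    ", repr(path))
```

The empty server string sits between the second and third slashes, which is why the URL shows three in a row.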
The most common HTTP request is a "GET". The key point is that YOUR browser makes a request
to a remote server ON YOUR BEHALF ... usually to have a
given file sent to it. You can, in fact, send requests to a
webserver on your own, i.e. without going through a browser. As
with so many things, however, be careful what you ask for!
We'll use a tool called netcat (nc) which allows
you to send network requests at a low level. Let's compare what
you see when you browse to
http://intranet.usna.edu/1stCo/index.html
with what the browser sees and goes through to bring you that
pretty page. In the transcript, the first two lines (plus the blank line
after them) are what we type; everything from "HTTP/1.1 200 OK" onward is
what the server sends back.
$ nc intranet.usna.edu 80          ← have netcat connect to the webserver intranet.usna.edu
GET /1stCo/index.html HTTP/1.0     ← HTTP request to get the file /1stCo/index.html from the server
                                   ← An extra newline (enter key) is required!
HTTP/1.1 200 OK
Date: Tue, 29 Jan 2013 15:40:38 GMT
Server: Apache
X-Powered-By: PHP/5.3.15
Content-Length: 4870
Connection: close
Content-Type: text/html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>First Company - Semper Primo</title>
<link rel="stylesheet" media="screen" type="text/css" href="style.css" />
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
</head>
<body>
<div id="background">
<div id="header">
<p id="logo">FIRST <span class="white">COMPANY</span></p>
<p id="slogan">SHIPMATES</p>
</div>
<div id="page"> <!-- wraps and defines overall page width, centers it -->
<div id="content"> <!-- content area of page -->
<!-- code below will make a new grey box for content, cut and paste as needed -->
<div class="box-top"></div>
<div class="box">
<div class="box-padding">
<h1>.: Welcome to First</h1>
<div class="image-text-right"><!-- image floated left with imagefloat class, text will align to right -->
<img src="images/steamboatwillie.jpg" width="45%" height="45%" class="imagefloat" alt="" />
<p><font color="white">...brought to you by the Class of '12 '13 '14 '15.</font></p>
<br>
<p>Of all the companies you could have been in, you wound up in <font color="#104E8B"><b>FIRST</b></font>. Fate brought you here into the first of thirty in the Brigade of Midshipmen. That's something special, isn't it?<br>
<br>Now the ball's in your court. What are you going to do with this opportunity?</p>
<p> </p>
...

I cut out most of the response to save you the pain of looking at it all. If you really want to see it, check out the full transcript.
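The same exchange can be sketched in a few lines of Python using the standard socket module. This is an illustration under some assumptions rather than how any browser is actually written: the function names are made up, the request uses the \r\n line endings that HTTP formally specifies (servers generally also accept the plain newlines typed into netcat), and the demo parses a shortened copy of the response above instead of contacting a real server.

```python
import socket


def build_get_request(path, version="HTTP/1.0"):
    """Return the bytes of a GET request: the request line plus the blank line."""
    return f"GET {path} {version}\r\n\r\n".encode("ascii")


def parse_response(raw):
    """Split a raw HTTP response into (status line, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")   # a blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def fetch(host, path, port=80):
    """Send the request over a plain TCP connection, much as nc does."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_get_request(path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:          # the server closed the connection: we're done
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    # Shortened copy of the server's reply from the transcript.
    canned = (b"HTTP/1.1 200 OK\r\n"
              b"Server: Apache\r\n"
              b"Content-Type: text/html\r\n"
              b"\r\n"
              b"<html>...</html>")
    status, headers, body = parse_response(canned)
    print(status)
    print(headers["content-type"])
```

Calling fetch("intranet.usna.edu", "/1stCo/index.html") would send the same request we typed by hand, provided that server is reachable from your machine.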
The response from the server also follows the HTTP protocol, and we can make some sense of it.
"HTTP/1.1 200 OK" means the
server was able to respond successfully to the request.
"Content-Type: text/html" is especially important:
with the Content-Type line, the server is telling the browser
what kind of file it is serving up. In this case the server is
telling the browser that what follows is a plain text file
following the html format. This provides an excellent
segue to ...
First and foremost, HTML is just text, so you can create HTML files with text editors like Notepad. Second, the structure of HTML is provided by tags. A tag is a name in angle brackets (< >). Most tags come in begin/end pairs, where the end tag just has a / before the name, e.g. <foo> ... </foo>. So, for instance, to get "I said hello out there!" with the word "hello" in bold, you'd have in your HTML file:
I said <b>hello</b> out there!

Some tags are structural (for instance, every HTML file is wrapped up in <html> ... </html> tags), while others (like <b> ... </b>) are pure formatting. Next lesson you'll learn to create webpages in HTML, but for now, let's take a look at the basic structure of a page:
HTML Code:

<html>
<head>
</head>
<body>
<h1>A Simple Web Page</h1>
<p>
This page has <b>two</b> paragraphs.
The first has an image
<img src="SleepyFace.JPG"> and
<a href="http://www.usna.edu">a link</a>.
</p>
<p>
The second has
<span style="color: #ff0000">different colors</span>,
which is cool. It also has some funky characters:
Σ ⇨ ▲
</p>
</body>
</html>

As Rendered in the Browser:

A Simple Web Page
This page has two paragraphs. The first has an image [image: SleepyFace.JPG] and a link.
The second has different colors, which is cool. It also has some funky characters: Σ ⇨ ▲
Obviously there's a lot to talk about here. We needn't cover it all, since next lesson will. A few quick points:
<html>
<head>
← stuff goes here
</head>
<body>
← stuff goes here
</body>
</html>
... meaning that every HTML file has a head and a body (hence the
tattoo). The body is what actually gets printed on the page.
The head is used for other purposes, which we'll discuss
later.
Paragraphs are marked with <p> ... </p> tags. Line breaks and
blank lines in the HTML source code are irrelevant: if you
want paragraphs in the rendered output, you need
<p> ... </p> tags!
Otherwise, text just stays on a single line, automatically
wrapping to the next line according to the width of the
browser window.
Colors are specified as hex rrggbb values: #ff0000 has maximum 'r' intensity,
and minimum 'g' and 'b' intensities. In other words, it's
red! As you see, there's no escaping hex!
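To see the hex at work, here is a small sketch that converts a color written as #rrggbb into its three intensities, each from 0 to 255. The function name parse_hex_color is an illustrative choice.

```python
def parse_hex_color(color):
    """Convert '#rrggbb' into a (red, green, blue) tuple of 0-255 intensities."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected #rrggbb, got {color!r}")
    # Each pair of hex digits is one byte: rr, then gg, then bb.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


if __name__ == "__main__":
    print(parse_hex_color("#ff0000"))  # (255, 0, 0): pure red
    print(parse_hex_color("#104E8B"))  # (16, 78, 139)
```

The second color is the one used for the word FIRST on the First Company page: mostly blue, with some green and very little red.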
Consider what happens when a browser loads the page
http://rona.cs.usna.edu/~si110/lec/l10/ex2.html shown below:
HTML Code: ex2.html
<html>
<head>
</head>
<body>
<h1>A Simple Webpage With a Few Links</h1>
<p>
First we have a cat:
<img src="SleepyFace.JPG">
</p>
<p>
Then a comic:
<img src="http://www.foxtrot.com/comics/2011-10-02-5a620ce6.gif">
</p>
<p>
Then a link:
The above cartoon comes from the
<a href="http://www.foxtrot.com/2011/10/10022011/">FoxTrot Website</a>
</p>
</body>
</html>
1. You type http://rona.cs.usna.edu/~si110/lec/l10/ex2.html
   into the URL bar and press Enter.
2. The browser sends rona.cs.usna.edu a GET
   request for the file /~si110/lec/l10/ex2.html.
3. rona.cs.usna.edu finds the file /~si110/lec/l10/ex2.html on
   its hard drive and sends it back to the browser.
4. The browser receives ex2.html and looks
   through it, noticing that the images SleepyFace.JPG and
   http://www.foxtrot.com/comics/2011-10-02-5a620ce6.gif
   will be needed in order to render the page.
5. The browser sends a GET request to rona.cs.usna.edu
   for /~si110/lec/l10/SleepyFace.JPG, and a GET request
   to www.foxtrot.com for /comics/2011-10-02-5a620ce6.gif.
   These will actually go out more or less simultaneously.
6. rona.cs.usna.edu receives the request for
   /~si110/lec/l10/SleepyFace.JPG, finds that file on its
   hard drive and sends it back to the browser.
7. www.foxtrot.com receives the request for
   /comics/2011-10-02-5a620ce6.gif, finds that file on its
   hard drive and sends it back to the browser.

Note that ex2.html also contains the link
<a href="http://www.foxtrot.com/2011/10/10022011/">FoxTrot Website</a>
This does not result in any further HTTP traffic, i.e. in any further GETs, because no information about that file is required to render the page ex2.html. Of course, if the user clicks on that link, the browser will then issue a GET request for it.
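The difference between what gets fetched right away (images) and what waits for a click (links) can be sketched with Python's standard html.parser and urllib.parse modules. This is a simplified illustration, not how a real browser is built: it looks only at img and a tags, the names ResourceFinder and find_resources are made up, and the HTML in the demo is a trimmed copy of ex2.html.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Sort the URLs in a page into 'GET now' and 'GET only if clicked'."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.needed_now = []      # images: required to render the page
        self.only_on_click = []   # links: no GET until the user clicks

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # urljoin resolves relative names like SleepyFace.JPG against the page's URL.
        if tag == "img" and attrs.get("src"):
            self.needed_now.append(urljoin(self.page_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.only_on_click.append(urljoin(self.page_url, attrs["href"]))


def find_resources(page_url, html):
    """Return (URLs fetched to render the page, URLs fetched only on click)."""
    finder = ResourceFinder(page_url)
    finder.feed(html)
    return finder.needed_now, finder.only_on_click


if __name__ == "__main__":
    ex2 = """
    <p>First we have a cat: <img src="SleepyFace.JPG"></p>
    <p>Then a comic:
    <img src="http://www.foxtrot.com/comics/2011-10-02-5a620ce6.gif"></p>
    <p><a href="http://www.foxtrot.com/2011/10/10022011/">FoxTrot Website</a></p>
    """
    now, later = find_resources(
        "http://rona.cs.usna.edu/~si110/lec/l10/ex2.html", ex2)
    print("GET immediately:", now)
    print("GET only on click:", later)
```

Notice that the relative name SleepyFace.JPG turns into a full URL on rona.cs.usna.edu, which is how the browser knows which server to ask for it.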
Browsers often allow you to listen in on the HTTP traffic that goes on under the hood. In Chrome, if you open up the Developer Tools (wrench button / Tools / Developer Tools) and click on the Network tab, you can see all the GETs that Chrome sends when it renders a page. Try opening it up and entering a common URL like http://www.amazon.com. It's astounding how many GETs are required to render a page like that!
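When you put a URL like http://intranet.cs.usna.edu/~si110/index.html
in your browser's address bar:
1. The browser contacts the web server intranet.cs.usna.edu
   and asks it to get the file ~si110/index.html.
2. The web server finds the file ~si110/index.html
   and sends it (serves it) to the browser.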