/sy110/Cyber Battlefield/The Web & HTML

This lesson marks a change. In previous lessons, we talked about a single computer. Now we turn our attention to the World Wide Web, which is a system composed of many computers.

Introduction

There are many different web browsers. In this course you'll need at least Mozilla Firefox and Google Chrome. Other browsers include Microsoft Internet Explorer, Apple Safari, the Opera browser, and browser variants that are optimized for phones and tablets.
                   

Browsers and URLs

Public web pages for navy ships

Many ships in the U.S. Navy have their own public web sites. The URL follows the pattern http://www.shipname.navy.mil. For example, the USS NIMITZ's web site is at http://www.nimitz.navy.mil. When a client requests a web page from one of these URLs, the request is directed to the web server that hosts (serves) the web site. For ships and submarines, hosting a web site on board is not practical because of limited availability, bandwidth considerations, risk of detection, and the increased vulnerability it would create (a penetration point into the ship's internal network). Instead, shore-based commands provide and manage the web servers that host ship and submarine web sites. This is much like how you will have a web site on rona: you are responsible for your content, but you do not have to worry about the operation of rona itself.

Who is in charge of a ship's web site, i.e. who is responsible for the images, HTML files, etc. that make up the site? Depending on the command, any officer on board could be placed in charge of the ship's web page. So you may end up being responsible for a site like this.

Two Protocols: File and HTTP

HTML (Hyper-Text Markup Language)

What the web server sends the browser and what the browser shows us are usually very different things. Most web pages are plain text files written in a language called HTML (Hyper-Text Markup Language). The browser doesn't show you the HTML it receives; rather, it interprets the HTML code and displays the resulting web page (your web browser knows how to interpret HTML and JavaScript). When the browser follows the HTML instructions and draws something pretty on the screen, we say that the browser is rendering the HTML. So in the example HTTP transaction from the previous section, what you were seeing from the server was the raw HTML, not the rendered page. To understand how web sites work, and certainly to create your own, you need to know the basics of HTML.
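To make the difference between raw HTML and rendered output concrete, here is a minimal sketch in Python. It is not how a real browser works; it only strips the markup and keeps the text a browser would display. The short page in RAW_HTML and the class name TextOnly are made up for this illustration.

```python
from html.parser import HTMLParser

# A made-up page, used only for this illustration.
RAW_HTML = "<HTML><BODY><H1>A Simple Web Page</H1><P>First we have a cat.</P></BODY></HTML>"


class TextOnly(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text that is not part of a tag.
        if data.strip():
            self.chunks.append(data.strip())


parser = TextOnly()
parser.feed(RAW_HTML)
parser.close()
visible_text = parser.chunks

print("What the server sends :", RAW_HTML)
print("What the reader sees  :", visible_text)
```

A real browser does far more than this (layout, fonts, images, scripts), but the basic idea is the same: the tags are instructions, not content to be shown.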

Client-Server Interaction

HTTP Client-Server Communications

Consider the HTML file
http://rona.academy.usna.edu/~sy110/lec/wwwIntro/ex2.html
shown below:

[Figure: ex2.html as rendered in the browser]
HTML Code: ex2.html
<HTML>
  <HEAD>
  </HEAD>
  <BODY>
    
    <H1>A Simple Web Page With a Few Links</H1>

    <P>
      First we have a cat:
      <IMG src="SleepyFace.JPG">
    </P>
    
    <P>
      Then a comic:<BR>
      <IMG src="http://www.foxtrot.com/wp-content/uploads/2014/07/ft111002noncompliant.png">
    </P>
    
    <P>
      Then a link:  
      The above cartoon comes from the
      <A href="http://www.foxtrot.com/2011/10/02/non-compliant/">FoxTrot Website</A>
    </P>

  </BODY>
</HTML>

Modern browsers have lots of features that you are already familiar with. Historically, the input control where the user entered the URL of the web site they wanted to visit was called the address bar, because its only purpose was entering and displaying the current URL (the current address). Many modern browsers also perform a web search if the user's input is not a URL; hence, some people now use the term omni bar to signify that the address bar has other features associated with it.
Think about how programmers had to add more input-handling code to support the added features. More features mean greater system complexity.
We're going to take a look at what happens "under the hood" from the time you enter the URL in your browser's address bar until you actually see the page rendered. (The FoxTrot cartoon is worth a close look.)
  1. You enter http://rona.academy.usna.edu/~sy110/lec/wwwIntro/ex2.html into the URL bar and press Enter.
  2. The browser sends rona.academy.usna.edu a GET request for the file /~sy110/lec/wwwIntro/ex2.html.
  3. The server finds /~sy110/lec/wwwIntro/ex2.html on its hard drive and sends it back to the browser.
  4. The browser receives ex2.html and looks through it, noticing that the images SleepyFace.JPG and http://www.foxtrot.com/wp-content/uploads/2014/07/ft111002noncompliant.png will be needed in order to render the page.
  5. The browser issues a GET request to rona.academy.usna.edu for /~sy110/lec/wwwIntro/SleepyFace.JPG, and a GET request to www.foxtrot.com for /wp-content/uploads/2014/07/ft111002noncompliant.png. These will actually go out more or less simultaneously.
  6. rona.academy.usna.edu receives the request for /~sy110/lec/wwwIntro/SleepyFace.JPG, finds that file on its hard drive and sends it back to the browser.
  7. www.foxtrot.com receives the request for /wp-content/uploads/2014/07/ft111002noncompliant.png, finds that file on its hard drive and sends it back to the browser.
  8. Eventually, the browser receives both image files, and it now has all the data it needs to render the page on the screen ... so it does.
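Steps 1 and 2 above can be sketched in a few lines of Python. This is only an illustration of how a URL splits into a server name and a file path, and of roughly what a GET request looks like as text. The function name build_get_request is made up, the choice of HTTP/1.1 and of these particular headers is an assumption, and a real browser sends many more headers than this. The sketch does not contact any server.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Split a URL into (host, path) and build a minimal GET request.

    Hypothetical helper for illustration; it ignores any query string.
    """
    parts = urlsplit(url)
    host = parts.hostname        # which server to contact
    path = parts.path or "/"     # which file to ask that server for
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"        # HTTP/1.1 requires a Host header
        "Connection: close\r\n"
        "\r\n"                     # a blank line ends the request headers
    )
    return host, path, request


URL = "http://rona.academy.usna.edu/~sy110/lec/wwwIntro/ex2.html"
host, path, request = build_get_request(URL)

print("Server to contact:", host)
print("File to request  :", path)
print(request)
```

The point to notice is that the server name and the file path travel separately: the browser connects to the host, then names the path in the first line of the request.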
Notice that there's another URL in the document, from the line:
<A href="http://www.foxtrot.com/2011/10/02/non-compliant/">FoxTrot Website</A>
This does not result in any further HTTP traffic, i.e. in any further GET requests, because no information about that file is required to render the page ex2.html. Of course, if the user clicks on that link, the browser will then issue a GET request for it. In short, embedded images will usually result in a GET request to the server on which the image is located, but links in a web page do not result in GET requests to the linked URLs.
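The distinction between embedded images and links can be sketched with Python's built-in HTML parser. The code below is a simplified stand-in for what a browser does in steps 4 and 5, not a real browser algorithm: it scans a trimmed copy of ex2.html, lists the image URLs that would be requested right away, and separately lists the link URLs that would be requested only on a click. The class name ResourceFinder and its two list names are made up for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

PAGE_URL = "http://rona.academy.usna.edu/~sy110/lec/wwwIntro/ex2.html"

# A trimmed copy of ex2.html: just the tags that carry URLs.
PAGE_HTML = """
<HTML><BODY>
<IMG src="SleepyFace.JPG">
<IMG src="http://www.foxtrot.com/wp-content/uploads/2014/07/ft111002noncompliant.png">
<A href="http://www.foxtrot.com/2011/10/02/non-compliant/">FoxTrot Website</A>
</BODY></HTML>
"""


class ResourceFinder(HTMLParser):
    """Sorts the URLs in a page into 'needed now' and 'needed on click'."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.fetch_now = []        # embedded images: needed to render
        self.fetch_on_click = []   # links: not needed to render

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            # Relative names are resolved against the page's own URL.
            self.fetch_now.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.fetch_on_click.append(urljoin(self.base_url, attrs["href"]))


finder = ResourceFinder(PAGE_URL)
finder.feed(PAGE_HTML)
finder.close()

print("GET requests sent while rendering:")
for url in finder.fetch_now:
    print("  ", url)
print("GET requests sent only if the user clicks:")
for url in finder.fetch_on_click:
    print("  ", url)
```

Note how the relative name SleepyFace.JPG is turned into a full URL on rona by combining it with the address of the page that referenced it, which is why that request goes to rona.academy.usna.edu rather than to www.foxtrot.com.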

Browsers often allow you to view the HTTP traffic that goes on under the hood. In Chrome, if you open up the Developer Tools (menu button → More tools → Developer tools, or press F12) and click on the Network tab, you can see all the GET requests that Chrome sends when it renders a page. Try opening it up and entering a common URL like http://www.amazon.com. It's astounding how many GETs are required to render a page like that!

Web Server Logs

It is good to be mindful of the fact that you do leave footprints when you navigate around the web. Recall the simple transaction steps for a URL like http://rona.academy.usna.edu/~sy110/index.html:
  1. The browser contacts the server rona.academy.usna.edu and asks it to get the file /~sy110/index.html.
  2. The server retrieves the file /~sy110/index.html and sends it (serves it) to the browser.
  3. The browser receives the file from the server and renders it on screen in your browser window.
A record of the transaction is created on both the server and the client (browser). The server logs, browser history and browser cache are all traces you leave behind as you navigate the web. Think about that!
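To give a feel for what a server-side footprint can look like, here is a hedged sketch. Many web servers can be configured to write one line per request in the so-called Common Log Format; whether rona uses this format is an assumption. The log line below is invented for illustration (192.0.2.10 is an address reserved for documentation, and the date, status and size are made up), and the regular expression is one simple way to pull the fields apart.

```python
import re

# An invented log line in Common Log Format; not taken from any real server.
LOG_LINE = (
    '192.0.2.10 - - [01/Jan/2015:12:00:00 -0500] '
    '"GET /~sy110/index.html HTTP/1.1" 200 1234'
)

# client, two unused fields, [timestamp], "request line", status, size
CLF_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

match = CLF_PATTERN.match(LOG_LINE)
entry = match.groupdict()

print("Who asked   :", entry["client"])
print("When        :", entry["when"])
print("What for    :", entry["method"], entry["path"])
print("Server said :", entry["status"])
```

Even this one line tells the server's administrator which address made the request, when it was made, and which file was asked for.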