Let's step back a bit and discuss the bigger picture of how the client and the server communicate. You may have touched on this subject in a previous networking class, possibly in considerably more detail; we will focus on the basics. What we want to consider is that there is a back-and-forth conversation between the client and the server, and there are specifically agreed-upon terms to this conversation. Here is a minimal HTTP request:
GET / HTTP/1.1
Host: courses.cs.usna.edu
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36
Accept: */*
Note: there is an empty line after the Accept: */* line, which ends the HTTP header and the request. The server then sends back its own header, an empty line, and the content:
HTTP/1.1 200 OK
Date: Fri, 30 Aug 2019 00:02:03 GMT
Server: Apache/2.4.29 (Ubuntu)
Vary: Accept-Encoding
Content-Length: 2967
Content-Type: text/html; charset=UTF-8
<!DOCTYPE html>
...
It is important to notice the extra newlines sent by both the client and the server: they inform the receiver that the sender has finished transmitting header/metadata and will now send content.
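That blank-line convention is easy to see in code. Below is a short Python sketch (the sample response is an abbreviated version of the one shown above) that splits a raw response into header and body at the first empty line:

```python
# Sketch: splitting a raw HTTP response into headers and body.
# The blank line (CRLF CRLF) is what separates the two.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Length: 19\r\n"
    b"\r\n"
    b"<!DOCTYPE html>\n..."
)

# Everything before the first empty line is header/metadata.
header_bytes, body = raw.split(b"\r\n\r\n", 1)
headers = header_bytes.decode().split("\r\n")

print(headers[0])   # status line: HTTP/1.1 200 OK
print(len(body))    # 19 -- matches the Content-Length header
```

The same split works on any response you capture with netcat or curl, which is why the empty line matters: without it the receiver cannot tell where the metadata ends.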
You can test this by bringing up a terminal and running netcat. The line below starts a netcat session connected to port 80, the default port for HTTP requests; you can then type the HTTP request (as seen above) and hit Enter twice to send the blank line that completes the request.
nc courses.cs.usna.edu 80
Sometimes it is better to pipe the HTTP request to the netcat connection as below:
printf "\
GET / HTTP/1.1\r\n\
Host: courses.cs.usna.edu\r\n\
\r\n\
" | nc courses.cs.usna.edu 80
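The same exchange can also be scripted with Python's socket module. The sketch below builds the raw request bytes (note the CRLF line endings and the trailing blank line) and, when run directly, sends them to the host used in the examples above:

```python
import socket

def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 request: each header line ends in CRLF,
    and a blank line marks the end of the request."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Connection: close\r\n"
        "\r\n"
    ).encode()

if __name__ == "__main__":
    host = "courses.cs.usna.edu"  # host from the examples above
    try:
        with socket.create_connection((host, 80), timeout=5) as sock:
            sock.sendall(build_request(host))
            response = b""
            while chunk := sock.recv(4096):  # read until the server closes
                response += chunk
        print(response.decode(errors="replace"))
    except OSError as exc:
        print(f"could not reach {host}: {exc}")
```

This is equivalent to the printf | nc pipeline: the bytes on the wire are identical, which you can confirm by comparing the printed response to what netcat shows.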
It will actually be easier to test using the curl command, as it will add the appropriate headers for you (instead of typing them in manually like we did with nc). More importantly, curl highlights the back-and-forth communications between the client and the server as they negotiate the terms of the request.
curl -v courses.cs.usna.edu 2> courses.headers 1> courses.html
The odd redirection in the command line above sends stdout (the HTML) to the file courses.html and stderr (the headers) to a separate file. curl deliberately writes its verbose output to stderr so that a plain redirection of stdout captures only the content; splitting the streams also helps with testing by keeping the content separate from the metadata and headers. Below is an example of the headers; as you can see, the flow of information is annotated by the arrows > and <.
* Rebuilt URL to: courses.cs.usna.edu/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Trying 10.1.83.55...
* TCP_NODELAY set
* Connected to courses.cs.usna.edu (10.1.83.55) port 80 (#0)
> GET / HTTP/1.1
> Host: courses.cs.usna.edu
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 30 Aug 2019 00:02:03 GMT
< Server: Apache/2.4.29 (Ubuntu)
< Vary: Accept-Encoding
< Content-Length: 2967
< Content-Type: text/html; charset=UTF-8
<
{ [1015 bytes data]
100 2967 100 2967 0 0 21345 0 --:--:-- --:--:-- --:--:-- 21345
* Connection #0 to host courses.cs.usna.edu left intact
We are using courses.cs.usna.edu as the example because the site is not encrypted by default. Many sites will quickly redirect the user to their encrypted site, and for the purposes of our discussion the unencrypted communication is easier to inspect.
Below are a few different variations on the request method. Test them and see what the differences are.
HEAD / HTTP/1.0
GET /cgi-bin/query.pl?str=dogs&lang=en HTTP/1.0
POST /cgi-bin/query.pl HTTP/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 16
str=dogs&lang=en
GET /img1.jpg HTTP/1.1
Host: www.host1.com
GET /img6.jpg HTTP/1.1
Host: www.host1.com
Connection: close
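For the POST variation, note that the Content-Length header must equal the number of bytes in the body. A quick Python sketch of how that body and length are produced (the field names mirror the example above):

```python
from urllib.parse import urlencode

# URL-encode the form fields, as a browser would for a body with
# Content-Type: application/x-www-form-urlencoded.
body = urlencode({"str": "dogs", "lang": "en"})
print(body)                 # str=dogs&lang=en
print(len(body.encode()))   # 16 -- the Content-Length value in the example

# Assemble the full POST request from the example above.
request = (
    "POST /cgi-bin/query.pl HTTP/1.0\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body.encode())}\r\n"
    "\r\n"
    f"{body}"
)
```

If the Content-Length does not match the body, the server may truncate the data or hang waiting for more bytes, which is worth keeping in mind when you hand-craft requests in netcat.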
Bring up the Chrome developers console (ctrl-shift-j), click the network tab, and then reload courses.cs.usna.edu. You can now see all of the individual items that were loaded and you can see the contents of the HTTP requests and responses.
For another example, go to the course website and take a look at what was sent and received. You can click on each file and see the headers of both the requests and responses and the body of the responses received as well.
There are quite a few websites whose content needs to be encrypted to protect the user (think banking, health, privacy, etc.), so we need a method to encode/decode the data. HTTPS uses both asymmetric and symmetric encryption. This process runs in the following sequence:
1. The client contacts the server, and the server responds with its certificate, which contains the server's public key.
2. The client verifies the certificate, generates a symmetric session key, and sends it to the server encrypted with the server's public key (asymmetric encryption).
3. Both sides then use the shared session key (symmetric encryption) for the rest of the conversation, since symmetric encryption is much faster.
One of the most important things you need to take away from this conversation is that HTTP is a stateless protocol: each request is treated independently, and the server keeps no memory of previous requests. To maintain state across requests, the server can set a cookie, which the browser then sends back with every subsequent request. Consider the URLs below:
http://midn.cs.usna.edu/~mXXXXXX/IT350.html
http://csfaculty.academy.usna.edu/~adina/it350/demo/welcome.php?username=ac
welcome.php:
<?php
// Read the username from the query string and store it in a cookie.
// setcookie() must be called before any output is sent, because the
// cookie travels in the HTTP response headers.
$username = $_GET["username"];
setcookie("username", $username);
echo '<!DOCTYPE html>
<html lang="en"><head>
<meta charset="utf-8">
<title>Test</title></head>
<body>';
echo "<h1>Welcome $username</h1>";
echo '</body></html>';
?>
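The cookie round trip that makes this work can be sketched with Python's standard http.cookies module (the username value "ac" matches the query string in the URL above): the server's setcookie() call becomes a Set-Cookie response header, and the browser echoes the stored value back in a Cookie request header on every later visit.

```python
from http.cookies import SimpleCookie

# Server side: setcookie("username", $username) in the PHP above
# becomes a Set-Cookie header in the HTTP response.
response_cookie = SimpleCookie()
response_cookie["username"] = "ac"
print(response_cookie.output())   # Set-Cookie: username=ac

# Browser side: the cookie is stored and sent back with every
# subsequent request as a Cookie header -- this is how the server
# recognizes the user even though HTTP itself is stateless.
stored = SimpleCookie()
stored.load("username=ac")
print(f"Cookie: {stored['username'].key}={stored['username'].value}")
```

You can watch this exact exchange in the Chrome network tab: the first response to welcome.php carries the Set-Cookie header, and every request after it carries the matching Cookie header.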