This lesson looks at a core concept in the design of networks in general, and the internet in particular: protocols. The way the internet works is often discussed in terms of "the stack", which organizes the basic protocols of the internet into layers. We'll also learn about a few more standard tools, similar to nslookup and traceroute.

Protocols, Services & Utilities

There is a protocol that constrains radio communication between aircraft and control towers. Look at the following transcripts and see if you can identify the rules governing things like identifying yourself and the entity you want to communicate with, acknowledging receipt of a message, and turn-taking.

 BR51: "Tower, BR51 request takeoff VFR to the west."
Tower: "BR51, Tower, you are cleared for takeoff to the west."
 BR51: "Tower, BR51, copy, cleared for takeoff."

This second transcript shows a different pattern in terms of turn-taking.

BR51: "Dulles Tower, this is Bay Raider 51, 35 miles to the west,
        final stop Dulles."
Tower: "standby"
Tower: "BR51, this is Dulles, cleared to Dulles"
 BR51: "Tower, BR51, copy, cleared to Dulles."

Tactical Voice Communications

U.S. Navy photo by MC2 Justan Williams/Released

The document ACP 125 gives a protocol for communications between Allied Forces on tactical voice nets, to "provide a standardized way of passing speech and data traffic." The protocol specifies such things as a phonetic alphabet ("ALFA, BRAVO, ..., ZULU"), prowords (e.g., "say again", "roger"), how to unambiguously record a message (e.g., zero written as: Ø, letter Z written as: Ƶ), and brevity codes (e.g., the brief phrase "Birds away" means "Friendly surface-to-air missiles have been fired at the designated target").

U.S. Navy photo by MCSN Danian Douglas/Released

When you pick up that VHF bridge-to-bridge radio on one of the YPs, the ACP 125 protocol tells you how you should talk!

Here's an example dialog between callsigns S7 and CC:

CC: "Sierra Seven this is Charlie Charlie,
     radio check, over."
S7: "This is Sierra Seven, roger, over."
CC: "Sierra Seven, Charlie Charlie, immediate execute,
     turn starboard niner, I say again, turn starboard
     niner, standby ... execute, over."

Computers and networked computer systems are hugely complex. Only by employing very careful, highly structured design can we humans deal with that complexity. The ideas of protocol and protocol stack are at the heart of that design. A protocol is just an agreement about communication: a complete specification of what things can be said, what responses can and must be made, and what these things mean. Protocols abound outside of the world of digital communication, but within that world they are extremely important.

Protocols usually revolve around providing a service. They govern the back-and-forth communication between the entity providing the service and the entity using the service. That boils down to specifying what messages can be sent and what those messages mean, e.g. what action results from sending a given message. The protocol behind the web, HTTP, governs the interaction between web servers and web clients (browsers). We've seen a bit of this: browsers can send messages like

GET /prices.html HTTP/1.1

to a server (of course you need its IP address to send it this message!). The HTTP protocol specifies exactly what this request can look like and what responses it should elicit from the server. For example, the server might send back the message

HTTP/1.1 404 Not Found

which indicates that it did not have a file prices.html available.
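To make the "what messages can be sent and what they mean" idea concrete, here is a minimal Python sketch that builds the request line shown above and pulls apart the server's status line. The function names are our own, chosen for illustration. A real HTTP/1.1 request needs more than this one line (at least a Host header and a terminating blank line), so treat this as a sketch of the message format, not a working client.

```python
def build_request_line(path: str) -> str:
    """Return the first line of an HTTP/1.1 GET request for `path`."""
    return f"GET {path} HTTP/1.1"


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a status line into (version, status code, reason phrase)."""
    # Split on the first two spaces only, since the reason phrase
    # ("Not Found") may itself contain spaces.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


request = build_request_line("/prices.html")
version, code, reason = parse_status_line("HTTP/1.1 404 Not Found")

print(request)
print(version, code, reason)
```

The numeric code (404) is what a program acts on; the reason phrase is there for humans reading the exchange.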

For many services/protocols, there are standard utilities that allow you to use that service/protocol. We saw this already with name resolution:

service: Name Resolution,  protocol: DNS,  tool: nslookup

Protocol Stacks — TCP/IP Stack

The different protocols that make up the internet are organized into what's viewed as stacked layers. Before we define this model, called the TCP/IP Stack, we will consider an analogy. In simple correspondence by snail mail, say you and Grandma writing letters and cards, we can view the whole process as consisting of four layers:

-------------------------------------
You                          Grandma    ← Corresponder Layer
-------------------------------------
mail man                     mail man   ← Carrier Layer
-------------------------------------
post office               post office   ← Depot Layer
-------------------------------------
    |                          |        ← Shipping Layer
    `-- trucks,trains,planes --'
-------------------------------------

In this model, there are two important features: 1) each layer has a concrete, well-defined role (the service it provides), and 2) each layer only needs to know how to interact with the layers directly above and directly below it as the letter travels down one side of the stack, over to the other side at the bottom layer, then up the stack on the other side. Each of these interactions is governed by its own protocol. For example, to send a letter at the Corresponder Layer you need to follow the addressing & stamping rules, and you need to know that you leave the letter sticking out of your mail slot / box. What happens after that doesn't matter to you. To receive, of course, you just have to know to check your mail slot / box. Once again, how the other layers operate is not something you need to know about. If you're operating at the Carrier Layer, you just need to know where to pick up your bag of letters at the post office, how to read the addresses so you can deliver letters to mail slots / boxes, and where at the post office to bring the outgoing letters you picked up from people's mail slots / boxes along your route. What's going on between the Corresponders, or whether the letters will be sent via train or ship by the Shipping Layer, is irrelevant to you. And so it goes with the other layers as well. Any individual within this system can be identified as acting at one of these layers, whether it's Grandma, or George the Postman, or Jenny the Cargo Jet Pilot.

In the mail example, we started with a system we already knew all about and understood well and organized it into a layered protocol stack. Now, we'll start with the layered protocol stack organization for the internet, and we'll learn about and try to understand the internet based on it. The layers in the TCP/IP stack are:

-----------------
Application Layer
-----------------
Transport Layer
-----------------
Internet Layer
-----------------
Link Layer
-----------------
Physical Layer
-----------------

Let's dispense with the easiest of these to understand: The Physical layer is wires and radio waves. Relatively little of what goes on in this course deals with this layer directly, so we won't say too much about it. In Cyber II, you'll learn a lot more about it. The application layer is about programs running on different hosts that want to communicate — like you and Grandma in our snail mail analogy. Our most familiar example of an application layer protocol is HTTP, the protocol that governs communication between web servers and browsers. Ideally, web servers and browsers need to know two things: the language they speak to one another (the HTTP protocol) and the protocol that governs interactions with the next layer down: the transport layer. So what is the transport layer all about?

To understand the transport layer, we should really understand what kind of service application-layer programs need. Let's focus on the application-layer programs we're most familiar with: web servers and web clients. Basically, a client needs to send a bunch of bytes to the server (its request), and then wait around and ultimately receive a bunch of bytes from the server (the server's response). The server needs the reverse. This is the service browsers and web servers need the transport layer to provide. In the case of HTTP, the browser specifies the host it wants to connect to and the bytes it wants sent, and waits around for an answer. In order to make a connection, send all those bytes, and receive the response bytes back, the transport layer has to, on one side, break the request/response message up into small pieces and wrap each piece up with an address to form a packet, and on the other side, reconstitute the received packets into a full message. Getting each individual packet from one host to the other is not the business of the transport layer. That's done by the next layer down: the Internet layer.
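The split-and-reconstitute job described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real transport protocol lays out its packets: the Packet class, its field names, and the 8-byte piece size are all made up for illustration, and the sequence number is something we add here so the receiving side knows where each piece goes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dest: str       # destination address (hypothetical field name)
    seq: int        # position of this piece within the original message
    payload: bytes  # the piece of the message carried by this packet


def segment(message: bytes, dest: str, size: int) -> list[Packet]:
    """Break `message` into pieces of at most `size` bytes, one per packet."""
    return [
        Packet(dest, i, message[offset:offset + size])
        for i, offset in enumerate(range(0, len(message), size))
    ]


def reassemble(packets: list[Packet]) -> bytes:
    """Rebuild the original message, whatever order the packets arrived in."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


message = b"GET /prices.html HTTP/1.1"
packets = segment(message, "192.0.2.10", size=8)  # made-up example address

# Pretend the packets arrived in reverse order.
rebuilt = reassemble(list(reversed(packets)))
print(len(packets), rebuilt)
```

Notice that neither function knows or cares how a packet physically gets from one host to the other; that is exactly the part handed off to the layer below.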

Proposals have been submitted for IP over Avian Carriers (i.e. sending internet traffic with carrier pigeons). See this Wikipedia entry. It's a joke, of course, but when you have a whole lot of data to send, a small enough distance to travel, and a good pigeon (or African Swallow), it's actually faster than lots of people's home internet connections!

The internet layer is responsible for routing packets across the internet from the source host to the destination host. The protocol that governs it, the Internet Protocol (IP), defines what constitutes a valid address and what format the bytes that make up a packet have to follow. What's interesting is that IP makes no guarantees about delivering packets. Packets may get to their destination ... or not. A sequence of packets sent from one host to another may arrive in the proper order ... or not. The service is fundamentally unreliable. You're only guaranteed best-effort delivery. (Same with mids. We profs are not guaranteed that Midshipman X will learn the material, only that we'll get Midshipman X's best effort. We are guaranteed of that, right? ... right?) If some kind of guarantee of delivery and proper ordering is required, that guarantee becomes an extra duty of the transport layer. The internet layer does routing, so it looks at the destination address on a packet and chooses what host to send the packet to next. Actually getting the packet from the current host to the next one is the job of the Link Layer. You'll learn more about the link layer when you build wired and wireless networks later on.

To summarize the upper layers, which are most important to us: Any programs that communicate over the internet to provide services to users (web, video-conferencing, file sharing, etc.) are in the Application Layer. The Transport Layer is responsible for getting bytes from a process (an executing instance of a program) on host A to a process on host B. The Internet Layer is responsible for getting packets from host A to host B.

Transport layer: TCP/UDP

There are two basic transport layer protocols: TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). TCP is like communicating via the phone: you establish a connection, which requires knowing the phone number (IP address), but after the connection is made, you communicate through the connection and, as long as you don't hang up, you don't have to redial the number. Each word you utter (packet you send) makes it to the listener on the other end in the same order you said it. UDP is like communicating via the mail: there is no connection, so every time you send a letter you have to provide the address (IP address). Letters get lost from time to time, and a letter you sent Tuesday may arrive before a letter you sent Monday. Where these analogies break down, however, is speed: very much unlike the mail compared to the phone, UDP carries less overhead than TCP and is typically the faster of the two.

TCP - The Transmission Control Protocol is a connection-oriented service that works in tandem with IP to provide services such as reliable transmission, error detection, flow control, and congestion control. To begin communication, a three-way handshake takes place to establish certain parameters. Every packet gets a sequence number, which lets the receiver put packets that arrive out of order back into the proper order. Think of TCP when you need perfect communication or data transfer.
UDP - The User Datagram Protocol is a connectionless protocol, so there is no three-way handshake. UDP is not a smart protocol, so it provides far fewer features. However, this protocol is much more lightweight than TCP because it has to check far fewer details with each packet. Think of this protocol for applications that are time-sensitive, like VOIP. Would you notice if a few packets were dropped in the middle of a UDP phone conversation? What would happen to the conversation if it were implemented using TCP and you had to wait for late or out-of-order packets to arrive before hearing anything that came after them?
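The VOIP questions above are easier to reason about with a toy model. The Python sketch below compares a receiver that insists on in-order delivery (TCP-like) with one that hands over whatever arrives (UDP-like). It is only an illustration under simplifying assumptions: the function names are ours, there are no acknowledgments or retransmissions, and real TCP sequence numbers count bytes, not whole packets.

```python
def tcp_like_receiver(arrivals):
    """For each arriving packet, list what can be handed to the application.

    Packets past a gap in the sequence numbers are held in a buffer
    until the missing packet shows up.
    """
    buffer = {}
    next_seq = 0
    log = []
    for seq, word in arrivals:
        buffer[seq] = word
        released = []
        while next_seq in buffer:
            released.append(buffer.pop(next_seq))
            next_seq += 1
        log.append(released)
    return log


def udp_like_receiver(arrivals):
    """Hand every packet to the application the moment it arrives."""
    return [[word] for _seq, word in arrivals]


# Packet 1 ("you") is delayed and arrives last.
arrivals = [(0, "can"), (2, "hear"), (3, "me"), (1, "you")]

tcp_log = tcp_like_receiver(arrivals)
udp_log = udp_like_receiver(arrivals)
print(tcp_log)
print(udp_log)
```

In the TCP-like log, nothing is delivered during the second and third arrivals: the listener hears silence and then a burst. In the UDP-like log the words come out promptly but scrambled. Which behavior is preferable depends on the application.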

Ports, Sockets

Ultimately, the Application Layer is about two processes (executing instances of programs) on different hosts communicating. The IP address identifies what host we want our data to go to, but not which of the many processes executing on the host we want that data to go to. If we were to make an analogy to telephone traffic, it's like calling a big office: the phone number gets you to the office, but you need something extra, an extension, to identify which specific phone in the office you're trying to place a call to. What takes that role in internet communication is the port number. The port number is a 16-bit non-negative number, which means it's in the range 0 to 2¹⁶ - 1 = 65535. The communication endpoint a given process has to read from and write to is called a socket, and each socket is uniquely identified by an IP address + port number + transport-layer protocol. The protocols we know are: IPv4 TCP, IPv4 UDP, IPv6 TCP, IPv6 UDP. So if a process on host A wants to send data to a specific process on host B, it needs to know B's IP address and the port number and protocol for the socket the given process on B is using. To continue with the phone-system-at-a-business analogy: a socket is like a single phone, and just as you need a phone number + extension to get to a specific phone at a business, you need an IP address + port number + transport-layer protocol to get to a specific socket on the internet.
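One way to keep the "IP address + port number + protocol" triple straight is to write it down as a data structure. The Python sketch below is purely illustrative: the Endpoint type, the make_endpoint helper, and the address 10.0.0.5 are invented for this example and are not part of any networking library.

```python
from typing import NamedTuple

MAX_PORT = 2**16 - 1  # largest value that fits in 16 bits


class Endpoint(NamedTuple):
    ip: str
    port: int
    protocol: str  # "TCP" or "UDP"


def make_endpoint(ip: str, port: int, protocol: str) -> Endpoint:
    """Build an Endpoint, rejecting ports outside the 16-bit range."""
    if not 0 <= port <= MAX_PORT:
        raise ValueError(f"port {port} is outside 0..{MAX_PORT}")
    if protocol not in ("TCP", "UDP"):
        raise ValueError(f"unknown transport protocol {protocol!r}")
    return Endpoint(ip, port, protocol)


# Same host, same port number, different protocol: two different sockets.
a = make_endpoint("10.0.0.5", 53, "TCP")
b = make_endpoint("10.0.0.5", 53, "UDP")
print(MAX_PORT, a == b)
```

Because all three fields take part in the comparison, the two endpoints above are distinct even though the "phone number" and "extension" match.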

It's worth noting that you can specify what port you want your browser to use to contact a web server by putting a : and then the number after the domain name (or IP address). For example

http://www.usna.edu:80

tells the browser to use port 80 (which it would've done anyway), while

http://www.usna.edu:53

tells the browser to use port 53. This request will fail, of course, because USNA's web server isn't listening on port 53; it's listening on port 80. Port 53 is where DNS nameservers listen.

When one process is acting as a server — e.g. a web server or a DNS server — client processes on other hosts need to know where to find the server process. As we now know, an IP address is not enough to identify that server process. We need a port number and a protocol (TCP or UDP) as well. Where does this information come from? The answer is that common application layer services, like webservers, by convention use a particular port number and protocol. For example, web servers usually "listen on" port 80, and HTTP traffic uses TCP. So, when you give the URL http://www.usna.edu, the browser asks for the name to be resolved by DNS and discovers it's 192.190.229.27, and then sends its GET request to:

IP address: 192.190.229.27 ,  port: 80 ,  protocol: IPv4-TCP

because "everyone knows" web servers use TCP on port 80. What port number the browser uses on its own end is irrelevant and is basically randomly assigned. But it's crucial that the web server listens for requests on port 80. In general, it's absolutely crucial that a server providing a service uses a well-known port number so that client processes on other machines know where to send their requests.
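The rule a browser follows when picking the destination port can be sketched with Python's standard urllib.parse module: use the port written in the URL if there is one, otherwise fall back to the well-known port for the scheme. The DEFAULT_PORTS table and the port_for function are our own illustrative names, and the table only covers the two web schemes.

```python
from urllib.parse import urlsplit

# Well-known ports for the schemes we care about (deliberately incomplete).
DEFAULT_PORTS = {"http": 80, "https": 443}


def port_for(url: str) -> int:
    """Return the destination port a client would use for `url`."""
    parts = urlsplit(url)
    if parts.port is not None:        # an explicit ":number" wins
        return parts.port
    return DEFAULT_PORTS[parts.scheme]  # otherwise use the convention


for url in ("http://www.usna.edu", "http://www.usna.edu:80", "http://www.usna.edu:53"):
    print(url, "->", port_for(url))
```

Nothing in the URL syntax stops you from asking for port 53; the request only fails later, when no web server answers there.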

This mapping between protocols/services and ports is very important for cyber security. You will start to develop your own list of services-to-ports you remember as the semester progresses. Check out this list of service-to-ports mappings. You may note things like, for example, that World of Warcraft uses port 3724. In fact, there is an organization called IANA (Internet Assigned Numbers Authority) that controls the allocation of port numbers to services (as well as controlling the top-level domain names).

netcat (`nc`)

Netcat (shell command is nc) is a very flexible and powerful tool. At its simplest, you give it an IP address (or a symbolic name that it will resolve for you via DNS) and a port number, and whatever you type after pressing enter gets sent to the given address and port using IPv4 TCP. In this mode it acts as a TCP client. In the next simplest mode, you give the -l switch (listen) and nc then acts like a server: it listens for a connection request, accepts the first one it receives, then echoes whatever gets sent to it to the screen and sends whatever gets typed at the keyboard to the client whose connection request it accepted. This requires a demo to really get, which you had in class. The diagram below illustrates what commands users at two hosts would have to give to make a TCP connection with netcat.

Host 10.53.88.12
(server: listens for connections on port 23456)

$ nc -l 23456

<------>

Host 10.53.12.94
(client: connects to 10.53.88.12 on port 23456)

$ nc 10.53.88.12 23456

Note: the server command on Host 10.53.88.12 must be given first! Otherwise, when the client calls there's nobody home!
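For the curious, here is roughly what the two netcat commands amount to, written as a Python sketch using the standard socket module. To keep it self-contained, both ends run on one machine over the loopback address 127.0.0.1, the operating system picks a free port instead of 23456, and the server runs in a background thread. The helper name serve_once is ours.

```python
import socket
import threading


def serve_once(listener, received):
    """Accept one connection (like `nc -l`) and collect everything sent on it."""
    conn, _client_addr = listener.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:          # empty read: the client closed the connection
                break
            chunks.append(data)
        received.append(b"".join(chunks))


# Server side: bind and listen BEFORE the client tries to connect.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0 asks the OS for any free port
listener.listen(1)
port = listener.getsockname()[1]

received = []
server = threading.Thread(target=serve_once, args=(listener, received))
server.start()

# Client side: connect to the server's address and port, then send bytes.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello from the client\n")

server.join()
listener.close()
print(received[0])
```

The ordering in the code mirrors the note above: the listening socket exists before the client calls, so somebody is home when the connection request arrives.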

Netcat with the -u switch (UDP) uses UDP instead of TCP. Since this is connectionless, the server version accepts datagrams from any and all who send them, rather than making a connection with one client. As another consequence, the UDP server doesn't exit just because one of the clients exits.
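The connectionless behavior can be seen in a short Python sketch, again using only the standard socket module and the loopback address so that it runs on a single machine. The variable names are ours, and the two-second timeout is just a safety net so the script cannot hang if a datagram goes missing.

```python
import socket

# "Server": a UDP socket bound to a port. There is no listen() or accept().
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))     # port 0 asks the OS for any free port
server.settimeout(2.0)
server_addr = server.getsockname()

# Two independent "clients". Each datagram carries the destination address.
client_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client_a.sendto(b"from A", server_addr)
client_b.sendto(b"from B", server_addr)
client_a.close()                  # one client leaving does not disturb the server

messages = []
senders = set()
for _ in range(2):
    data, sender = server.recvfrom(1024)
    messages.append(data)
    senders.add(sender)           # (ip, port) the datagram came from

client_b.close()
server.close()
print(sorted(messages), len(senders))
```

The one server socket hears from both clients, and it learns who sent each datagram only from the address attached to that datagram.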

With netcat we as users can act like application layer programs. Like most application layer programs, we require that the transport layer provide us with services, which means at a bare minimum that we decide whether we want UDP or TCP. There are lots of interesting demos and activities we can do that demonstrate how TCP and UDP work and how they differ from one another.

netstat

There's a utility called netstat that shows you what port/protocol combinations are currently in use on your machine. (It's a shell utility on both Windows and Unix.) On my machine right now, netstat lists a whole bunch of things including one interesting line given below.

Active Internet connections

  Proto Local Address           Foreign Address         State      
  ...
  TCP   10.53.33.254:60503      74.125.228.85:https   ESTABLISHED
  ...

What this tells us is that there is an established TCP connection between a socket on my machine (10.53.33.254) bound to port 60503 and a socket on a server at 74.125.228.85 bound to the port associated with the https protocol, which my list of port numbers tells me is port 443.

Running nslookup 74.125.228.85 tells me that this is the address for iad23s07-in-f21.1e100.net. I suspected that this was one of Google's mail servers because I had my gmail account open at the time, it was the only secure web page I had open, and I know it uses https because of the lock icon displayed in Chrome's location bar. I confirmed my suspicion by running nslookup mail.google.com, which resolved to the same address, 74.125.228.85.
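Reading a netstat line by eye is a matter of splitting it into its columns and then splitting each address at the last colon. The Python sketch below does that for the line shown in the output above. The function names and the tiny SERVICE_PORTS table are our own and only cover what this example needs; the code assumes the four-column layout shown, so it would need adjusting for other netstat formats.

```python
# Service names netstat may print in place of a number (deliberately incomplete).
SERVICE_PORTS = {"http": 80, "https": 443}


def port_number(text: str) -> int:
    """Turn a netstat port field ("60503" or "https") into a number."""
    return int(text) if text.isdigit() else SERVICE_PORTS[text]


def parse_netstat_line(line: str) -> dict:
    """Split one 'Proto Local Foreign State' line into its parts."""
    proto, local, foreign, state = line.split()
    local_ip, local_port = local.rsplit(":", 1)
    foreign_ip, foreign_port = foreign.rsplit(":", 1)
    return {
        "proto": proto,
        "local": (local_ip, port_number(local_port)),
        "foreign": (foreign_ip, port_number(foreign_port)),
        "state": state,
    }


line = "TCP   10.53.33.254:60503      74.125.228.85:https   ESTABLISHED"
conn = parse_netstat_line(line)
print(conn)
```

The two (address, port) pairs plus the protocol are exactly the socket identities discussed in the Ports, Sockets section.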

I can view all server processes running on my machine by name and process ID by running netstat.exe with the following options. This command must be run in an Administrator shell:

netstat.exe -abno

The equivalent Linux command with options is:

netstat -alnp

However, to really get all the process numbers and program names, you should run this as root, i.e. run it like this:

sudo netstat -alnp

On my Windows machine, netstat.exe -abno produces output like the following:

Active Connections

  Proto	 Local Address		Foreign Address		State		PID
  TCP	 0.0.0.0:135		0.0.0.0:0		LISTENING	900
  RpcSs
 [svchost.exe]
  TCP	 0.0.0.0:3389		0.0.0.0:0		LISTENING	1448
  CryptSvc
 [svchost.exe]
  TCP	 0.0.0.0:49152		0.0.0.0:0		LISTENING	544
 [wininit.exe]
  TCP	 0.0.0.0:49376		0.0.0.0:0		LISTENING	608
 [services.exe]
  UDP	 0.0.0.0:68		0.0.0.0:0		LISTENING	988
  Dhcp
 [svchost.exe]
  ⋮

The only thing missing here is the processes' owners. To get that in Windows, we need another program called tasklist.exe. (The equivalent Linux command with options is ps -ef.) In an Administrator shell, run the following command:

tasklist.exe /v

Image Name               PID Session Name Session#    Mem Usage Status   User Name                     CPU Time Window Title
====================== ===== ============ ======== ============ ======== ============================ ========= ============
System Idle Process        0 Services            0         24 K Unknown  NT AUTHORITY\SYSTEM          557:44:44 N/A
System                     4 Services            0      1,312 K Unknown  N/A                            4:41:28 N/A
smss.exe                 332 Services            0        492 K Unknown  NT AUTHORITY\SYSTEM            0:00:00 N/A
csrss.exe                476 Services            0      2,996 K Unknown  NT AUTHORITY\SYSTEM            0:00:37 N/A
wininit.exe              544 Services            0      1,368 K Unknown  NT AUTHORITY\SYSTEM            0:00:00 N/A
services.exe             608 Services            0     10,956 K Unknown  NT AUTHORITY\SYSTEM            0:08:48 N/A
lsass.exe                624 Services            0     14,820 K Unknown  NT AUTHORITY\SYSTEM            0:04:03 N/A
⋮
svchost.exe              900 Services            0     10,132 K Unknown  NT AUTHORITY\NETWORK SERVICE   0:00:20 N/A      
svchost.exe              988 Services            0     29,776 K Unknown  NT AUTHORITY\LOCAL SERVICE     0:03:05 N/A      
⋮
svchost.exe             1448 Services            0     32,400 K Unknown  NT AUTHORITY\NETWORK SERVICE   0:20:43 N/A      
⋮

Now, with these two administrative tools, I can figure out which processes are providing network services (listening on a network socket) on my computer and who owns those processes. This is a big deal for understanding my computer's risk of intrusion. Think of it this way: if I wanted to protect my home from intruders, I would want to know where all of the possible entrances are so that I could seal them up. Obviously, I cannot seal them all, since I need at least one entrance to get in and out and a window or two for fresh air, but fewer is more secure. The same notion applies to processes providing network services on computers: fewer is better. Unnecessary services should be stopped and removed from computers to make them more secure from network attacks.

Leaving only essential entryways in my home is not the complete solution. Where those entryways are is also important. I am most vulnerable when I sleep; therefore, none of my entrances should lead directly into my bedroom. They should lead to other, less important, rooms so that my bedroom door can provide an extra layer of security. Processes that are running as the superuser are like entrances directly to my bedroom. If they are compromised, I become defenseless against harm from the intruder.
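Putting the two listings together is a simple join on the PID column. The Python sketch below does this by hand for the rows shown in the netstat.exe and tasklist.exe output above; the data is copied from those listings, while the function name and the choice to flag NT AUTHORITY\SYSTEM as the Windows counterpart of the superuser are our own simplifications.

```python
# (protocol, local port, PID) for each listening socket in the netstat output.
listeners = [
    ("TCP", 135, 900),
    ("TCP", 3389, 1448),
    ("TCP", 49152, 544),
    ("TCP", 49376, 608),
    ("UDP", 68, 988),
]

# PID -> (image name, user name), taken from the tasklist output.
owners = {
    544: ("wininit.exe", "NT AUTHORITY\\SYSTEM"),
    608: ("services.exe", "NT AUTHORITY\\SYSTEM"),
    900: ("svchost.exe", "NT AUTHORITY\\NETWORK SERVICE"),
    988: ("svchost.exe", "NT AUTHORITY\\LOCAL SERVICE"),
    1448: ("svchost.exe", "NT AUTHORITY\\NETWORK SERVICE"),
}


def audit(listeners, owners):
    """Match every listening socket with the program and account behind it."""
    report = []
    for proto, port, pid in listeners:
        image, user = owners.get(pid, ("unknown", "unknown"))
        report.append((proto, port, image, user))
    return report


report = audit(listeners, owners)

# Entrances that lead "directly into the bedroom": services running as SYSTEM.
high_privilege = [row for row in report if row[3].endswith("\\SYSTEM")]

for row in report:
    print(row)
print("running as SYSTEM:", [row[1] for row in high_privilege])
```

Each row of the report is one entrance to the machine, and the last line picks out the entrances that would hand an intruder the most power if the service behind them were compromised.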