We've surveyed the workings of a single computer system: the physical machine, the OS and programs. We've also looked at how the World Wide Web works, a system comprising millions of web servers and billions of browsers communicating with one another. The web is just one example of a system built on top of computer networks, the Internet in particular. We will now turn our attention to how the Internet works and, later, to how wired and wireless networks work in general. This class will be our first step.

The Internet in 60 Seconds: Hosts, IP Addresses and Packets

A host is a computer (in the most general sense) connected to a network. On the Internet, hosts are identified by their IP address. They may or may not have a name as well, but they always have an IP address. IP addresses are usually written as four numbers with dots in between, like 146.145.5.67, though more on that later.

Information travels around the Internet like notes passed in class. Suppose host A has a message to send to host B. A breaks its message into small chunks and sends each chunk, prefaced with B's address, across the network. This message chunk plus address is called a packet. Each packet gets passed from one host to another (again, like a note passed in class) until it finally reaches B. The intermediate packet-passing hosts in this process are called routers.

When a note gets passed to you in class, you usually can't hand it directly to the recipient. Instead, you can only hand it to one of your neighbors, and you choose a neighbor that gets the note closer to the recipient. A router's neighbors are the hosts it has direct links to (e.g. connected by a cable). It uses the packet's IP address to choose a neighbor host that's "closer" to the recipient, and passes the packet on to that neighbor.

Many of the things that could go wrong when passing notes in class go wrong on the Internet too:
  1. notes (packets) might not make it to the recipient
  2. in long messages that have to be broken up into chunks to be sent separately, individual notes (packets) might arrive out of order
  3. your notes (packets) might be read in transit by malicious people
  4. the contents of the note (packet) may be mucked up in transit: think of a classmate who spills a drink, or has sweaty hands, or tries to mess with you by changing what's in the note.
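To make the chunk-plus-address idea concrete, here is a minimal Python sketch of breaking a message into packets and putting them back together when they arrive out of order. The Packet class, the seq field and the chunk size of 8 are all illustrative assumptions, not the real IP packet format.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest: str     # destination IP address (B's address)
    seq: int      # hypothetical sequence number, used only to restore order
    payload: str  # one chunk of the original message


def packetize(message, dest, chunk_size=8):
    """Break a message into chunks, each prefaced with the destination address."""
    return [
        Packet(dest, seq, message[start:start + chunk_size])
        for seq, start in enumerate(range(0, len(message), chunk_size))
    ]


def reassemble(packets):
    """Rebuild the message even if the packets arrived out of order."""
    return "".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


packets = packetize("Meet me after class by the gym.", "146.145.5.67")
shuffled = list(reversed(packets))  # simulate out-of-order arrival
print(reassemble(shuffled))
```

Note that this sketch only handles problem 2 from the list (out-of-order arrival). Lost, snooped or altered packets need other mechanisms.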

Following a packet's path

There's a shell utility called traceroute that shows the path a single packet travels from a source host to a destination host on the Internet. (On Windows the command is tracert. However, since the protocol it uses is commonly restricted by organizations, including DREN, due to network security concerns, the Windows tracert command will not be able to report hops outside the USNA network.) The following output from traceroute shows the route from a host here in the CS department to a host (a webserver) in Austria.
bash$ traceroute -m 32 www.risc.uni-linz.ac.at
traceroute to www.risc.uni-linz.ac.at (193.170.37.138), 32 hops max, 60 byte packets
 1  michelson-3a-as1-v401.gw.usna.edu (10.53.33.1)  1.551 ms  1.942 ms  2.310 ms
 2  michelson-1a-ag1-v603.net.usna.edu (10.48.1.93)  0.739 ms  1.262 ms  1.659 ms
 3  yard-d2-v941.net.usna.edu (10.48.2.81)  0.560 ms  1.057 ms  0.999 ms
 4  usna-c1-v717.net.usna.edu (10.0.1.25)  0.937 ms  0.879 ms  0.822 ms
 5  border-d1-v712.net.usna.edu (10.0.1.6)  0.765 ms  1.179 ms  1.558 ms
 6  border-f1-gi1_0.net.usna.edu (131.122.6.249)  0.992 ms  0.907 ms  0.842 ms
 7  border-r1-po1.net.usna.edu (192.190.228.1)  1.360 ms  1.269 ms  1.180 ms
 8  dren-sdp.net.usna.edu (138.18.45.5)  1.583 ms  1.502 ms  1.412 ms
 9  so48-2-1-0.ray.dren.net (138.18.1.59)  5.346 ms  5.273 ms  6.880 ms
10  xe-0-0-0.100.dmz.ray.dren.net (138.18.49.26)  6.797 ms  7.199 ms  7.125 ms
11  POS1-1-1.GW8.DCA6.ALTER.NET (152.179.75.129)  7.525 ms  8.449 ms  8.317 ms
12  0.xe-2-0-0.XT2.DCA6.ALTER.NET (152.63.40.82)  7.697 ms  5.144 ms  5.065 ms
13  0.so-6-0-1.XL4.IAD8.ALTER.NET (152.63.36.209)  7.521 ms  7.419 ms  7.366 ms
14  0.ae4.BR2.IAD8.ALTER.NET (152.63.41.233)  7.290 ms  7.641 ms  7.568 ms
15  ae16.edge1.washingtondc12.level3.net (4.68.62.133)  24.039 ms ae17.edge1.washingtondc12.level3.net (4.68.62.137)  23.426 ms  23.357 ms
16  vl-3602-ve-226.ebr2.Washington12.Level3.net (4.69.158.38)  10.806 ms vl-3601-ve-225.ebr2.Washington12.Level3.net (4.69.158.34)  7.692 ms vl-3603-ve-227.ebr2.Washington12.Level3.net (4.69.158.42)  10.631 ms
17  ae-5-5.ebr2.Washington1.Level3.net (4.69.143.221)  8.451 ms  8.386 ms  8.582 ms
18  ae-43-43.ebr2.Paris1.Level3.net (4.69.137.57)  89.870 ms ae-41-41.ebr2.Paris1.Level3.net (4.69.137.49)  88.636 ms ae-44-44.ebr2.Paris1.Level3.net (4.69.137.61)  89.700 ms
19  ae-47-47.ebr1.Frankfurt1.Level3.net (4.69.143.141)  96.128 ms ae-48-48.ebr1.Frankfurt1.Level3.net (4.69.143.145)  97.434 ms ae-45-45.ebr1.Frankfurt1.Level3.net (4.69.143.133)  100.191 ms
20  ae-81-81.csw3.Frankfurt1.Level3.net (4.69.140.10)  104.739 ms  96.972 ms  95.129 ms
21  ae-72-72.ebr2.Frankfurt1.Level3.net (4.69.140.21)  95.776 ms ae-92-92.ebr2.Frankfurt1.Level3.net (4.69.140.29)  94.925 ms ae-82-82.ebr2.Frankfurt1.Level3.net (4.69.140.25)  96.122 ms
22  ae-1-12.bar1.Vienna1.Level3.net (4.69.153.145)  105.499 ms  105.792 ms  105.713 ms
23  ae-0-11.bar2.Vienna1.Level3.net (4.69.153.150)  103.852 ms  103.751 ms  103.674 ms
24  vlan301.wien21.aco.net (212.73.203.18)  103.525 ms  103.496 ms  105.748 ms
25  vlan73.wien21.aco.net (193.171.23.41)  103.606 ms  103.528 ms  104.717 ms
26  vlan312.linz2.aco.net (193.171.15.6)  106.348 ms  107.539 ms  106.164 ms
27  vlan313.linz1.aco.net (193.171.15.9)  106.998 ms  106.927 ms  108.240 ms
28  jku-gw.edvz.uni-linz.ac.at (193.171.22.26)  112.249 ms  109.642 ms  109.249 ms
29  jkuc3hb1.edvz.uni-linz.ac.at (140.78.200.225)  108.671 ms  109.123 ms  108.599 ms
30  Router.RISC.Uni-Linz.AC.AT (140.78.222.31)  112.525 ms  112.404 ms  112.371 ms
31  * * *
32  * * *

      
There's a lot to notice here. First off, I gave the name of the server instead of its IP address. Usually we use names, so I've shown it that way, but we could just as easily have used its IP address instead. Notice that the packet takes more than 30 hops on its way! Because there were so many, I had to increase the maximum number of hops for traceroute to 32 (the default is 30). But even after increasing the maximum, all I see after the university's router are asterisks. The problem here is that the university's firewall is preventing packets from reaching the traceroute program. To get through the firewall, I need to change the way traceroute behaves so that its packets look like legitimate traffic: the -T option sends TCP probes, and -p 443 aims them at port 443, the port used for HTTPS. To do this, I need superuser privileges. Here is the command I ended up using to get the complete route:
bash$ sudo traceroute -m 32 -T -p 443 www.risc.uni-linz.ac.at

In either case, traceroute is able to provide us with names for the hosts along the way, not just IP addresses, and those names actually tell a story: the packet travels from usna.edu (that's us!) to dren.net (that's the organization that provides USNA its internet access) to Washington, then across the Atlantic Ocean to Paris, on to Frankfurt (in Germany), to Vienna (in Austria), to Wien (which is actually just the German name for Vienna), to Linz (another city in Austria), and finally to uni-linz.ac.at, which is the University of Linz. You'll notice that in several places we have a few consecutive hosts with similar IP addresses. For example, hops 25 through 28 are four consecutive 193.171.x.x addresses. This tells a story too, though exactly what it means will have to wait. It's important to note that packets don't always follow the same route from the same host A to the same host B. Once again, it's like note passing: there are many possible ways to the same destination ... which sounds pretty philosophical, actually.
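If you want to pull the story out of traceroute output with a program rather than by eye, a short Python sketch like the following may help. It assumes the line layout shown in the output above (hop number, then one or more "name (address)" pairs, then times in ms). The function name parse_hop is made up for this example, and other traceroute versions may format their output differently.

```python
import re

HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")                 # "19  rest of line"
HOST_RE = re.compile(r"(\S+) \((\d+\.\d+\.\d+\.\d+)\)")   # "name (a.b.c.d)"
RTT_RE = re.compile(r"(\d+\.\d+) ms")                     # "96.128 ms"


def parse_hop(line):
    """Parse one hop line; return None for lines that are not hop lines."""
    match = HOP_RE.match(line)
    if match is None:
        return None
    rest = match.group(2)
    return {
        "hop": int(match.group(1)),
        "hosts": HOST_RE.findall(rest),  # list of (name, ip) pairs
        "rtts_ms": [float(t) for t in RTT_RE.findall(rest)],
    }


line = " 1  michelson-3a-as1-v401.gw.usna.edu (10.53.33.1)  1.551 ms  1.942 ms  2.310 ms"
print(parse_hop(line))
print(parse_hop("31  * * *"))  # a hop that never answered: no hosts, no times
```

A hop that lists several hosts (like hop 15 above) simply yields several (name, ip) pairs, which is one visible sign that packets do not always take the same route.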

IP Addresses

An IP address is actually a 32-bit number. A 32-bit number can be viewed as four 8-bit pieces (four bytes) and we usually write IP addresses as these four 8-bit numbers, separated by dots. This is called a dotted quad. For example:
          146.145.5.67
       ___/    /   \  \____
      /       /     \      \
10010010.10010001.00000101.01000011 → 10010010100100010000010101000011 → 2458977603
We prefer dotted quad notation because a) dotted quads are generally easier to remember, and b) the first two or three bytes are often the same for all IP addresses within a given organization. For example, CS Department IP addresses are all 131.122.88.* or 131.122.89.*. That similarity is harder to see when addresses are written out as a single number.
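The conversion between dotted quad, binary and a single decimal number is easy to check in code. The following Python sketch uses the standard ipaddress module, plus a by-hand version with bit shifts to show what is going on underneath. The function names are just illustrative.

```python
import ipaddress


def describe(dotted):
    """Return (decimal, 32-bit binary string, dotted binary) for a dotted quad."""
    as_int = int(ipaddress.IPv4Address(dotted))
    as_bits = format(as_int, "032b")  # zero-padded to 32 bits
    grouped = ".".join(as_bits[i:i + 8] for i in range(0, 32, 8))
    return as_int, as_bits, grouped


def quad_to_int(dotted):
    """The same conversion by hand: each byte is shifted into its place."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


print(describe("146.145.5.67"))
print(quad_to_int("146.145.5.67"))         # 2458977603
print(ipaddress.IPv4Address(2458977603))   # 146.145.5.67
```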

You can determine the IP address of your Windows laptop using the utility ipconfig in the shell (1. Open a command prompt. 2. Type ipconfig /all). In Unix this utility is called ifconfig. If you give the command ipconfig there will be a line in the output that looks like
IPv4 Address. . . . . . . . . : 131.122.88.124
and that, of course, is your IP address. It'll be really important throughout the course to be able to answer when someone says "what's your IP address?".

IPv4 vs IPv6

Why only 32 bits?

At this 2008 conference sponsored by Google, Vinton Cerf tells us why. Listen to the discussion at 13:00 - 14:35. Vinton Cerf, Project Director for the TCP/IP research program at DARPA in 1976, is at the podium. Bob Hinden, who is seated, helped develop the first Internet routers.

The IP addresses that we talk about here are IPv4 (Internet Protocol version 4) addresses. With 32 bits, there are a little over four billion such addresses. When an organization wants to set up a network, it needs to ask for some IP addresses to use for webservers, e-mail servers, etc. There are agencies responsible for allocating blocks of addresses to organizations, and they'd say something like "you get block 131.122.88.1/24", which means you get all addresses whose first 24 bits (3 bytes) are the same as 131.122.88.1, i.e. addresses of the form 131.122.88.*, where "*" means anything. That's why addresses within an organization usually share a common prefix. Check out this visualization of how /8 blocks (i.e. the blocks 1.*.*.*, 2.*.*.*, ..., 255.*.*.*) were allocated.
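The "/24" block notation can be explored with Python's standard ipaddress module. This is a small sketch. The address 131.122.89.7 is just a made-up example of an address outside the block.

```python
import ipaddress

# strict=False lets us write the block the way the text does (131.122.88.1/24),
# with some of the last 8 bits set; the module normalizes it to 131.122.88.0/24.
block = ipaddress.ip_network("131.122.88.1/24", strict=False)

print(block)                # 131.122.88.0/24
print(block.num_addresses)  # 256, one for each possible value of the last byte
print(ipaddress.ip_address("131.122.88.124") in block)  # True: same first 24 bits
print(ipaddress.ip_address("131.122.89.7") in block)    # False: third byte differs
```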

With "only" four billion (4,000,000,000) addresses, IPv4 is not going to last forever. There are lots of techniques that are used to deal with the problem of not having as many addresses as there are devices that want to be on the internet, but eventually we will simply run out. Over the last decade that the-end-is-near prediction has been made many times, so I won't hazard a guess as to when it'll happen, but it will eventually. So what then? There's a newer standard, IPv6, that uses 128-bit addresses. That gives about 3.4×10^38 addresses, which ought to be enough for all eternity, right?
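The two address-space sizes are quick to verify. This is just arithmetic on powers of two, written in Python:

```python
ipv4_total = 2 ** 32   # 32-bit addresses
ipv6_total = 2 ** 128  # 128-bit addresses

print(f"{ipv4_total:,}")    # 4,294,967,296  ("a little over four billion")
print(f"{ipv6_total:.1e}")  # 3.4e+38
# Every single IPv4 address could be given an entire 96-bit space of its own:
print(ipv6_total // ipv4_total == 2 ** 96)  # True
```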


comics are from xkcd.com

We're all out of IPv4 Addresses

The command ipconfig asks your operating system to tell you what your host's IP Address is. There are also websites that tell you your IP Address, my favorite of which is ipchicken.com. If you check your IP Address with ipconfig you'll get something like 10.53.33.223. However, if you check it with ipchicken, you'll get something very different. What's going on here?!?!? Has everything we've told you been a lie?

What's going on here is that USNA has run out of IPv4 addresses! It only has 2,046 IP addresses to use for the whole Academy: students, faculty, staff and all. Clearly that's not nearly enough, but for various reasons we're not ready to move to IPv6. So instead, the Academy uses a trick called Network Address Translation (NAT). We'll have to wait to understand NAT in much detail. For now, however, we can understand a little. First of all, addresses that start with "10.", i.e. addresses of the form 10.*.*.*, are called private addresses. What that means is that no packet with such an address can leave your local network and be routed across the larger Internet. So when you send a packet out from your machine to, for example, ipchicken.com, the packet goes to a host with one of the 2,046 actual (non-private) IP addresses USNA owns, and that host sends your packet out as if it came from that host itself. The magic is that when ipchicken.com sends packets back to that host, the host realizes that the packet should actually go to your machine, replaces its own address in the packet with your machine's private address, and sends the packet along to your machine. The intermediary host is sending out packets on behalf of many USNA hosts at the same time, so it is a bit magical that it knows where to send the packets it receives back from the outside world. How the magic works we can only explain after a few more network lessons.
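Without spoiling the later lessons, here is a toy Python sketch of one common way such an intermediary can keep track of who is who: it hands out a distinct port number for each inside host and remembers the pairing in a table. Everything here is an assumption made for illustration: the class name NatBox, the port numbers, and the placeholder public address 203.0.113.5 (an address reserved for documentation). It is not a description of how USNA's equipment actually works.

```python
import ipaddress
import itertools


class NatBox:
    """Toy port-based NAT: one public address shared by many private hosts."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # next unused public port
        self._out = {}   # (private_ip, private_port) -> public_port
        self._back = {}  # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Rewrite the source of an outgoing packet; remember the mapping."""
        key = (private_ip, private_port)
        if key not in self._out:
            port = next(self._ports)
            self._out[key] = port
            self._back[port] = key
        return self.public_ip, self._out[key]

    def inbound(self, public_port):
        """Find which private host a reply belongs to (None if unknown)."""
        return self._back.get(public_port)


nat = NatBox("203.0.113.5")
print(ipaddress.ip_address("10.53.33.223").is_private)  # True
print(nat.outbound("10.53.33.223", 51000))  # ('203.0.113.5', 40000)
print(nat.outbound("10.53.33.224", 51000))  # ('203.0.113.5', 40001)
print(nat.inbound(40001))                   # ('10.53.33.224', 51000)
```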

Domain Names

When communicating over the phone, we distinguish between a person's name and their phone number. In fact we only need the number to make a call. The name by itself isn't useful. On the other hand, the name is what we actually associate with the person. If your friend says "Who did you just call?", you say "Bill" not "410-293-9999". Phone numbers may change, but usually a person's name stays the same.

The situation on the internet is similar: what you need to communicate with another host is its IP address. But when we as people identify a host, it's with a name, like www.usna.edu. This kind of name is called a domain name.

There's actually lots to say about domain names, but we will leave it with a few short thoughts:

  1. Domain names are hierarchical, just like paths in a file system. The only difference is that we write them the other way 'round: www.usna.edu instead of /edu/usna/www. Thus, every machine here at the academy is something.usna.edu.
  2. The top of the hierarchy is called the root, and the root domain is the name ".".
  3. The next level down in the hierarchy is the name at the right end: .edu, .com, etc. These are called the top level domains. Check out this list of top-level domain names.
  4. The www at the front of a name like www.usna.edu is usually meant to indicate a webserver, but having www at the front of its name doesn't make a machine a webserver any more than having the first name "Prince" makes you royalty.
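The "paths written the other way 'round" idea from the list above fits in a few lines of Python. This is only a sketch of the analogy (the function names are invented). DNS itself does not use slash-separated paths.

```python
def domain_to_path(name):
    """Rewrite a domain name as a file-system-style path, root first."""
    labels = name.rstrip(".").split(".")  # a trailing "." just names the root
    return "/" + "/".join(reversed(labels))


def top_level_domain(name):
    """The top-level domain is the label at the right end of the name."""
    return name.rstrip(".").split(".")[-1]


print(domain_to_path("www.usna.edu"))    # /edu/usna/www
print(top_level_domain("www.usna.edu"))  # edu
```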

DNS: The Phonebook of the Internet

Before you can communicate with another host on the internet, you need an IP address for it. However, we usually have a domain name, not an IP address. So we need to consult some kind of "phonebook" equivalent to get the IP address from the symbolic name. The irony is that the "phonebook" is another host on the Internet, and talking to it requires an IP address ... there's a whole chicken-egg thing here.

The "phonebook" of the internet is called DNS (Domain Name System). It consists of a global system of servers, called nameservers, that translate symbolic names to IP addresses either by knowing the answer, or by passing the query along to a server that does. To translate a symbolic name to an IP address, you need to query a nameserver, which requires knowing the nameserver's IP address. If you only had the symbolic name of the nameserver, you'd be in trouble. However, when your computer joins a network, it is given the IP address of one or more nameservers. You can see these addresses with the shell command ipconfig /all. Look for the line

DNS Servers. . . . . : 10.1.74.10
that contains the IP addresses of one or more nameservers.
In unix, you'd look at the file /etc/resolv.conf to find these addresses.
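On a Unix machine, pulling the nameserver addresses out of that file takes only a few lines. The Python sketch below parses text in the usual resolv.conf layout. The sample contents, including the search line, are hypothetical and only reuse addresses that appear in these notes. To read the real file you would pass in open("/etc/resolv.conf").read().

```python
def nameservers_from_resolv_conf(text):
    """Collect the addresses on 'nameserver' lines, skipping comments."""
    servers = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


# Hypothetical file contents, for illustration only.
sample = """\
# hypothetical contents of /etc/resolv.conf
search academy.usna.edu
nameserver 10.1.74.10
nameserver 131.122.220.1
"""
print(nameservers_from_resolv_conf(sample))  # ['10.1.74.10', '131.122.220.1']
```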

The nslookup utility is a shell tool (for both Windows and Unix) that will carry out a DNS request for you. Here's an example:

$ nslookup wasabi.academy.usna.edu
Server:		10.1.74.10
Address:	10.1.74.10#53

Name:	wasabi.academy.usna.edu
Address: 10.53.37.171
From this we see that the IP address of the host wasabi.academy.usna.edu is 10.53.37.171. Furthermore, the output is telling us that the nameserver that provided us this answer has IP address 10.1.74.10. The nslookup utility is also able to do reverse DNS requests — i.e. "here's an IP address, what's the name?". We can use that to find the name of the nameserver we just queried.

$ nslookup 10.1.74.10
Server:		10.1.74.10
Address:	10.1.74.10#53

10.74.1.10.in-addr.arpa	name = ns1.usna.edu.
From this we see that the nameserver 10.1.74.10 has the name ns1.usna.edu.

NIPRNet

"Nippernet" is the colloquial name for the DoD subset of the Internet that carries "sensitive but UNCLASSIFIED IP data".
Access to the NIPRNet is tightly controlled: all data crossing the NIPRNet/Internet boundary must pass through a DoD-owned router, and hosts on the NIPRNet resolve names using DNS servers operated by the DoD Network Information Center. (The DoD also owns and operates the SIPRNet, for CLASSIFIED data.)
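The reverse lookup above asks about the odd-looking name 10.74.1.10.in-addr.arpa. That name is just the IP address with its four bytes written in reverse order, followed by .in-addr.arpa. A short Python sketch shows the construction both by hand and with the standard ipaddress module (the function name is illustrative):

```python
import ipaddress


def reverse_lookup_name(ip):
    """Build the name that a reverse DNS request for an IPv4 address asks about."""
    return ".".join(reversed(ip.split("."))) + ".in-addr.arpa"


print(reverse_lookup_name("10.1.74.10"))                   # 10.74.1.10.in-addr.arpa
print(ipaddress.ip_address("10.1.74.10").reverse_pointer)  # same name, from the library
```

Reversing the bytes puts the most general part of the address at the right end, which matches the way domain names put the top-level domain at the right end.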

Normally, nslookup will query the nameserver listed by the call to ipconfig /all to do DNS lookups. However, if you call nslookup with a second argument that is the name or IP address of a nameserver, nslookup will query that nameserver instead. So, for example:

$ nslookup wasabi.academy.usna.edu old-ns1.netmgmt.usna.edu
Server:		old-ns1.netmgmt.usna.edu
Address:	131.122.220.1#53

Name:	wasabi.academy.usna.edu
Address: 10.53.37.171
... actually causes my PC to contact old-ns1.netmgmt.usna.edu to resolve the name wasabi.academy.usna.edu.

Name resolution in action

It's worthwhile thinking a bit about what happens when you send your browser to a website. When you enter http://www.martinguitar.com in your browser's address bar, the browser is supposed to send a request to the webserver www.martinguitar.com (specifically an HTTP GET request). But that can't happen until the browser finds out what IP address goes with that name. In fact, you can enter the IP address directly into the browser's address bar, like this
http://146.145.5.67
... and you'll get the website. If you use the symbolic name, however, the browser first makes a DNS request to a nameserver to get the IP address for the name www.martinguitar.com, and then actually sends the HTTP GET request to the webserver.

It's worth remembering that you can enter IP addresses directly into the browser's address bar. If there's no nameserver, the nameserver is down, or you don't trust the nameserver, this is a useful trick. This is often used in setting up networks.
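The two cases described above (a name that needs a DNS request first, and an IP address that does not) can be sketched in Python. This is a heavily simplified model of what a browser does, not real browser code. The function names are made up, and the "nameserver" is a plain dictionary holding the name-to-address pairing used in these notes, which may no longer be current on the real Internet.

```python
import ipaddress
from urllib.parse import urlsplit


def is_ip_literal(host):
    """True if the host part of the URL is already an IP address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def plan_request(url, resolve):
    """Return (ip, request_text): resolve the name only if one was given."""
    parts = urlsplit(url)
    host = parts.hostname
    ip = host if is_ip_literal(host) else resolve(host)  # DNS step, if needed
    path = parts.path or "/"
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"
    return ip, request


# Stand-in for a nameserver: a lookup table instead of a real DNS query.
fake_dns = {"www.martinguitar.com": "146.145.5.67"}

print(plan_request("http://www.martinguitar.com", fake_dns.__getitem__))
print(plan_request("http://146.145.5.67", fake_dns.__getitem__))  # no lookup made
```

Passing socket.gethostbyname as the resolve argument instead of the dictionary lookup would perform a real DNS request through your configured nameserver.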