There is a common set of tools, falling under the umbrella of cryptography, that help provide confidentiality, integrity, authentication, and non-repudiation. Over this and the next several lessons, we will learn about several categories of techniques (symmetric encryption, asymmetric encryption, cryptographic hashing, and steganography), use and understand simple examples of those techniques, and learn about and use real-world tools that make use of them. We will also look at attacks on these techniques.

What is Cryptography

Cryptography is the study and practice of hiding information. There are three fundamental areas that we'll look at, and these tools can be used in isolation or in combination to provide the IA pillar properties.

Caesar Shift: a Simple Encryption Method

Every introduction to cryptography starts with the Caesar Shift Cipher, and who am I to buck tradition? To set the scene, we have two communicating parties, Alice and Bob: Alice is sending a message and Bob is receiving it. Then we have an eavesdropper, Eve, who wants to know what's in the message. Alice's original message is called the plaintext. She wants to scramble the message to produce what we call the ciphertext, which should be unintelligible to Eve but easily unscrambled by Bob. The scrambling process is called encrypting, and the unscrambling is called decrypting.

The Caesar Shift Cipher assumes your message is all capital letters, and it replaces each letter in the plaintext with a new letter to produce the ciphertext. The replacement scheme is based on a secret key that Alice and Bob have agreed upon ahead of time: a number in the range 0-25 called the shift value. The replacement scheme is simple: if the shift value is s, the ith letter in the alphabet is replaced by letter i+s in the alphabet, circling back around to the front of the alphabet if necessary. So with a shift value of 3, the letter B (the 2nd letter in the alphabet) is replaced with the letter E (letter number 2+3 = 5 in the alphabet), and the letter Y (the 25th letter) wraps around to B.


Decrypting means subtracting the shift value rather than adding it, although you might notice that encrypting with a shift value of 26-s actually reverses a shift of s. Let's follow this process through from start to finish:

  1. At some earlier time, Alice and Bob agree to a secret key/shift-value k = 11.
  2. Alice decides to send Bob the secret message "MEET ME AT NOON", i.e. plaintext = MEET ME AT NOON.
  3. Alice encrypts the plaintext with key k = 11, shifting each letter 11 places forward, to get ciphertext = XPPE XP LE YZZY.
  4. Alice sends Bob the ciphertext.
  5. Eve manages to read the message in transit, but since she reads the ciphertext, XPPE XP LE YZZY, she can't make sense of it. Not knowing the key, she can't decrypt it to recover the plaintext.
  6. Bob receives the ciphertext and decrypts it with key k = 11, shifting each letter 11 places back, recovering the plaintext MEET ME AT NOON.
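For experimenting with the walkthrough above, here is a minimal Python sketch of the Caesar Shift. It assumes the message contains only capital letters and spaces, numbers the letters A=0 through Z=25 so the wrap-around is just arithmetic modulo 26, and copies spaces through unchanged. The function names are illustrative, not from any library; decryption is written as encryption with 26 - s, mirroring the observation above.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; A is position 0


def caesar_encrypt(plaintext: str, shift: int) -> str:
    """Replace each capital letter with the one `shift` places later, wrapping Z -> A.

    Characters that are not capital letters (e.g. spaces) are copied unchanged.
    """
    out = []
    for ch in plaintext:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


def caesar_decrypt(ciphertext: str, shift: int) -> str:
    """Undo a shift of `shift` by applying a shift of 26 - shift."""
    return caesar_encrypt(ciphertext, (26 - shift) % 26)


ciphertext = caesar_encrypt("MEET ME AT NOON", 11)
print(ciphertext)                       # XPPE XP LE YZZY
print(caesar_decrypt(ciphertext, 11))   # MEET ME AT NOON
```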

Although very simple and, as we'll see, not very secure, the Caesar Cipher is a good example of a cryptosystem. It has the basic properties of any cryptosystem: two communicating parties (Alice and Bob), a nefarious eavesdropper (Eve), plaintext and ciphertext, and encryption and decryption. Moreover, it's representative of one of the two basic classes of cryptosystem, symmetric encryption (also called secret-key encryption), where a single secret key, shared by both Alice and Bob, is used to both encrypt and decrypt the message.

Encryption Key Management


Much of the military's communication is encrypted today, for obvious reasons. What is not so obvious is how the Navy and Marine Corps manage all of the encryption keys used for that communication. The system used throughout the military is called the Electronic Key Management System (EKMS), and it is centrally controlled by the National Security Agency (NSA). EKMS provides communications security (COMSEC) material (i.e., encryption keys), along with support tools for the generation, distribution, tracking, and accounting of that key material.

Sound like an important job? It is, and you might be the one doing it at your command as a junior officer. Every Naval or Marine unit that uses secure communications has at least two EKMS managers, and it is common practice to have a junior officer act as one of them.

Frequency Analysis — breaking the Caesar cipher

Not all letters get used with the same frequency in English. E's get used all the time, whereas Z's are not very common. One kind of cryptographic attack, i.e. a way to foil a cryptosystem so you can read secret messages, is based on analyzing the frequencies of letters in the ciphertext to get information about what key value produced that ciphertext. If you can deduce the key, you can decrypt the message (crack the code).

Let's suppose you are Eve, and you've intercepted the message (ciphertext) XPPE XP LE YZZY. There are more P's than anything else, so you might guess (correctly in this case) that a P in the ciphertext came from an E in the plaintext. This would lead you to guess the key/shift-value k = 11.

It's not always going to be that easy, of course. The ciphertext RNCP KU QHH has more H's than anything else. If we assume that H's in the ciphertext came from E's in the plaintext, we would deduce a key/shift-value of 3. Decrypting assuming k = 3 gives OKZM HR NEE ... which is probably not the secret message. In fact, the plaintext that produced this message was PLAN IS OFF, encrypted with k = 2.
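The "most common ciphertext letter came from an E" guess used in both examples is easy to automate. This sketch (hypothetical helper names; letters numbered A=0 to Z=25) reproduces both results above: it finds k = 11 for the first ciphertext, but the wrong k = 3 for the second, whose true key is 2.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase


def caesar_decrypt(ciphertext: str, shift: int) -> str:
    """Shift each capital letter back by `shift`; copy other characters unchanged."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - shift) % 26] if ch in ALPHABET else ch
        for ch in ciphertext
    )


def guess_shift_most_common(ciphertext: str) -> int:
    """Guess the key by assuming the most frequent ciphertext letter was an E."""
    counts = Counter(ch for ch in ciphertext if ch in ALPHABET)
    top_letter, _ = counts.most_common(1)[0]
    return (ALPHABET.index(top_letter) - ALPHABET.index("E")) % 26


for ct in ["XPPE XP LE YZZY", "RNCP KU QHH"]:
    k = guess_shift_most_common(ct)
    print(ct, "->", k, caesar_decrypt(ct, k))
# XPPE XP LE YZZY -> 11 MEET ME AT NOON
# RNCP KU QHH -> 3 OKZM HR NEE
```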

The problem with this approach is that we only considered one letter: the most common one in the ciphertext. Assuming H's came from E's gave us lots of E's in our "cracked" message, but it also gave us a Z and a K, which are pretty uncommon. To do frequency analysis properly, we should consider all the letters in the message. This is tedious, of course, but when something is tedious, it just means that we ought to write a program and let the computer do it for us. Try out this page, which features a JavaScript program for cracking Caesar shift encryption via frequency analysis. For each shift value, it calculates the likelihood that the shift value is correct, based on the frequencies of the letters that result from decrypting the given ciphertext with that shift value. It's very interesting to see how few characters of ciphertext are required to recover the key with a high degree of certainty.
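The linked program's source isn't reproduced here, so what follows is only a sketch of the same idea, not that program. For each of the 26 shift values, it decrypts the ciphertext and scores the result by the log-likelihood of its letters under approximate English letter frequencies, then normalizes the scores into probabilities (treating all shifts as equally likely beforehand). The frequency table holds commonly cited approximate values that vary by corpus, and the function names and demo message are made up for illustration.

```python
import math
import string

ALPHABET = string.ascii_uppercase

# Approximate frequencies (percent) of A..Z in English text; values vary by corpus.
ENGLISH_FREQ = [
    8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
    6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074,
]
LOG_FREQ = [math.log(f / 100) for f in ENGLISH_FREQ]


def shift_letters(text: str, shift: int) -> str:
    """Caesar-shift capital letters forward by `shift` (a negative shift decrypts)."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + shift) % 26] if ch in ALPHABET else ch
        for ch in text
    )


def log_likelihood(ciphertext: str, shift: int) -> float:
    """Log-probability of the letters obtained by decrypting with `shift`, if English."""
    return sum(
        LOG_FREQ[(ALPHABET.index(ch) - shift) % 26]
        for ch in ciphertext
        if ch in ALPHABET
    )


def rank_shifts(ciphertext: str) -> list:
    """Return (shift, probability) pairs, most likely shift first.

    Each shift's probability is proportional to exp(log_likelihood).
    """
    scores = [log_likelihood(ciphertext, s) for s in range(26)]
    top = max(scores)
    weights = [math.exp(sc - top) for sc in scores]  # subtract max to avoid underflow
    total = sum(weights)
    order = sorted(range(26), key=lambda s: scores[s], reverse=True)
    return [(s, weights[s] / total) for s in order]


message = ("THE ENEMY WILL ATTACK THE EASTERN GATE AT TEN TONIGHT "
           "SO SEND ALL THE RESERVES TO THE NORTH SIDE OF THE HARBOR")
ciphertext = shift_letters(message, 7)
best_shift, probability = rank_shifts(ciphertext)[0]
print(best_shift, shift_letters(ciphertext, -best_shift))  # prints 7, then the original message
```

Summing logarithms of frequencies, rather than multiplying the frequencies themselves, keeps the numbers from underflowing to zero on long messages.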

So we see that the Caesar Shift Cipher is not very secure. In particular, it's quite vulnerable to attack via frequency analysis. Its problems are a) there are only 26 key values, so trying them all is a viable option, and b) since a given character in the plaintext is always replaced with the same character in the ciphertext, letter frequencies carry over from plaintext to ciphertext.
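Problem a) is easy to demonstrate: with only 26 possible keys, Eve can simply print every candidate decryption and pick out the one that reads as English. A minimal sketch (hypothetical function name; letters numbered A=0 to Z=25):

```python
import string

ALPHABET = string.ascii_uppercase


def all_decryptions(ciphertext: str) -> list:
    """Decrypt `ciphertext` with every shift 0..25; index i holds the shift-i result."""
    return [
        "".join(
            ALPHABET[(ALPHABET.index(ch) - shift) % 26] if ch in ALPHABET else ch
            for ch in ciphertext
        )
        for shift in range(26)
    ]


for shift, candidate in enumerate(all_decryptions("RNCP KU QHH")):
    print(shift, candidate)
# The candidate printed for shift 2 is PLAN IS OFF.
```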

More Sophisticated Symmetric Encryption: The Vigenere Cipher

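Vigenere table (row = plaintext letter, column = key letter):

    | A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
----+----------------------------------------------------
  A | A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
  B | B C D E F G H I J K L M N O P Q R S T U V W X Y Z A
  C | C D E F G H I J K L M N O P Q R S T U V W X Y Z A B
  D | D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
  E | E F G H I J K L M N O P Q R S T U V W X Y Z A B C D
  F | F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
  G | G H I J K L M N O P Q R S T U V W X Y Z A B C D E F
  H | H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
  I | I J K L M N O P Q R S T U V W X Y Z A B C D E F G H
  J | J K L M N O P Q R S T U V W X Y Z A B C D E F G H I
  K | K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
  L | L M N O P Q R S T U V W X Y Z A B C D E F G H I J K
  M | M N O P Q R S T U V W X Y Z A B C D E F G H I J K L
  N | N O P Q R S T U V W X Y Z A B C D E F G H I J K L M
  O | O P Q R S T U V W X Y Z A B C D E F G H I J K L M N
  P | P Q R S T U V W X Y Z A B C D E F G H I J K L M N O
  Q | Q R S T U V W X Y Z A B C D E F G H I J K L M N O P
  R | R S T U V W X Y Z A B C D E F G H I J K L M N O P Q
  S | S T U V W X Y Z A B C D E F G H I J K L M N O P Q R
  T | T U V W X Y Z A B C D E F G H I J K L M N O P Q R S
  U | U V W X Y Z A B C D E F G H I J K L M N O P Q R S T
  V | V W X Y Z A B C D E F G H I J K L M N O P Q R S T U
  W | W X Y Z A B C D E F G H I J K L M N O P Q R S T U V
  X | X Y Z A B C D E F G H I J K L M N O P Q R S T U V W
  Y | Y Z A B C D E F G H I J K L M N O P Q R S T U V W X
  Z | Z A B C D E F G H I J K L M N O P Q R S T U V W X Y
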
Next we'll consider a more sophisticated (and, in fairness, more recent) cryptosystem called the Vigenere Cipher. It is a symmetric-key encryption method, like the Caesar Cipher, but it addresses the problems of too few key possibilities and of letter frequencies carrying over from plaintext to ciphertext.

The key is a string of letters, like JOE. To encrypt, you take your plaintext (we'll reuse MEET ME AT NOON) and write it down. Then you write the key string over the plaintext, letter by letter. If the plaintext is longer than the key, you simply repeat the key, like this:

JOEJ OE JO EJOE ← key (repeated as needed)
MEET ME AT NOON ← plaintext
	
Next you write down (or have on hand) the table above. The encrypted value of a plaintext character is the table entry whose row is given by the plaintext character and whose column is given by the key character written above it. Thus, the first letter of our message encrypts to the table entry at row M and column J, which is a V. (Check out this demo to see the process on a short message.)

Decryption is straightforward if you understand encryption: write down the ciphertext with the key written above it (repeated as needed). To decrypt a character in the ciphertext, identify the column given by the key character above it, and find the ciphertext character in that column. The label of the row in which it appears is the decrypted value, i.e. the corresponding plaintext character.
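Looking up row P and column K in the table amounts to adding the two letters' positions (A=0, ..., Z=25) modulo 26, and decryption subtracts instead. The sketch below (hypothetical function name) uses that shortcut rather than storing the table and, matching the key-over-plaintext layout above, advances through the key only on letters, so spaces do not use up key characters.

```python
import string

ALPHABET = string.ascii_uppercase


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Encrypt (or, with decrypt=True, decrypt) capital-letter text with a repeating key."""
    sign = -1 if decrypt else 1
    out = []
    i = 0  # number of letters processed so far, i.e. our position in the key
    for ch in text:
        if ch in ALPHABET:
            k = ALPHABET.index(key[i % len(key)])
            out.append(ALPHABET[(ALPHABET.index(ch) + sign * k) % 26])
            i += 1
        else:
            out.append(ch)  # spaces pass through and do not consume a key letter
    return "".join(out)


ciphertext = vigenere("MEET ME AT NOON", "JOE")
print(ciphertext)                                 # VSIC AI JH RXCR
print(vigenere(ciphertext, "JOE", decrypt=True))  # MEET ME AT NOON
```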

Think about how the Vigenere Cipher addresses the flaws in the Caesar Shift. The key is a string of characters, and since there are roughly 6 trillion strings of length less than 10, for instance, the problem of too few keys has been addressed. The same letter at different positions in the plaintext generally does not get mapped to the same character in the ciphertext, since the key character written above it plays a role in the encryption; in our example, the two E's in MEET fall under key letters O and E and come out as S and I. So letter frequencies in the plaintext do not carry over directly to the ciphertext.
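The key-count figure is quick to check. This sketch assumes "length less than 10" means keys of 1 through 9 letters (no empty key); the exact total is about 5.6 trillion, which the text rounds to roughly 6 trillion, versus just 26 Caesar keys.

```python
# Number of possible Vigenere keys of length 1 through 9: 26 + 26**2 + ... + 26**9
vigenere_keys = sum(26 ** n for n in range(1, 10))
caesar_keys = 26

print(f"{vigenere_keys:,}")   # 5,646,683,826,134
print(caesar_keys)            # 26
```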

Frequency Analysis Attacks on the Vigenere Cipher & the One-Time Pad

The Venona Project: Poor Practice Defeats Perfect Security
One-time pads provide provably perfect security ... but at a price. Managing keys is really difficult! After all, you have to have as many bytes of key as you have bytes of plaintext to communicate. During WWII, the British and US intercepted a large amount of Soviet communication that was encrypted with one-time pads. However, cryptanalysis revealed that some of the one-time pad key material had been reused, which is the big no-no with one-time pad encryption. This misuse of the system allowed small parts of the communication to be decrypted. NSA's effort to exploit the reused one-time pad keys and decrypt as much of the traffic as possible was code-named VENONA. Over the years (Venona lasted until 1980), this code-breaking effort revealed Soviet espionage campaigns and spies at places like Los Alamos National Laboratory, the State Department, and the White House. It identified the Rosenbergs and Alger Hiss as spies. With any cryptographic protocol, even a small deviation from the protocol can compromise security. This fact should be a major take-away from the story of the Venona project!
Cyber Pigeons?
Before there was the Internet, there were pigeons. In late 2012, a British man found a dead carrier pigeon in his chimney. It turned out to be from WWII, and it carried an encrypted message tied to its leg. CNN has a nice story about it, "UK spies unable to crack coded message from WWII carrier pigeon", covering the fact that nobody has been able to decrypt the message. It turns out the sender used a one-time pad.
The Vigenere Cipher is still susceptible to a frequency analysis attack. First, convince yourself that if the key length is n, then restricting ourselves to every nth letter of the plaintext/ciphertext (ignoring spaces) gives us simply a Caesar Shift, because every one of those letters was encrypted with the same key character. Starting from the first position, the shift value is the first key character's position in the alphabet (A=0, B=1, etc.). Starting from the second position and restricting to every nth character gives us a shift value corresponding to the second character in the key, and so on. Since we can crack the Caesar Shift (given enough characters), we can crack each of these "every nth character" problems and recover the key. Then we can decrypt just as easily as the recipient.
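Here is a sketch of that attack, assuming the key length n is already known. It strips out spaces, slices the letters into n "every nth letter" columns, and cracks each column as a Caesar Shift by scoring all 26 shifts against approximate English letter frequencies (a log-likelihood score, which is our choice, not necessarily the one the course page uses). Function names and the demo message are made up; the message is repeated three times simply so each column has plenty of letters.

```python
import math
import string

ALPHABET = string.ascii_uppercase

# Approximate frequencies (percent) of A..Z in English text; values vary by corpus.
ENGLISH_FREQ = [
    8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
    6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074,
]
LOG_FREQ = [math.log(f / 100) for f in ENGLISH_FREQ]


def vigenere_encrypt(text: str, key: str) -> str:
    """Add key letters to message letters mod 26; spaces pass through and use no key letter."""
    out, i = [], 0
    for ch in text:
        if ch in ALPHABET:
            k = ALPHABET.index(key[i % len(key)])
            out.append(ALPHABET[(ALPHABET.index(ch) + k) % 26])
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def best_caesar_shift(letters: str) -> int:
    """The shift whose decryption of `letters` looks most like English."""
    def score(shift: int) -> float:
        return sum(LOG_FREQ[(ALPHABET.index(ch) - shift) % 26] for ch in letters)
    return max(range(26), key=score)


def recover_key(ciphertext: str, n: int) -> str:
    """Recover an n-letter key: letters i, i+n, i+2n, ... form one Caesar Shift."""
    letters = "".join(ch for ch in ciphertext if ch in ALPHABET)
    return "".join(ALPHABET[best_caesar_shift(letters[i::n])] for i in range(n))


message = " ".join(["THE ENEMY WILL ATTACK THE EASTERN GATE AT TEN TONIGHT "
                    "SO SEND ALL THE RESERVES TO THE NORTH SIDE OF THE HARBOR"] * 3)
ciphertext = vigenere_encrypt(message, "JOE")
print(recover_key(ciphertext, 3))  # JOE
```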

Finding the key length can be a problem, but here is one easy way, given what we already know: for each possible key length n, form the string consisting of every nth character starting from the first, give that as a ciphertext input to our Caesar Shift Frequency Analysis page, and make a note of the probability it reports for the best shift index for that n. Whichever n value gives the highest score is probably the actual length of the key, with one caveat: multiples of the true key length score just as well (if the key has length 3, every 6th character was also encrypted with a single key character), so prefer the smallest n among the top scorers. We will perform this exercise in class.
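The key-length search can be sketched the same way. One deliberate substitution: instead of the page's probability, this sketch scores each n by the average per-letter log-likelihood of the best Caesar decryption of its column, because a normalized probability can saturate near 1 on long columns and stop distinguishing candidates. It then returns the smallest n scoring within a tolerance of the best, per the multiples caveat above. The tolerance value, function names, and demo message are all assumptions for illustration.

```python
import math
import string

ALPHABET = string.ascii_uppercase

# Approximate frequencies (percent) of A..Z in English text; values vary by corpus.
ENGLISH_FREQ = [
    8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
    6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074,
]
LOG_FREQ = [math.log(f / 100) for f in ENGLISH_FREQ]


def vigenere_encrypt(text: str, key: str) -> str:
    """Add key letters to message letters mod 26; spaces pass through and use no key letter."""
    out, i = [], 0
    for ch in text:
        if ch in ALPHABET:
            k = ALPHABET.index(key[i % len(key)])
            out.append(ALPHABET[(ALPHABET.index(ch) + k) % 26])
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def best_shift_score(letters: str) -> float:
    """Average per-letter log-likelihood of the most English-looking Caesar decryption."""
    best_total = max(
        sum(LOG_FREQ[(ALPHABET.index(ch) - shift) % 26] for ch in letters)
        for shift in range(26)
    )
    return best_total / len(letters)


def key_length_scores(ciphertext: str, max_len: int) -> dict:
    """For each candidate length n, score every nth letter starting from the first."""
    letters = "".join(ch for ch in ciphertext if ch in ALPHABET)
    return {n: best_shift_score(letters[::n]) for n in range(1, max_len + 1)}


def estimate_key_length(ciphertext: str, max_len: int = 10, tolerance: float = 0.4) -> int:
    """Smallest n whose score is within `tolerance` of the best score.

    Multiples of the true length score about as well as the true length itself,
    so we take the smallest good one. `tolerance` is an untuned, assumed value.
    """
    scores = key_length_scores(ciphertext, max_len)
    best = max(scores.values())
    return min(n for n, score in scores.items() if score >= best - tolerance)


message = " ".join(["THE ENEMY WILL ATTACK THE EASTERN GATE AT TEN TONIGHT "
                    "SO SEND ALL THE RESERVES TO THE NORTH SIDE OF THE HARBOR"] * 6)
ciphertext = vigenere_encrypt(message, "JOE")
for n, score in key_length_scores(ciphertext, 6).items():
    print(n, round(score, 2))  # lengths 3 and 6 should stand out
print(estimate_key_length(ciphertext, max_len=6))  # 3
```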

This kind of attack requires enough text that our Caesar Shift frequency analysis of every nth character finds the proper shift index with high probability. If the message length is L, and we assume we need about 20 characters to get a high probability from our Caesar Shift frequency analysis, we'd like to have L/n > 20; for example, a 200-character message encrypted with a 5-letter key gives each column about 40 characters to work with. If L is short or n is long, our attack is likely to fail. So, in general, a longer key gives you more security against frequency analysis. If you have a key that is a completely random sequence of letters and is at least as long as the message, the Vigenere Cipher is unbreakable, provided you never use the key again. In this situation, the system becomes what is called a one-time pad. The problem with such a system is that arranging to have this one huge key is difficult.
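A minimal one-time pad sketch, assuming Python's standard `secrets` module as the source of randomness: the key is a fresh random letter for every letter of the message, and encryption is ordinary Vigenere addition. The second half illustrates why reuse is fatal (the Venona mistake): if two messages share a pad, subtracting one ciphertext from the other letter by letter cancels the key and leaves exactly the difference of the two plaintexts. Function names and messages are made up for illustration.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase


def random_pad(length: int) -> str:
    """`length` letters chosen uniformly at random; use once, then destroy."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def add_letters(text: str, key: str, sign: int = 1) -> str:
    """Letter-by-letter (text + sign * key) mod 26; both strings are capital letters only."""
    return "".join(
        ALPHABET[(ALPHABET.index(t) + sign * ALPHABET.index(k)) % 26]
        for t, k in zip(text, key)
    )


message = "MEETMEATNOON"            # spaces removed so the pad lines up letter for letter
pad = random_pad(len(message))      # as long as the message, so the key never repeats
ciphertext = add_letters(message, pad)
print(add_letters(ciphertext, pad, sign=-1))  # MEETMEATNOON

# Key reuse: the pad cancels out of the difference of the two ciphertexts.
other = "PLANISOFFNOW"
other_ciphertext = add_letters(other, pad)
print(add_letters(ciphertext, other_ciphertext, sign=-1)
      == add_letters(message, other, sign=-1))  # True
```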

Chosen Plaintext Attack & the Vigenere Cipher

A kind of chosen plaintext attack was carried out by the US during WWII. We knew a Japanese attack was imminent because we had cracked a code, but we didn't know whether the string designating the target referred to Hawaii or Midway. So we leaked a story about a water shortage on Midway, and then discovered that same designator in a message that, we were sure, was relaying the leaked information.

The program that cracked WEP encryption in your wireless lab is also based on a chosen plaintext attack.

The thing about the Vigenere and Caesar Shift Ciphers is that there are three strings (the key, the plaintext, and the ciphertext), and knowing any two is enough to get the third. As an attacker, i.e. as "Eve", you know the ciphertext; if you haven't got that, you've got nothing. But what if you also knew the message? What if you could induce Alice to send a specific message to Bob, or at least a message that contains some text you know? For instance, if Alice is a spy, and I leak to her that I'm planning an attack on Albuquerque, I can guess that the next message she sends will contain the word "Albuquerque" somewhere. So, suppose Alice goes and sends the message:
JZFDEYNFUDS MB KLNFI CVIH KMUZ ECHELY
We'll assume that "Albuquerque" is in fact the first word, so we have
???????????
ALBUQUERQUE
↓↓↓↓↓↓↓↓↓↓↓
JZFDEYNFUDS
We work through to recover the key like this: row A has a J in the J-column, so J is the first letter of the key. Row L has a Z in the O-column, so we have an O as the second letter, and so on. In this way we recover JOEJOEJOEJO and deduce that the key was JOE.
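The column-by-column lookup is just subtraction modulo 26 (ciphertext letter minus plaintext letter, with A=0), so it is easy to automate. This sketch (hypothetical function names) recovers the keystream from the known word, takes its shortest repeating unit as the key guess, and then decrypts the whole intercepted message. It assumes the key is shorter than the known text; a key longer than "ALBUQUERQUE" would show no repetition to find.

```python
import string

ALPHABET = string.ascii_uppercase


def keystream(ciphertext: str, known_plaintext: str) -> str:
    """Key letters implied by known plaintext: (cipher - plain) mod 26, letter by letter."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(p)) % 26]
        for c, p in zip(ciphertext, known_plaintext)
    )


def shortest_repeating_unit(s: str) -> str:
    """Smallest prefix that, repeated, reproduces s (e.g. JOEJOEJOEJO -> JOE)."""
    for n in range(1, len(s) + 1):
        if all(s[i] == s[i % n] for i in range(len(s))):
            return s[:n]
    return s


def vigenere_decrypt(text: str, key: str) -> str:
    """Subtract the repeating key from the letters; spaces don't consume key letters."""
    out, i = [], 0
    for ch in text:
        if ch in ALPHABET:
            k = ALPHABET.index(key[i % len(key)])
            out.append(ALPHABET[(ALPHABET.index(ch) - k) % 26])
            i += 1
        else:
            out.append(ch)
    return "".join(out)


intercepted = "JZFDEYNFUDS MB KLNFI CVIH KMUZ ECHELY"
stream = keystream(intercepted.split()[0], "ALBUQUERQUE")
key = shortest_repeating_unit(stream)
print(stream, key)                         # JOEJOEJOEJO JOE
print(vigenere_decrypt(intercepted, key))  # ALBUQUERQUE IS WHERE THEY WILL ATTACK
```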