In this lesson we will talk about cryptography in the digital world. As a part of this, we'll look at some real-world cryptographic tools. Before we do anything, however, we need to recall the similarities and differences between our basic cryptographic tools (technically we call them cryptographic primitives): Symmetric / Secret-Key Encryption, Asymmetric / Public-Key Encryption, and Hashing.

Symmetric / Secret-Key Encryption Asymmetric / Public-Key Encryption Hashing

Cryptography in the digital world

In cyberspace, our messages are not text-only. A message can be any piece of digital data. However, we know that fundamentally any digital data is just a sequence of bytes. So instead of encrypting or hashing sequences of characters, we encrypt or hash sequences of bytes. Caesar shift, Vigenere Cipher, Rubik's Cube Hash ... all of these could be modified to work on bytes 00000000 to 11111111 instead of letters A to Z. However, there are much better symmetric encryption algorithms and hash algorithms available to us, so we'll move on to look at them — though, obviously, we won't be able to understand them to the same depth.

A hash value in the digital world has to be represented by bytes just like the data you're hashing is represented by bytes. So we just view a hash technique as a function that maps the bytes of the input into bytes representing a number, which is the hash value. The number of bytes in the hash value is usually fixed, regardless of how many bytes are in the input. We invariably write down hash values as sequences of hex digits that spell out the values of the bytes within the hash. So, for example, a 128-bit (16 byte) hash function would give us this picture:

                 +---------------+
N-byte input --> | Hash Function | --> 16-byte hash value
                 +---------------+
	
The idea is just the same as with the Rubik's cube, though. The hash function should be relatively easy to compute. Working backwards from a hash value to generate a string should be unfeasibly difficult. Strings that are even the smallest bit different should have wildly different hash values.

Encryption in the digital world is more complex. Recall how we were able to break the Vigenere Cipher. Because the Vigenere Cipher maps individual letters to individual letters, letter frequencies in the cyphertext could ultimately be related back to letter frequencies in the plaintext. This allowed us to break the cipher. Even though we encrypt bytes not letters in the digital world, frequency analysis still works — it would just be the frequencies of the 256 possible bytes rather than the 26 letters. So what we do is encrypt a block of bytes at a time (16 bytes in the symmetric encryption algorithm we'll look at). So instead of replacing each byte of plaintext with a corresponding byte of cyphertext, we replace a block of bytes in the plaintext with a block of bytes in the cyphertext. If the plain-text input doesn't come out to an even number of blocks, some padding (extra bits) needs to be added. We could tabulate frequency counts for the 256 bit patterns possible within a single byte, but we cannot tabulate frequency counts for the 340,282,366,920,938,463,463,374,607,431,768,211,456 bit patterns possible within a 16-byte block. The next issue is the key: the key is also digital data, so the key is ultimately a sequence of bytes. Once again, we usually write down a key by the hex representation of the sequence of bytes. A given encryption method must specify the block size and the key size — both are usually specified as a number of bits. So, with 128-bit (16 byte) key and block size we have a picture like this (N is the number of bytes in the plaintext, B is the number of blocks after padding):

                    +-----------------------------------+
                    |                                   |
                    |                       +---------+ |
N-byte plaintext ---|-> pad to 16*B bytes ->|         | |
                    |                       | Encrypt |-|--> 16*B Byte ciphertext (B blocks)
16 byte key---------|---------------------->|         | |
                    |                       +---------+ |
                    |                                   |
                    +-----------------------------------+
                    +-----------------------------------+
                    |                                   |
                    |                     +---------+   |
                    |                     |         |<--|--- 16*B Byte ciphertext (B blocks)
N-byte plaintext <--|-- remove padding <--| Decrypt |   |
                    |                     |         |<--|--- 16 Byte key
                    |                     +---------+   |
                    |                                   |
                    +-----------------------------------+
Suppose you intercept a cyphertext message and you know that the original plaintext was truly a text file, i.e. each byte was the ASCII code of a printable character. Decrypting with the wrong key almost certainly yields something that is not all ASCII codes. So we can try the following attack: go through every possible key, decrypt a block with that key, and check whether we get a result that's all ASCII codes. This is called a brute force attack. (Basically, any attack that starts with "try all possible ..." is a form of brute force attack.) With a 128-bit key, there are 340,282,366,920,938,463,463,374,607,431,768,211,456 possible keys, so a brute force attack would probably take more time than any of us have left to live. Or than the sun has left to live, for that matter.

United States Cyber Command


The mission of U.S. Cyber Command is:
"USCYBERCOM plans, coordinates, integrates, synchronizes and conducts activities to: direct the operations and defense of specified Department of Defense information networks and; prepare to, and when directed, conduct full spectrum military cyberspace operations in order to enable actions in all domains, ensure US/Allied freedom of action in cyberspace and deny the same to our adversaries."
If you look carefully at the command's logo (shown above, click to see full size) you'll find that it includes the MD5 hash of their mission statement (not including the quotation marks!). Even in the military, geek humor lives!

Real World Hashing: MD5 Hash

MD5 is a widely used cryptographic hash function. Actually, it's been "broken", in the sense that people have discovered how to construct different strings that hash to a given hash value. Given that, we should be using something else like SHA2 (which is a collection of hash algorithms, by the way). However, the smaller size of MD5 and its ubiquity make it a good choice as our example of a real-world cryptographic hash function for digital data.

MD5 takes a sequence of bytes of any length and produces from it a 128-bit (16 byte) hash value, which is invariably written as a string of 32 hex digits. So, instead of a scrambled Rubik's Cube serving as a hash, we have 128-bit number. The idea is the same, though. It's really difficult to work backwards from the hash value to a string (sequence of bytes in this case) that produces that hash value. You can get the hash for a file named foo on most UNIX systems with md5sum, and on Windows using the md5 program, like this:

In UNIX ...
$ md5sum foo
145e36441adae5f9478f4a6f2f79fe52  foo
$ md5sum
let it be
292ff476b03d6296482780fb01c51904  -
$ md5sum
let it ba
cc2b3df5ac949b1ae6d0b5d613343f5d  -
First note: ctrl-d sends an End-Of-File in UNIX. That's what you use to signal the end of your input when you're typing in the string md5sum is supposed to hash. Second note: In UNIX it can be a pain to hash a string without a newline character at the end, since you kind of have to enter a newline before you hit ctrl-d. Here's a solution:
$ echo -n "let it be" | md5sum
292ff476b03d6296482780fb01c51904  -
C:\> md5 foo
145e36441adae5f9478f4a6f2f79fe52
Since text is just a sequence of bytes (remember the ASCII table?) as far as the computer is concerned, you can get an MD5 hash for any text string just as easily:
C:\> md5 -d"let it be"
292ff476b03d6296482780fb01c51904
The important thing to notice is that the tiniest change to the text leads to a completely different hash:
C:\> md5 -d"let it ba"
cc2b3df5ac949b1ae6d0b5d613343f5d

Question: Suppose you have a password file that looks like this:
username   salt      md5hash(salt+password)
...
bjones     k%W3?A1   c8d0c0fec386b9cd6a625b3c8e57c988
...
Which (if either) of these two passwords would get user bjones logged on? StartMeUp81 -or- LetItBl33d
You should be able to use the md5 utility (or md5sum on UNIX) to verify or unverify each of these.

In computer forensics, MD5 hashes have two important applications. First, when a piece data serving as evidence is collected, we compute the hash value of the data immediately. An investigator should never alter evidence in the course of his investigation, so the hash be at the end or middle of the investigation should be exactly as it was initially. Thus, the hash provides a method of showing that we didn't alter or tamper with the evidence. Naturally, we also work from a copy in case anything does get changed, so we can revert to the original data.

A second application of hashes is in looking for "bad" files, like child pornography. There is a national database of hashes of known child pornography images. Analysis tools can calculate the hashes of every file on a hard drive and cross reference the hash values of files found on the system with the hashes in the database. If there is a match, then the owner of the hard drive has some serious questions to answer.

Real-World Symmetric (Secret Key) Encryption: AES (Advanced Encryption Standard)

A stick figure guide to AES
Check out the stick figure guide to AES which tells you as much or as little as you want to know about AES, how it came to be and how it works. Plus, it's pretty funny.
AES (Advanced Encryption Standard) is a symmetric key (i.e. there is a single, shared secret key) encryption algorithm for encrypting digital data. It is a 128-bit block cipher, i.e., it always operates on 128 input bits at a time, although it has several variants with different key sizes. The variant we'll talk about uses 128-bit keys. It's widely used, and it has been approved by the National Security Agency (NSA) for encrypting top secret information.

Usually, AES is not a stand-alone tool. Instead, it is embedded in systems that use it for security. For example, the archiving tool PKZIP uses it to provide the option of creating a password protected .zip file. Windows offers BitLocker, which allows you to encrypt your hard disk. That way, if your computer, or disk, or backups are stolen, your data is still secure. Bitlocker uses AES. SSL traffic may use AES. IPSec, which is a standard for secure IP (Internet Protocol) traffic uses AES. Suffice it to say, there are many more examples.

Although AES is usually embedded in other tools, we've provided you with a shell/command-line tool called aes that you can use as a stand-alone tool to encrypt and decrypt files and strings. You can use it to generate passwords (because coming up with random strings of 32 hex bits is tedious), to encrypt, and to decrypt.
Generate a 128-bit key
$ aes -k
adccbb42e647d10370992205ddc268e6
Encrypt
$ aes -e adccbb42e647d10370992205ddc268e6 "Who's afraid of the big bad wolf?"
d31cfef5689d1cf37b725934c8d314f9e57a26bbbeb528c0364ab4edd0819b70829cc4f22add97df1a866a4ef176c2c4
Decrypt
$ aes -d adccbb42e647d10370992205ddc268e6 d31cfef5689d1cf37b725934c8d314f9e57a26bbbeb528c0364ab4edd0819b70829cc4f22add97df1a866a4ef176c2c4
Who's afraid of the big bad wolf?
Identify the key, plaintext and ciphertext.
The aes tool also operates on files. You can give the aes -h command for more information.

Precision targeting for cyber weapons
In the Summer of 2012, a new piece of "malware" (malicious software, about which much will be said later!) was discovered, and given the name Gauss (Wired article). Experts speculate it was created by either the U.S. or Israel as a "cyber weapon". Mysteriously, nobody knows what it was supposed to do, because the instructions it will follow are encrypted and decrypt themselves only when they are on the targeted hosts. How that was done is really interesting, and it's a good example of combining crypto tools to accomplish a new goal. Gauss collects certain pieces of information about the host it's running on — things like the host's OS, network interfaces, servers it's associated with — computes the MD5 hash of that information, and uses the hash value as the secret key for decrypting the secret instructions. If the host information doesn't match exactly, the MD5 sum will be wrong, and the decryption process will produce nonsense, and the malware won't have instructions to follow. The crypto ensures (1) that Gauss only attacks the specified target, and (2) that if Gauss is discovered (as it has been), nobody will know what it ultimately is supposed to do!

Real-World Asymmetric (Public-Key) Encryption: RSA

RSA Asymmetric / Public-Key Encryption, which we've already covered, is a real-world tool. In fact, it is the most commonly used public-key encryption technique. With RSA you have to worry about key-size and block-size, as we've seen for symmetric (secret key) encryption in the digital world. However, RSA keys have to be much larger that AES keys in order to provide the same level of security. Recall that an RSA key-pair looks like (e,n),(d,n), where e, d, and n are numbers ... very large numbers. When we talk about key size with RSA, we're really referring to the number of bits in n. Today, RSA keys or 2048 or 4096 bits are usually considered secure, and anything less is suspect.

A real-world tool that provides RSA key-pair generation, encryption and decryption is provided as part of the openssl library, which was included in your software install. Here is a summary of some commands for using RSA encryption with openssl:

openssl genrsa -out keypair.pem 2048
Generate 2048-bit RSA keypair and store in file keypair.pem.
openssl rsa -text -in keypair.pem
View (in text and hex) the RSA keypair in file keypair.pem.
openssl rsa -in keypair.pem -pubout -out pubkey.pem
Extract the public key from keypair.pem and save in file pubkey.pem.
openssl rsautl -encrypt -pubin -inkey pubkey.pem -in plain -out cipher
Encrypt the file plain using the public key in file pubkey.pem, store the result in file cipher.
openssl rsautl -decrypt -inkey keypair.pem -in cipher -out plain1
Decrypt the file cipher using the private key in file keypair.pem, store the result in file plain1.

Obviously, the syntax of these openssl commands is a bit daunting. You are not expected to remember it. What you should keep in mind, is that openssl provides the exact same functionality as our online RSA demo page, it just does it from the command-line, and with much bigger numbers (which means more security!).

Combining Crypto Tools: Passphrase encryption

A 128-bit AES key, even represented as a hex string, is hard to remember. You may prefer to use a password or passphrase instead when using symmetric (secret key) encryption. Magically, you don't need a new tool to do this. Instead, we can combine the cryptographic operations we already know in order to provide this new functionality.

I need a 128-bit number for AES encryption. I'd like to use a password/passphrase string instead, what could I do? Use the MD5 Hash of your passphrase as the AES key. In class we tried this out. You encrypted using this scheme and posted your message to the class message board. Then you told a friend your password/passphrase and they decrypted your posted message. The important lesson in this is that we often combine cryptographic tools to construct a system that meets our needs.

Combining Crypto Tools: Digital Signatures

We talked a bit about Digital Signatures in the RSA lesson. We pointed out that if we have a chunk of bytes that decrypts into something meaningful using Alice's public key, we know that Alice must've produced that chunk of bytes. Why? Because encrypting something in such a way as to have it decrypt meaningfully with Alice's public key requires Alice's private key. And only Alice should have access to her private key.

The problem with this scheme is that if you sign a very large file, the signature is very long. So in the real-world, Alice digitally signs a file by encrypting a hash of the file with her private key. If we have our own copy of the file, we can hash it ourselves, getting hash value D1, and then we can decrypt the signature with Alice's public key, getting value D2. If D1 and D2 are equal, we can be extremely confident that the file we have is really the same as the one Alice signed. The beauty of this is that no matter how large the file is, the hash of it is a fixed size (e.g. 128 bits for MD5). Thus, the digital signature always has the same (relatively) small size.

Openssl has tools that allow us to digitally sign files, and to verify digital signatures. Assuming we have the same files as produced in the openssl examples above, here are the commands for signing files and verifying signatures:

openssl dgst -sha1 -sign keypair.pem -out plain.sig plain
Digitally sign file plain using the private key from file keypair.pem, using SHA1 as the hash algorithm, and storing the result in file plain.sig.
openssl dgst -sha1 -verify pubkey.pem -signature plain.sig plain
Verify digital signature (stored in file plain.sig) of the file plain, using the public key in file pubkey.pem and using SHA1 as the hash algorithm.

Once again, the openssl command syntax is difficult, and you are not expected to remember it. However, you should remember that Alice digitally signs the file plain by first hashing plain, then encrypting the hash value with her private key. Because of this, a digital signature scheme must specify both a public-key encryption algorithm (e.g. 2048-bit RSA) and a hashing algorithm (e.g. SHA1).

Combining Crypto Tools: Password reuse and systems for one password for all accounts

A 2007 study of Web users by Microsoft Research (see the article Too Many Passwords or Not Enough Brainpower?) found that the average user has about 25 different accounts that require passwords. How do we deal with remembering all of these? We cheat and reuse passwords! The same study found that a given password gets used by the typical user for about four different accounts. This has its own dangers — an unscrupulous website could take the password you type and see if it logs them in to any of your other accounts — but it points to the problem that we cannot remember enough good passwords for all of our accounts.

There are some tools out there that actually deal with this nicely and, given what we know, it's not hard to see how we could do it ourselves. Let's assume we have one strong password G8mk#A92@woy*5 we can actually remember. When we create a new account, say an amazon.com account with username sy110, we take the md5 hash of

amazon.com+sy110+G8mk#A92@woy*5
which is 6ff21549882901fbba94305fff2338ec, and use that as our password. Every time we login again, we recompute the hash and supply that as our password. The only thing we have to remember is that one strong password. And we reuse it in creating hashes-as-passwords like this for all the sites we have accounts with. If we want even more security, just add salt!