In this lesson we will talk about cryptography in the digital world. As a part of this, we'll look at some real-world cryptographic tools. Before we do anything, however, we need to recall the similarities and differences between our basic cryptographic tools (technically we call them cryptographic primitives): Symmetric / Secret-Key Encryption, Asymmetric / Publi-Key Encryption, and Hashing.

Cryptography in the digital world

A hash value in the digital world has to be represented by bytes just like the data you're hashing is represented by bytes. So we just view a hash technique as a function that maps the bytes of the input into bytes representing a number, which is the hash value. The number of bytes in the hash value is usually fixed, regardless of how many bytes are in the input. We invariably write down hash values as sequences of hex digits that spell out the values of the bytes within the hash. So, for example, a 128-bit (16 byte) hash function would give us this picture:

Encryption in the digital world is more complex. Recall how we were able to break the Vigenere Cipher. Because the Vigenere Cipher maps individual letters to individual letters, letter frequencies in the cyphertext could ultimately be related back to letter frequencies in the plaintext. This allowed us to break the cipher. Even though we encrypt bytes not letters in the digital world, frequency analysis still works — it would just be the frequencies of the 256 possible bytes rather than the 26 letters. So what we do is encrypt a block of bytes at a time (16 bytes in the symmetric encryption algorithm we'll look at). So instead of replacing each byte of plaintext with a corresponding byte of cyphertext, we replace a block of bytes in the plaintext with a block of bytes in the cyphertext. If the plain-text input doesn't come out to an even number of blocks, some padding (extra bits) needs to be added. We could tabulate frequency counts for the 256 bit patterns possible within a single byte, but we cannot tabulate frequency counts for the 340,282,366,920,938,463,463,374,607,431,768,211,456 bit patterns possible within a 16-byte block. The next issue is the key: the key is also digital data, so the key is ultimately a sequence of bytes. Once again, we usually write down a key by the hex representation of the sequence of bytes. A given encryption method must specify the block size and the key size — both are usually specified as a number of bits. So, with 128-bit (16 byte) key and block size we have a picture like this (N is the number of bytes in the plaintext, B is the number of blocks after padding):

United States Cyber Command

The mission of U.S. Cyber Command is:

"USCYBERCOM plans, coordinates, integrates, synchronizes and conducts activities to: direct the operations and defense of specified Department of Defense information networks and; prepare to, and when directed, conduct full spectrum military cyberspace operations in order to enable actions in all domains, ensure US/Allied freedom of action in cyberspace and deny the same to our adversaries."

If you look carefully at the command's logo (shown above, click to see full size) you'll find that it includes the MD5 hash of their mission statement (not including the quotation marks!). Even in the military, geek humor lives!

Real World Hashing: MD5 Hash

MD5 takes a sequence of bytes of any length and produces from it a 128-bit (16 byte) hash value, which is invariably written as a string of 32 hex digits. So, instead of a scrambled Rubic's Cube serveing as a hash, we have 128-bit number. The idea is the same, though. It's really difficult to work backwards from the hash value to a string (sequence of bytes in this case) that produces that hash value. You can get the hash for a file named foo on most unix systems with md5sum, and on Windows using the md5 program, like this:

In Unix ...

$ md5sum foo
145e36441adae5f9478f4a6f2f79fe52  foo

$ md5sum
let it be
292ff476b03d6296482780fb01c51904  -

$ md5sum
let it ba
cc2b3df5ac949b1ae6d0b5d613343f5d  -

First note: ctrl-d sends an End-Of-File in Unix. That's what you use to signal the end of your input when you're typing in the string md5sum is suppoesd to hash. Second note: In Unix it can be a pain to hash a string without a newline character at the end, since you kind of have to enter a newline before you hit ctrl-d. Here's a solution:

$ echo -n "let it be" | md5sum
292ff476b03d6296482780fb01c51904  -

In computer forensics, MD5 hashes have two important applications. First, when a piece data serving as evidence is collected, we compute the hash value of the data immediately. An investigator should never alter evidence in the course of his investigation, so the hash be at the end or middle of the investigation should be exactly as it was initially. Thus, the hash provides a method of showing that we didn't alter or tamper with the evidence. Naturally, we also work from a copy in case anything does get changed, so we can revert to the original data.

A second application of hashes is in looking for "bad" files, like child pornography. There is a national database of hashes of known child pornography images. Analysis tools can calculate the hashes of every file on a hard drive and cross reference the hash values of files found on the system with the hashes in the database. If there is a match, then the owner of the hard drive has some serious questions to answer.

Real-World Symmetric (Secret Key) Encryption: AES (Advanced Encryption Standard)

Usually, AES is not a stand-alone tool. Instead, it is embedded in systems that use it for security. For example, the archiving tool PKZIP uses it to provide the option of creating a password protected .zip file. Windows offers BitLocker, which allows you to encrypt your hard disk. That way, if your computer, or disk, or backups are stolen, your data is still secure. Bitlocker uses AES. SSL traffic may use AES. IPSec, which is a standard for secure IP (Internet Protocol) traffic uses AES. Suffice it to say, there are many more examples.

Precision targeting for cyber weapons
In the Summer of 2012, a new piece of "malware" (malicious software, about which much will be said later!) was discovered, and given the name Gauss (Wired article). Experts speculate it was created by either the U.S. or Israel as a "cyber weapon". Mysteriously, nobody knows what it was supposed to do, because the instructions it will follow are encrypted and decrypt themselves only when they are on the targeted hosts. How that was done is really interesting, and it's a good example of combining crypto tools to accomplish a new goal. Gauss collects certain pieces of information about the host it's running on — things like the host's OS, network interfaces, servers it's associated with — computes the MD5 hash of that information, and uses the hash value as the secret key for decrypting the secret instructions. If the host information doesn't match exactly, the MD5 sum will be wrong, and the decryption process will produce nonsense, and the malware won't have instructions to follow. The crypto ensures (1) that Gauss only attacks the specified target, and (2) that if Gauss is discovered (as it has been), nobody will know what it ultimately is supposed to do!

Real-World Asymmetric (Public-Key) Encryption: RSA

A real-world tool that provides RSA key-pair generation, encryption and decryption is provided as part of the openssl library, which was included in your software install. Here is a summary of some commands for using RSA encryption with openssl:

Combining Crypto Tools: Passphrase encryption

I need a 128-bit number for AES encryption. I'd like to use a password/passphrase string instead, what could I do? Use the MD5 Hash of your passphrase as the AES key. In class we tried this out. You encrypted using this scheme and posted your message to the class message board. Then you told a friend your password/passphrase and they decrypted your posted message. The important lesson in this is that we often combine cryptographic tools to construct a system that meets our needs.

Combining Crypto Tools: Digital Signatures

The problem with this scheme is that if you sign a very large file, the signature is very long. So in the real-world, Alice digitally signs a file by encrypting a hash of the file with her private key. If we have our own copy of the file, we can hash it ourselves, getting hash value D1, and then we can decrypt the signature with Alice's public key, getting value D2. If D1 and D2 are equal, we can be extremely confident that the file we have is really the same as the one Alice signed. The beauty of this is that no matter how large the file is, the hash of it is a fixed size (e.g. 128 bits for MD5). Thus, the digital signature always has the same (relatively) small size.

Openssl has tools that allow us to digitally sign files, and to verify digital signatures. Assuming we have the same files as produced in the openssl examples above, here are the commands for signing files and verifying signatures:

Once again, the openssl command syntax is difficult, and you are not expected to remember it. However, you should remember that Alice digitally signs the file plain by first hashing plain, then encrypting the hash value with her private key. Because of this, a digital signature scheme must specify both a public-key encryption algorithm (e.g. 2048-bit RSA) and a hashing algorithm (e.g. SHA1).

Combining Crypto Tools: Password reuse and systems for one password for all accounts

There are some tools out there that actually deal with this nicely and, given what we know, it's not hard to see how we could do it ourselves. Let's assume we have one strong password G8mk#A92@woy*5 we can actually remember. When we create a new account, say an amazon.com account with username si110, we take the md5 hash of

openssl genrsa -out keypair.pem 2048	←	Generate 2048-bit RSA keypair and store in file keypair.pem.
openssl rsa -text -in keypair.pem	←	View (in text and hex) the RSA keypair in file keypair.pem.
openssl rsa -in keypair.pem -pubout -out pubkey.pem	←	Extract the public key from keypair.pem and save in file pubkey.pem.
openssl rsautl -encrypt -pubin -inkey pubkey.pem -in plain -out cipher	←	Encrypt the file plain using the public key in file pubkey.pem, store the result in file cipher.
openssl rsautl -decrypt -inkey keypair.pem -in cipher -out plain1	←	Decrypt the file dipher using the private key in file keypair.pem, store the result in file plain1.

openssl dgst -sha1 -sign keypair.pem -out plain.sig plain	←	Digitally sign file plain using the private key from file keypair.pem, using SHA1 as the hash algorithm, and storing the result in file plain.sig.
openssl dgst -sha1 -verify pubkey.pem -signature plain.sig plain	←	Verify digital signature (stored in file plain.sig) of the file plain, using the public key in file pubkey.pem and using SHA1 as the hash algorithm.

Symmetric / Secret-Key Encryption	Asymmetric / Public-Key Encryption	Hashing