In this lesson we will talk about cryptography in the digital world. As a part of this, we'll look at some real-world cryptographic tools. Before we do anything, however, we need to recall the similarities and differences between our basic cryptographic tools (technically we call them cryptographic primitives): Symmetric / Secret-Key Encryption, Asymmetric / Public-Key Encryption, and Hashing.
|Symmetric / Secret-Key Encryption||Asymmetric / Public-Key Encryption||Hashing|
A hash value in the digital world has to be represented by bytes just like the data you're hashing is represented by bytes. So we just view a hash technique as a function that maps the bytes of the input into bytes representing a number, which is the hash value. The number of bytes in the hash value is usually fixed, regardless of how many bytes are in the input. We invariably write down hash values as sequences of hex digits that spell out the values of the bytes within the hash. So, for example, a 128-bit (16 byte) hash function would give us this picture:
+---------------+ N-byte input --> | Hash Function | --> 16-byte hash value +---------------+The idea is just the same as with the Rubik's cube, though. The hash function should be relatively easy to compute. Working backwards from a hash value to generate a string should be unfeasibly difficult. Strings that are even the smallest bit different should have wildly different hash values.
Encryption in the digital world is more complex. Recall how we were able to break the Vigenere Cipher. Because the Vigenere Cipher maps individual letters to individual letters, letter frequencies in the cyphertext could ultimately be related back to letter frequencies in the plaintext. This allowed us to break the cipher. Even though we encrypt bytes not letters in the digital world, frequency analysis still works — it would just be the frequencies of the 256 possible bytes rather than the 26 letters. So what we do is encrypt a block of bytes at a time (16 bytes in the symmetric encryption algorithm we'll look at). So instead of replacing each byte of plaintext with a corresponding byte of cyphertext, we replace a block of bytes in the plaintext with a block of bytes in the cyphertext. If the plain-text input doesn't come out to an even number of blocks, some padding (extra bits) needs to be added. We could tabulate frequency counts for the 256 bit patterns possible within a single byte, but we cannot tabulate frequency counts for the 340,282,366,920,938,463,463,374,607,431,768,211,456 bit patterns possible within a 16-byte block. The next issue is the key: the key is also digital data, so the key is ultimately a sequence of bytes. Once again, we usually write down a key by the hex representation of the sequence of bytes. A given encryption method must specify the block size and the key size — both are usually specified as a number of bits. So, with 128-bit (16 byte) key and block size we have a picture like this (N is the number of bytes in the plaintext, B is the number of blocks after padding):
+-----------------------------------+ | | | +---------+ | N-byte plaintext ---|-> pad to 16*B bytes ->| | | | | Encrypt |-|--> 16*B Byte ciphertext (B blocks) 16 byte key---------|---------------------->| | | | +---------+ | | | +-----------------------------------+
+-----------------------------------+ | | | +---------+ | | | |<--|--- 16*B Byte ciphertext (B blocks) N-byte plaintext <--|-- remove padding <--| Decrypt | | | | |<--|--- 16 Byte key | +---------+ | | | +-----------------------------------+Suppose you intercept a cyphertext message and you know that the original plaintext was truly a text file, i.e. each byte was the ASCII code of a printable character. Decrypting with the wrong key almost certainly yields something that is not all ASCII codes. So we can try the following attack: go through every possible key, decrypt a block with that key, and check whether we get a result that's all ASCII codes. This is called a brute force attack. (Basically, any attack that starts with "try all possible ..." is a form of brute force attack.) With a 128-bit key, there are 340,282,366,920,938,463,463,374,607,431,768,211,456 possible keys, so a brute force attack would probably take more time than any of us have left to live. Or than the sun has left to live, for that matter.
MD5 takes a sequence of bytes of any length and produces from it
a 128-bit (16 byte) hash value, which is invariably written as a
string of 32 hex digits. So, instead of a scrambled Rubik's
Cube serving as a hash, we have 128-bit number. The idea is
the same, though. It's really difficult to work backwards from
the hash value to a string (sequence of bytes in this case) that
produces that hash value. You can get the hash for a file named
foo on most UNIX systems with
and on Windows using the md5 program, like this:
$ md5sum foo 145e36441adae5f9478f4a6f2f79fe52 foo
$ md5sum let it be 292ff476b03d6296482780fb01c51904 -
$ md5sum let it ba cc2b3df5ac949b1ae6d0b5d613343f5d -First note: ctrl-d sends an End-Of-File in UNIX. That's what you use to signal the end of your input when you're typing in the string md5sum is supposed to hash. Second note: In UNIX it can be a pain to hash a string without a newline character at the end, since you kind of have to enter a newline before you hit ctrl-d. Here's a solution:
$ echo -n "let it be" | md5sum 292ff476b03d6296482780fb01c51904 -
C:\> md5 foo 145e36441adae5f9478f4a6f2f79fe52Since text is just a sequence of bytes (remember the ASCII table?) as far as the computer is concerned, you can get an MD5 hash for any text string just as easily:
C:\> md5 -d"let it be" 292ff476b03d6296482780fb01c51904The important thing to notice is that the tiniest change to the text leads to a completely different hash:
C:\> md5 -d"let it ba" cc2b3df5ac949b1ae6d0b5d613343f5dQuestion: Suppose you have a password file that looks like this:
username salt md5hash(salt+password) ... bjones k%W3?A1 c8d0c0fec386b9cd6a625b3c8e57c988 ...Which (if either) of these two passwords would get user bjones logged on?
In computer forensics, MD5 hashes have two important applications. First, when a piece data serving as evidence is collected, we compute the hash value of the data immediately. An investigator should never alter evidence in the course of his investigation, so the hash be at the end or middle of the investigation should be exactly as it was initially. Thus, the hash provides a method of showing that we didn't alter or tamper with the evidence. Naturally, we also work from a copy in case anything does get changed, so we can revert to the original data.
A second application of hashes is in looking for "bad" files, like child pornography. There is a national database of hashes of known child pornography images. Analysis tools can calculate the hashes of every file on a hard drive and cross reference the hash values of files found on the system with the hashes in the database. If there is a match, then the owner of the hard drive has some serious questions to answer.
Usually, AES is not a stand-alone tool. Instead, it is embedded in systems that use it for security. For example, the archiving tool PKZIP uses it to provide the option of creating a password protected .zip file. Windows offers BitLocker, which allows you to encrypt your hard disk. That way, if your computer, or disk, or backups are stolen, your data is still secure. Bitlocker uses AES. SSL traffic may use AES. IPSec, which is a standard for secure IP (Internet Protocol) traffic uses AES. Suffice it to say, there are many more examples.Although AES is usually embedded in other tools, we've provided you with a shell/command-line tool called
aesthat you can use as a stand-alone tool to encrypt and decrypt files and strings. You can use it to generate passwords (because coming up with random strings of 32 hex bits is tedious), to encrypt, and to decrypt.
|Generate a 128-bit key|
$ aes -k adccbb42e647d10370992205ddc268e6
$ aes -e adccbb42e647d10370992205ddc268e6 "Who's afraid of the big bad wolf?" d31cfef5689d1cf37b725934c8d314f9e57a26bbbeb528c0364ab4edd0819b70829cc4f22add97df1a866a4ef176c2c4
$ aes -d adccbb42e647d10370992205ddc268e6 d31cfef5689d1cf37b725934c8d314f9e57a26bbbeb528c0364ab4edd0819b70829cc4f22add97df1a866a4ef176c2c4 Who's afraid of the big bad wolf?
aestool also operates on files. You can give the
aes -hcommand for more information.
A real-world tool that provides RSA key-pair generation,
encryption and decryption is provided as part of
openssl library, which was included in your
software install. Here is a summary of some commands for using
RSA encryption with openssl:
openssl genrsa -out keypair.pem 2048
|←||Generate 2048-bit RSA keypair and store in file keypair.pem.|
openssl rsa -text -in keypair.pem
|←||View (in text and hex) the RSA keypair in file keypair.pem.|
openssl rsa -in keypair.pem -pubout -out pubkey.pem
|←||Extract the public key from keypair.pem and save in file pubkey.pem.|
openssl rsautl -encrypt -pubin -inkey pubkey.pem -in plain -out cipher
|←||Encrypt the file plain using the public key in file pubkey.pem, store the result in file cipher.|
openssl rsautl -decrypt -inkey keypair.pem -in cipher -out plain1
|←||Decrypt the file cipher using the private key in file keypair.pem, store the result in file plain1.|
I need a 128-bit number for AES encryption. I'd like to use a password/passphrase string instead, what could I do? Use the MD5 Hash of your passphrase as the AES key. In class we tried this out. You encrypted using this scheme and posted your message to the class message board. Then you told a friend your password/passphrase and they decrypted your posted message. The important lesson in this is that we often combine cryptographic tools to construct a system that meets our needs.
The problem with this scheme is that if you sign a very large file, the signature is very long. So in the real-world, Alice digitally signs a file by encrypting a hash of the file with her private key. If we have our own copy of the file, we can hash it ourselves, getting hash value D1, and then we can decrypt the signature with Alice's public key, getting value D2. If D1 and D2 are equal, we can be extremely confident that the file we have is really the same as the one Alice signed. The beauty of this is that no matter how large the file is, the hash of it is a fixed size (e.g. 128 bits for MD5). Thus, the digital signature always has the same (relatively) small size.
Openssl has tools that allow us to digitally sign files, and to verify digital signatures. Assuming we have the same files as produced in the openssl examples above, here are the commands for signing files and verifying signatures:
openssl dgst -sha1 -sign keypair.pem -out plain.sig plain
|←||Digitally sign file plain using the private key from file keypair.pem, using SHA1 as the hash algorithm, and storing the result in file plain.sig.|
openssl dgst -sha1 -verify pubkey.pem -signature plain.sig plain
|←||Verify digital signature (stored in file plain.sig) of the file plain, using the public key in file pubkey.pem and using SHA1 as the hash algorithm.|
Once again, the openssl command syntax is difficult, and you are not expected to remember it. However, you should remember that Alice digitally signs the file plain by first hashing plain, then encrypting the hash value with her private key. Because of this, a digital signature scheme must specify both a public-key encryption algorithm (e.g. 2048-bit RSA) and a hashing algorithm (e.g. SHA1).
There are some tools out there that actually deal with this nicely and, given what we know, it's not hard to see how we could do it ourselves. Let's assume we have one strong password G8mk#A92@woy*5 we can actually remember. When we create a new account, say an amazon.com account with username sy110, we take the md5 hash of
6ff21549882901fbba94305fff2338ec, and use that as our password. Every time we login again, we recompute the hash and supply that as our password. The only thing we have to remember is that one strong password. And we reuse it in creating hashes-as-passwords like this for all the sites we have accounts with. If we want even more security, just add salt!