The previous lesson introduced you to symmetric (secret key)
encryption. Symmetric encryption is, quite clearly, a tool for
providing confidentiality (as in the IA pillar).
In this lesson, we look at cryptographic hashes, which
are tools for guaranteeing integrity.
Hashes are used in a variety of ways, including password
authentication, which we'll talk a bit about below.
Hashes allow us to check the integrity of a message, i.e. check
that the message is the same as an earlier message, without
remembering the message itself — indeed without ever
having seen the message before. In the setting of passwords,
this means that we can check a password without ever storing a
copy of the password, which is important for security.
Thus, even though what hashing provides directly is a way to
check integrity, it is used to help provide authentication.
In fact, hashing is also used to help provide non-repudiation.
Suppose you wanted to prove to someone — but someone you
that you are psychic.
You might, predict the winner of next season's American Idol.
But if you
your prediction now, he might
place a bet and make a lot of money, which is bad
because you hate him. On the other hand, if you
your prediction after the season, he won't believe you're psychic. (Nor would I!)
What you need is
him something before-hand, something
a) that gives no information before or during the season about your
prediction, and b) that after the season provides indisputable evidence
that you predicted the real winner (or, if you failed, that you didn't).
It's worth thinking a bit about this example. Hashing will
provide a guarantee before hand of the integrity of
the message you'll send afterwards. Integrity,
that's an IA pillar!
Essentially, what we need is for you to be able to
send a message with
your prediction after the season, but you
must have sent something before the season that will
provide a guarantee of the integrity of the message
you send afterwards — i.e. something that will guarantee
that you haven't modified the message since before the
American Idol season started.
There is a cryptographic technique called hashing
that provides this "something", this guarantee of integrity.
Here's a simple scheme would allow us to prove our psychic-ness.
At the right you see a table with some letters and punctuation
marks. Next to each of them is a sequence of two colored
We write down the name of the predicted winner, and we
replace each character with the squares from its table
entry. Then, we take an solved Rubik's Cube and go through the
sequence of squares in order, interpreting each square as a
quarter-turn clockwise twist of the cube-face with that color as
its center square. When we're done, we give the messed up
Rubik's cube to "the guy". The messed up Rubik's Cube is
the hash of the predicted winner's name.
Once the season is over, we give "the guy" the name of the winner we
predicted, and he can take a second solved Rubik's cube, go
through the moves for that string, and verify that what he gets
is the same as the Rubik's cube you gave him.
Barring some incredibly unlikely coincidence, no two people
would have names that hash to the same value, so "the guy" will
be convinced that the message (just a name) you gave him after
the fact is the same as the message you had hashed beforehand,
and thus he will be convinced that you
are really psychic. Hashing like this guarantees integrity.
You might object that for some people (and
certainly for computers) solving Rubik's Cube is easy.
We can address this in two ways: we could use a Rubik's 4
or 5 or 6-cube to up the difficulty or, more profoundly, we
could hash by running through the string forward then backward.
This second approach makes more work for us when we do the
hashing, but it means that solving a Rubik's cube is not
enough. You need to find a solution that looks like doing
one sequence of moves, then doing the same moves in reverse
(in fact it's a bit different than that)
and ending up with a solved cube. As far as I know, nobody
— not even computers — can do that fast!
So for this discussion we treat solving a Rubik's cube as if
it is a really hard problem, realizing that to make it truly
hard we just need to make this small change to the rules for
how the hash is computed. Let's call it Rubik's Cube Hash
How many moves to solve a Rubik's cube?
A group of researchers, using computer time donated to them
by Google, proved that "God's Number", the least number of
moves that suffices to solve any scrambled cube, is 20.
Read the details.
Can you solve a Rubik's cube while juggling?
What about our goal of not giving away the name of our
predicted winner prior to the end of the season?
Well, finding a string that hashes to the mixed up cube you
gave "the guy" means solving a Rubik's Cube, and that's hard!
This is the fundamental insight behind hashing: it's easy to
produce a hash from a string, but very hard to find a string that
produces the given hash. Not impossible, just hard to do fast
— like solving a Rubik's Cube, which is not impossible,
just hard to do fast. In this case, finding "a" string that
hashes to the same cube isn't enough, because you might find a
different string (probably not a person's name) that just
happens to hash to the same value.
As the American Idol season progresses,
the field of contestants gets small enough that "the guy", if he
has lots of time or has lots of friends (and Rubik's cubes)
helping him, could
figure out the name of the predicted winner from the hash without having to
solve a Rubik's Cube. He could simply get the Rubik's Hash of
every remaining contestant's name and find the one that matches the
cube you gave him. Because there are too few
possible messages (contestant names), you risk giving away your
prediction to "the guy".
This is called a dictionary attack, by the way. When
the pool of strings from which came the one that produced the hash
value is small enough, one can try them all and test for
matches. This pool of candidate strings is the "dictionary".
In situations like this, we can salt the string before
hashing. You could, for instance, choose several random letters
and add them to the front and the back of the name you've
predicted before you actually take the hash.
These random letters are the "salt".
Now, "the guy" can't try hashing each contestant's name, even
when it's down to just a few contestants,
because he'd have to try each name with each possible salt
value, and there way too many possibilities. After the race,
you present him
with a string like
and he can verify that you did indeed predict "John_Smith" to
win. The only fly in the ointment is this: "the guy" might
object that you could have different salt combinations for every
name that all hash to the same cube you gave him before
Is that feasible? Actually I don't know. It's
certainly much harder than solving a Rubik's Cube though!
So as long as we all accept that solving a Rubik's cube is
hard, "the guy" cannot reasonably imagine that you had messages
on hand that hash to the cube you gave him for more than one
Properties of a good cryptographic hash function
There are several properties that a good cryptographic hash
function should have.
It should be easy to compute the hash of a string.
If all you have is a hash value, it should be very hard to
find a string that hashes to that value. Our Rubik's Cube Hash
has this property (and the first property), at least if you accept Rubik's
cube solving as being difficult. The Rubik's Cube Hash version
2.0 certainly has this property.
It should be difficult to find two strings
with the same hash. Related to that, it should be hard to take
a hash value and a string that hashes to it, and engineer from
the first string another string that hashes to that same
value. It might not be obvious that someone could cause trouble if they
could do this, but what should be obvious is that if we can do
this, our hashing scheme is not capable of guaranteeing the
integrity of messages, and that's bad.
Our Rubik's hash, even version 2.0, doesn't have the third
property, because I could insert double commas anywhere I wanted
to and, since two commas means four quarter-turns of the same
face, it doesn't change the hash value. However, it would be
very hard to engineer a meaningful messages — with
different meanings — with the same hash. As discussed
earlier, finding a message that means you predicted "John_Smith"
and another message with the same hash value that means you
predicted "Tina_Jones" is really hard.
We'll look at a commonly used cryptographic hash function next
lesson, but for now it's worth knowing that "real" cryptographic
hashes are numbers, not scrambled Rubik's cubes.
If you contact a company/site to tell them you've forgotten
your password, and they can tell you what your password was
... they're storing your actual password and not the hash
(how do we know that?).
You should be indignant!
Do you think it never happens?
Check out this
New York Times article about
RockYou.com break-in. Hackers broke into
rockyou.com and stole files with the passwords of over 32
million RockYou.com users.
This was only possible because RockYou.com stored actual
passwords, not hashes-of-passwords.
[Here's a brief follow-up article.]
450k Yahoo Passwords Leaked (7/2012)
It's been a few years since the RockYou.com incident. Surely
no company would store passwords "in the clear",
i.e. unencrypted ... right? Wrong! In July 2012, attackers
used an SQL injection attack to steal user credentials from
about 450,000 yahoo.com accounts. Apparently, the database was
storing the actual passwords for those accounts, not hashes!
This blog post
has a breakdown of the most common passwords in the bunch.
Notice that 123456 is still on top! Yeah!
A very common application of hashing is password checking.
Suppose there's a server you login to from time to time.
It's undesirable to store your password on the server's
harddrive. There are tricks that might allow people to see that
file and, once they do, they'd have your password and be able to
login as you. So instead, we'd like the server to store
something that it can use to verify your password,
but which nobody can use to figure out your password.
Of course we use a hash. Servers have a special file (the
password file) that stores usernames and the hashes of passwords
When you login, you give your password, they hash it and check
against the entry for your username in the password file stored
on their harddrive. An important point to note with this system:
to login as user foo, I don't need to
have foo's password — I just need to know a string
with the same hash as foo's password.
Because people aren't very creative about choosing passwords,
there is a relatively small pool of likely passwords, which
means dictionary attacks are a problem. Choosing unlikely
passwords ameliorates this. Worse yet, people have collected
huge precomputed lists of passwords and their hash values.
These lists, called Rainbow Tables can be searched
Rainbow tables can be defeated using salt. Typically, a
server will store a user's login name, salt string (different
for each user, and randomly chosen when the account is
created), and the hash of salt+password. For a rainbow table
to be effective, it has to store the hash value for every
possible salt+password combination. So, using a 10-byte salt
for example, even if there are
a million passwords in the pool, there would be
entries in the rainbow table.
That's 1.2×1018 terabytes, for those of you
keeping track at home. That size of table is impractical.
Note that this doesn't help against a dictionary attack, since
someone who had the password file would also have each user's
salt. But it does slow down a dictionary attack, especially
in the common situation in which an attacker isn't trying to
break into a specific account, but is looking for any account
to break into.
Strength of passwords
There are two scenarios for cracking passwords:
1) someone tries to break into your account by
attempting actual logins with many different passwords
[online attack], and
2) someone steals the password file and tries to find a string
with the same hash as is stored in your entry in the password
file (if salt is used it's there in the password file)
These scenarios are different and require different strengths to
In the first setting, someone actually tries to login with many
different passwords. Most systems take a long time (in
computer terms) to respond to a login — especially an
incorrect login. This limits the effectiveness of this kind of
attack. For instance, if it takes 2 seconds to get an
"login failed" message, you can only try 30 passwords a minute.
If you had an 8-character password, even if all characters were
lowercase letters, if would take on average over 6,000 years to
login successfully in this scenario.
Moreover, unsuccessful login attempts are usually logged, so
that even a mildly vigilant system administrator would be aware
that someone was attempting to hack into the account after a
few minutes of trying.
Finally, many systems employ throttling, in which
the server starts introducing longer and longer delays as
the same user tries more and more logins within a brief
period. In fact, often systems lock an account if there are too many
unsuccessful attempts in a short period of time. With this
policy, an attacker couldn't possibly be successful unless he
had narrowed down the password to one of a very small number of
(USNA WiFi simply locks you out after just three unsuccessful
So, in this scenario even a relatively short
password will be secure provided it not guessable,
which usually means not a word or number or name, and certainly
not anything connected with you, like your old high school's
A Really Bad Password
The worst thing you can do is choose a password that's really
common. The the
RockYou.com break-in allows us all to see what some of
these are. The five most common passwords of the 32 million
that were stolen were:
So if you use one of those, you're making it really easy for
people who want to break into your account. Think it doesn't happen?
this Feb. 2012 msnbc story
about how a Syrian official's emails were stolen because his
password was 12345.
which has a fun discussion of popular PINs.
LinkedIn Hacked ... and they didn't use salt!
In the summer of 2012, the popular social media site
linkedin.com had account information stolen, including
password hashes. Shockingly, this very high-profile site was
not salting passwords! They changed that immediately, of
course, but it was a bit late at that point.
The ACM Queue article LinkedIn Password Leak: Salt Their Hide
has a good discussion of why this is so disastrous!
Number of passwords of length n is
kn, where k is the number of characters
the password came from. So if you restrict yourself to
lowercase letters, there are 26n passwords of length n.
If you draw from both uppercase and lower case, there are
If you draw from both uppercase, lower case and digits 0-9, there are
62n. And so on.
However, length is far more important than the size of the
character set you draw from. There are 138,000 times more
passwords of length 12 using only lower case, than passwords
of length 6 using all possible printable characters.
In the second scenario, things are a lot more dire. It's good
to remember, however, that we're only in the second scenario if
something else has already gone wrong — i.e. if someone
has stolen the password file.
Because the attacker has his own copy of the password file, he
doesn't have to try logging into a real system to check whether
he's found the password. He simply hashes his guessed password
and checks if it matches a hash in the file. This means he can
check millions — potentially billions — of
passwords per second.
There are three basic attacks here: brute force (try every
string of length 1, then 2, then 3 ...), a dictionary attack,
and rainbow table attacks. Long passwords that include lower and upper
case letters, numbers and punctuation characters are most
secure. What's also important is that your password is not a
real word or names and, most of all, don't choose things
like iloveyou1, that about a million other people
also use as a password.
As we've seen, salting passwords provides some security in
the face of an offline attack, since precomputed tables
can't be used. It's also possible to use a site-wide salt
value, so that what's actually stored in the password file
is the hash of password + user-salt + site-salt.
This way an attacker has to steal the site-salt
value as well as the password file. Finally, a site can
employ password stretching. The idea behind
password stretching is to make the computation of a single
hash very time-consuming. That way, brute force attacks take
a whole lot longer. One way to do this is to store not the
hash of the password+salt, but the hash of the hash of the
hash of the hash of ... of the hash of the password+salt.
If you use the 10,000-times iterated hash, and attacker has
to wait 10,000 times longer for a password to be cracked.
If you like the xkcd-style passwords (as described in the comic to
the right), you can use the si110 xkcd-style password generator.
You can check out
graphic for some tips — you'll have to click to expand it
in order to make it readable.
I'm not completely sold on everything it says, but it certainly
gives you plenty of ideas of what not to do!
The SI110 Password Activity
In class, you went through an activity in which you stole the
password file for the class message board, and read off usernames
This prompted the instructor to turn on hashing of passwords for the
message board. When you went back and stole the password file
again, all you found were usernames and hashes-of-passwords. This
prompted us to try to crack the hashes to recover the
passswords. We had the RockYou.com dictionary of passwords, with
precomputed MD5 hashes for each of them. This way, we could crack
many passwords quite quickly. This prompted the instructor to turn
on salting of passwords. That way the precomputed hashes couldn't
be used. This slowed down our attack quite a bit, but still some
passwords could be cracked. Finally, we talked about what made for
secure (hard to crack passwords).
We ended up with the following observation: both the website and the
user have responsibilities in protecting identities with passwords.
Here is our summary:
Two factor authentication
Dropbox hacked ... 2-factor authentication coming
In July 2012, hackers managed to break into the account of an
employee at dropbox.com, a well-known "cloud storage" site,
using a stolen password.
This gave them access they should not have had — access
to dropbox.com user information, for example.
Dropbox.com has decided to offer a two-factor authentication
option (using your cell-phone as the second factor) to add
another layer of defense against this kind of attack. Check
It's hard to remember a long password, especially if it's a good
one with lowercase, uppercase, etc. Thus passwords may be
broken, and of course they may be stolen in various ways.
One approach to stronger authentication is
two-factor authentication. The idea is that to authenticate (for
instance to login to a system) you have to provide two pieces of
information: something you know (a password, usually), and
something you have/are. A smart card would be a
"something you have", a fingerprint would be an example of a
"something you are".
The security here is that an attacker would have to both
have your password and the other factor — your smart card,
or finger print or whatever it might be.
An interesting idea that google uses for
gmail (optionally) is that when you initiate login, a
text-message is sent to your cell phone with a secret 1-time
number. You enter that number along with your password to
authenticate. Essentially the cell phone (something you have)
is acting as the second factor in this two-factor authentication
How Apple and Amazon Security Flaws Led to My Epic Hacking
Mat Honan wrote an interesting
in Wired about his digital life getting hacked. This story
highlights a number of really important lessons:
security is holistic — whether Mr. Honan
chose a strong password, or whether apple used salt &
hashing didn't matter ... because the protocol apple uses for
authenticating a person asking for a password to be reset is
security is ... well, it's holistic —
Mr. Honan's Twitter account (the goal of the attack) was
secure enough in isolation, but nothing's isolated. There
were connections to other accounts of his, that were
connected to other accounts, and in the context of all these
connections, his Twitter account was not secure.
you don't have to outrun the bear —
Mr. Honan states that, if he'd used two-factor authentication
for his gmail account, this probably wouldn't have happened.
Why? Because the attacker would have simply turned to an
back stuff up
Both Apple and Amazon have changed policies in
response to this article/incident.
Password Cracking in the Navy
Navy commands routinely run password cracking software on their
own password files to ensure users are using strong passwords.
Specifically, the Navy uses L0phtCrack,
a program that can crack Windows and UNIX passwords. L0phtCrack
can be configured to use dictionary attacks (like the attacks we
performed during the class activities). Results from the routine
password cracks are reported to and monitored by upper levels of
the chain of command.
Security means never standing still
To keep their information systems secure, the Navy has to
constantly update and upgrade not only their hardware and
software, but even the cryptographic algorithms the hardware and
MD5 was the standard for a long time, but as researchers started
to discover flaws (or even potential flaws), the Navy
had to migrate to a newer, better hashing algorithm, which was
SHA1. Even that is now less secure than it was a few years ago.
covering the Department of the Navy's announcement that systems
will be migrating (gradually changing) to a still stronger
cryptographic hashing algorithm: SHA-256.