Cryptographic Hash Functions

Compression

A (cryptographic) hash function works as follows:
  • Input: any arbitrary length data
  • Output: short, fixed-length digest.

Most outstanding examples include MD5, SHA-1, SHA-2, SHA-3:

the output length in bits
MD5 128 (16 bytes)
SHA-1 160 (20 bytes)
SHA-2 (SHA-256, SHA-384, SHA-512) 256 (32 bytes), 384, 512
SHA-3 224, 256, 384, 512

Nowadays, SHA-256 is the most popular.

Properties of Cryptographic Hash functions

Refer to the documentation for how to use the cryptographic hash functions in Python.

Collision Attacks on (Broken) Cryptographic Hash Functions

Birthday attack

Recall the birthday paradox:

If there are \( 20 \approx \sqrt{365} \) people, then with probability about half, there are two people with the same birthday.
Due to this paradox, the best guarantee against collision attacks that any cryptographic hash function with \( n \) bit output is \( \sqrt {2^n} = 2^{n/2} \), which means the attacker has to compute \( 2^{n/2} \) hashes.

In practice, this is still a huge number for a large \(n\). For example, with SHA-256, we have \(n=256\), and one needs to compute \(2^{128}\) hashes, which will take at least \(10^{20}\) trillion years.

Security of MD5

The security of the MD5 hash function is severely compromised. Don't use this. A collision attack exists that can find collisions within one minute. In 2012, according to Microsoft, the authors of the Flame malware used an MD5 collision to forge a Windows code-signing certificate. See this for more detail.

Security of SHA-1

Until recently, SHA-1 was the most commonly used hash algorithm all over the world. However, researchers from Google published a practical attack on SHA-1 called Shattered.

Consequently, web browsers and other software that relies on hashing for security moved to SHA-2, which is still believed to be secure. This is the most commonly used hash algorithm all over the world now, and there is no known practical attack so far, thereby with the practical security.

Password-based Authentication

Authentication is the act of proving an assertion, such as the identity of a computer system user.

Password-based authentication: storing hash of a password

Instead of storing the actual passwords, the server can store hash of them.

Other Applications

Benefiting from collision resistance and compression

If data \(x\) is really large, one can use its hash \(y = H(x)\) instead of using the actual x as its representation. Due to collision resistance, it is safe to assume that there is no other data \(z\) whose hash is also \(y\). So, it is safe to regard \(y\) as the short representation of \(x\). In this sense, we often call this \(y\) a fingerprint of \(x\).

This concept of using the hash as a short representation is quite useful:

Shell Commands

There are shell commands that compute the hash checksum of a file.
~$ md5sum two.pcap
6e76ea38b1791107ebef45ec4b99ec50  two.pcap
~$ sha1sum two.pcap
b96f81be3ecff92338bf4305f4edb157e680f162  two.pcap
~$ sha256sum two.pcap
d99ee086e193fa67f09b0e21836447f9d0741995f86608260460feffc7babf84  two.pcap