Overview of Cryptography and One-time Pad

Cryptography

Cryptography is the study of techniques for secure communication in the presence of adversarial behavior. In particular, cryptography is concerned with various aspects in computer security such as

data confidentiality, data integrity, authentication and non-repudiation

Two settings

We usually consider two participants, Alice and Bob. In this course, we will mainly deal with two settings.

Private key setting (or symmetric key setting).
- (Symmetric) Alice and Bob create and share the same key.
- (Private) This key is private against the other parties.
Public key setting (or asymmetric key setting).
- Alice creates a pair of key. She keeps one part (called the secret key or private key) secret and publish the other part (called the public key) to the public.
- Bob does the same. He keeps his secret key private and publishes his public key.

Cryptographic tools we will study

Roughly speaking, depending on the goal and setting, we will need a different cryptographic tool. The table below summarizes this.

	private key	public key
data confidentiality	private key encryption	public key encryption
data integerity	message authentication code	digital signature

Private Key Encryption

A private-key encryption scheme is defined by specifying the following:

A message space \( \cal M \), which defines the set of "legal" messages, i.e., those supported by the scheme

Three algorithms \( ({\sf Gen}, {\sf Enc}, {\sf Dec}) \), which are a procedure for generating keys, a procedure for encrypting, and a procedure for decrypting.

The key-generation algorithm \( \sf Gen \) is a randomized algorithm that outputs a key \( K \). We denote by \( K \leftarrow {\sf Gen} \) for the operation of executing \( \sf Gen \) with \( K \) being the returned key. Let \( ~\cal K \) denote the set of all possible keys output by the key generation algorithm; we also often call this the key space.
The encryption algorithm \( \sf Enc \) takes as input a key \( K \) and a message \( M \) and outputs a ciphertext \( C \) . We denote by \( C \leftarrow {\sf Enc}_K(M) \) the encryption of the plaintext \( M \) using the key \( K \) .
The decryption algorithm \( \sf Dec \) takes as input a key \( K \) and a ciphertext \( C \) and outputs a plaintext \( M \) . We denote by \( M \leftarrow {\sf Dec}_K(C) \) the decryption of the ciphertext \( C \) using the key \( K \) .

Correctness

The scheme is said to provide correct decryption if for any key \( K \in {\cal K} \) and any message \( M \in {\cal M} \), it holds that
\( {\sf Dec}_K ( {\sf Enc}_K ( M ) ) = M \).

Threat model

Situational attack capabilities

Security of encryption needs to specify what kind of adversary (or the capability of the adversary) it considers.

Passive attack: It simply performs eavesdropping the communication channel. In our context, the adversary will simply be given a ciphertext and try to figure out what the underlying plaintext it contains.
Known plaintext attack (i.e., adversary with plaintext-ciphertext pairs): Sometimes, the adversary may have additional information; for example, some encrypted documents (encrypted under the same key) may be leaked. This additional information may help the adversary crack the target ciphertext.
Chosen plaintext attack: In a more serious situation, the adversary may be able to choose a plaintext and get the corresponding ciphertext. This sounds like an unrealistic scenario, but it's not.

Historic Chosen-Plaintext Attack: World War II

In May 1942, US Navy cryptanalysts had discovered that Japan was planning an attack on Midway island in the Central Pacific. They had learned this by intercepting a communication message containing the ciphertext fragment "AF" that they believed corresponded to the plaintext "Midway island". Unfortunately, their attempts to convince Washington planners that this was indeed the case were futile; the general belief was that Midway could not possibly be the target. The Navy cryptanalysts then devised the following plan.
They instructed the US forces at Midway to send a plaintext message that their freshwater supplies were low. The Japanese intercepted this message and immediately reported to their superiors that "AF" was low on water.
The Navy cryptanalysts now had their proof that AF was indeed Midway, and the US forces dispatched three aircraft carriers to the location. The result is that Midway was saved, and the Japanese incurred great losses. It is even said that this battle was the turning point in the war by the US against Japan in the Pacific.
The Navy cryptanalysts here carried out exactly a chosen-plaintext attack. They essentially were able to "request" the Japanese to encrypt the word "Midway".
Chosen ciphertext attack: In an ultimately serious scenario, the attacker may have both encryption AND decryption capability. In particular, given a target ciphertext to crack, the adversary may be able to cook up some ciphertexts similar to the target ciphertext and obtain the decryption of these cooked-up ciphertexts! Obviously, there is no encryption scheme whatsoever that is secure against an adversary that decrypts the target ciphertext. In this sense, this is a threat model that considers the highest level of adversary's capability. Again, one may think that this is really an unrealistic scenario, but it's not!
Historic Chosen-Cipher Attack: Padding-oracle attack
As we will see later in this course, you often need to add a padding at the end of the plaintext so that the length of the plaintext is always a multiple of the length of what's called a block.
Let's think about the decryption algorithm. Naturally, if there is something wrong with the padding, the decryption should output error. This is actually 1-bit information about the decryption. This attack uses this "partial decryption" device.
- Slightly change the last part of the ciphertext over multiple trials. Almost all the time, the server outputs "decryption failure". However, if the server is OK, the adversary can gain information about the last byte of the plaintext.
- The adversary can get the second last byte of the plaintext similarly, and eventually the adversary can infer the entire plaintext.
The original attack was published in 2002 by Serge Vaudenay. Concrete instantiations of the attack were later realised against SSL and IPSec.
While these earlier attacks were fixed by most TLS implementors following its public announcement, several variant attacks appeared until recently. Examples include Lucky Thirteen attack (2013), POODLE (2014), FREAK (2015) and Logjam (2019).

Computational power of the adversary

Informational-theoretic security: If an encryption scheme is information-theoretic secure, it means no information whatsoever about the plaintext is revealed. So, even if the adversary has an "unlimited amount of computation power", it cannot break the scheme!

Computational security: An encryption scheme with computational security is more practical security notion. Of course, if the adversary has an unlimited computation power, the scheme will be broken. However, if a scheme is computationally secure, even in the practical extreme situation (e.g., think of what NSA has, maybe?), the scheme will remain secure for more than 1000 years.

For example consider the following code. As you see below, simply incrementing an integer for \(2^{80}\) times (with a single CPU) takes more than 200 million years.


import time

# measuring the time for incrementing an integer for 2**28 times
start = time.time()
i = 0;
for _ in range(2**28): 
    i += 1;
end = time.time()
elapsed = end - start; 

print("i=", i)
print(f"elapsed time: {elapsed} seconds")

# estimating the time based on the measure above: 2**80, 2**128, 2**256
security_param = [80, 128, 256]
for k in security_param:
    sec = 2**k / 2**28 * elapsed
    yr = sec/(3600*24*365)
    print(f"2^{k} would take {yr} years")

i= 268435456
elapsed time: 15.930447101593018 seconds
2^80 would take 2274998593.054912 years
2^128 would take 6.403551759969065e+23 years
2^256 would take 2.179015749583015e+62 years

Security notion: Indistinguishability

Conceptually, we want each ciphertext to reveal nothing about the underlying plaintext. How to mathematically model this concept is a bit tricky but interesting. Here's the rough idea:

Give the adversary time to inspect the encryption scheme.
Let the adversary choose two plaintexts that will be most easily crackable from his perspective.
Pick one of the two messages at random and encrypt it.
Give the resulting ciphertext to the adversary.
Have the adversary guess which message has been encrypted.
The ciphertext reveals nothing about the plaintext, the adversary can be successfully only by random guessing, which is exactly probability 1/2.
To capture this (relaxing it only slightly), if the probability that the adversary makes the good guess is 1/2 + negligible, then the encryption scheme is called indistinguishable. Here, negligible is something like \(\frac{1}{2^{80}}\)

We say that the encryption scheme is indistinguishable if for every possible adversary, the probability that the adversary makes a good guess in the above experiment is 1/2 + negligible. If the probability is exactly 1/2 for all possible adversary, then we call the scheme perfectly indistinguishable.

Warm-up about XOR

What we learned

\(a \oplus 0 = a \)
\(a \oplus 1 = \bar a \)

What about \( a \oplus a \) = ? (Drag the mouse for the answer below)

It is 0. To see this, we have
 - 0 xor 0 = 0
 - 1 xor 1 = 0

One-Time Pad Encryption

One-bit OTP

The one-time pad encryption scheme for encryption a single bit is defined as follows.

\( {\sf Gen}() \): Choose one bit at random. The key \(K\) is the chosen bit.
\( {\sf Enc}_K(M) \): Given a one-bit key \( K \), and a one-bit message \( M \), the encryption algorithm outputs \( C = M \oplus K \).
That is, depending on the key, the ciphertext is the flip of the message or the message itself.
\( {\sf Dec}_K(C) \): Given a key \( K \), and a ciphertext \(C\), the decryption algorithm outputs \( M = C \oplus K \).
That is, depending on the key, the decryption is performed by flipping the ciphertext or just returning the ciphertext.
Note: \[ {\sf Dec}_K( {\sf Enc}_K (M) ) = {\sf Dec}_K( M \oplus K ) = M \oplus K \oplus K = M \oplus 0 = M \]

The one-bit OTP is shown in the table (drag your mouse for the answer).

	M=0	M=1
K=0	0	1
K=1	1	0

One-bit OTP is perfectly indistinguishable under a passive attack

Any smart adversary would start with choosing 0 and 1 as the two messages. (A dumb adversary would choose the same bit, say 0 and 0 for the two messages, and it will have no additional information to smartly guess what \(b\) is.)

For example, suppose that the ciphertext \(C\) that adversary is given is 0.

Is it possible that the message is 0?
```
Yes, if K is 0. 
```
Is it possible that the message is 1?
```
Yes, if K is 1. 
```
OK, both are possible scenarios. How likely is it that K is 0? How likely is it that K is 1?
```
We know that K is randomly chosen. Therefore, Pr[K=0] = Pr[K=1] = 1/2. 
```
In summary, the uncertainty of the key is transferred to the uncertainty of the message. That is, keys being equally likely means that the messages are equally likely. This is an information theoretic argument, and this one-bit OTP is perfectly indistinguishable.

Extending the one-bit OTP to a multi-bit scenario

Fix an integer \( \ell \). The message space \( \cal M \) and key space \( \cal K \) are all equal to \( \{0,1\}^\ell \).

\( \{0,1\}^\ell \) is the set of all \(\ell\)-bits. For example \[ \{0,1\}^2 = \{00, 01, 10, 11\} \]

\( {\sf Gen}() \): Choose a key \( K \) from \( \{0,1\}^\ell \) at random.
\( {\sf Enc}_K(M) \): Given a key \( K \), and a message \( M \in \{0, 1\}^\ell \), the encryption algorithm outputs \( C = M \oplus K \).
\( {\sf Dec}_K(C) \): Given a key \( K \), and a ciphertext \( C \in \{0,1\}^\ell \), the decryption algorithm outputs \( M = C \oplus K\).

Note that in this scheme, only a single message is ever encrypted under one key.

Example: OTP when \( \ell = 2 \)

Drag the mouse for answers.

	M=00	M=01	M=10	M=11
K=00	00	01	10	11
K=01	01	00	11	10
K=10	10	11	00	01
K=11	11	10	01	00

Theorem

The one-time pad encryption scheme is perfectly indistinguishable under a passive attack.

Limitations of OTP

You need a long key. If you encrypt a 1GB long message, you need a 1GB long key. Sometimes, that's not practical.
Perfect indistinguishability holds if the key is used only once. This means that there is only one ciphertext encrypted under this same key. By definition, this rules out the scenario of known-plaintext attacks, chosen-plaintext attacks, and chosen-ciphertext attacks. So, we are good.
If the key is reused to encrypt different messages, now we have multiple ciphertext encrypted under the same key. So, we could consider known-plaintext attacks, chosen-plaintext attacks, and chosen-ciphertext attacks. Indistingiushability of OTP is completely broken even under the known-plaintext attack (which is the weakest among the aforementioned attacks).
In particular, if the adversary knows a pair of plaintext and ciphertext say \( (M, C) \), it can immediately recover the key by computing \( M \oplus C = M \oplus (M \oplus K) = (M \oplus M) \oplus K = 0 \oplus K = K\).

Choosing Random Bytes in Python

You can use Cryptodome package in Python to choose random bytes. The package can be installed as follows:

sudo apt-get install build-essential python3-dev
pip3 install pycryptodomex --user

Now, see the sample run.

The input to function get_random_bytes specifies how many bytes you need. In the above, each time the function returned 10 random bytes.


>>> from Cryptodome.Random import get_random_bytes
>>> get_random_bytes(10)
b'\xb4\x81\x03\xdf\xe5\xe9\xb9\x852\xd0'
>>> get_random_bytes(10)
b'\xa3\xe3ph\xb4\xf2\x14)\xbcF'
>>> get_random_bytes(10)
b"a\xaa\x14\xe9\xe05\xf2'T\x1b"