Class 26: Knapsack-based cryptography


Reading
None.

Homework
  1. Read these lecture notes thoroughly as they explain more thoroughly how character messages can be encrypted using Knapsack problems.

    Consider the simlified 5-bit "ASCII":

    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
    A B C D E F G H I J K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
    
    So, for example, G = 6 = 00110. Suppose you have a knapsack-based public key cryptosystem that encodes 10-bit blocks (i.e. two characters at a time). Your private key is
    [3,4,12,23,46,92,184,369,738,1473]
            N = 3021
    d inverse = 1339
    	
    Decode the message "253, 2576, 2228, 2209".

  2. This second part does not need to be handed in.
    • Let G be an undirected graph with weighted edges. Consider the following greedy algorithm for finding a shortest path from vertex v to vertex w.
      ShortestPath(G,v,w)
      1. if v=w we're done
      2. choose the edge out of v of least weight (call the vertex it leads to "u") and add it to the path
      3. delete v from the graph
      4. set v=u and goto 1
      Prove that this greedy algorithm does not always provide an optimum solution.
    • Suppose G is an unweighted graph. A set of vertices in G is called clique if all possible edges between them are in G. Consider the problem of finding a maximum clique - i.e. the largest set of vertices in G that form a clique. Here's a possible greedy algorithm:
      1. let C = { }
      2. if there are no vertices left in G, return C
      3. choose the vertex of highest degree, call it v
      4. delete from G any vertex that is not connected by an edge to v.
      5. add v to C, delete it from G, and goto 2
      Prove that this greedy algorithm does not always provide an optimum solution.

Quiz
You guys took this really fun quiz today.

Superincreasing 0/1 Knapsack Problem
Suppose that the items A1, ..., An are such that each successive item is heavier than the combined weights of all the items before it. Such a sequence is called "superincreasing".
	    Ex. of a superincreasing sequence: 3,7,12,25,60, ...

            Ex. of a non-superincreasing sequence: 3,7,9,19,27, ...
How does our greedy choice of "heaviest that still fits" work now?

Well now we get an optimal greedy solution. But how can we prove that we always get an optimal solution?

Proposition 1 (The Greedy Choice is Good): Let Ai be the largest element of the superincreasing sequence A1 < A2 < ... < An that is less than or equal to W, the target weight. If a solution to the Knapsack problem exists, Ai is in it.
Proof: Clearly none of A(i+1), A(i+2), ..., An can be added, since otherwise they'd be "the largest less than W". If we leave out Ai, then even if we included everything before it, the total weight would be less than Ai (this is the definition of superincreasing). Since Ai weighs less than or equal to W, the total weight of our proposed solution would be less than W.

Proposition 2: (The Greedy Choice Plus an Optimal Solution to a Subproblem Provide an Optimal Solution to the Original Problem): Let Ai be the largest element of the superincreasing sequence A1 < A2 < ... < An that is less than or equal to W, the target weight. If a solution to this superincreasing Knapsack problem exists, it is Ai plus the solution to the smaller superincreasing Knapsack problem

Items: A1,A2,...,A(i-1)   Target weight: W - weight(Ai)

Proof: Proposition 1 showed that Ai was part of a solution. We also know that none of A(i+1), A(i+2), ..., An are part of a solution. So the other elements in the solution must come from items A1,A2,...,A(i-1). Moreover, their weight plus the weight of Ai must equal W, so the target weight for the subproblem is W - weight(Ai).

Encrypting Messages using Knapsack Problems
These kinds of Knapsack problems form the basis of several different encryption algorithms. Here's how: Suppose that 62,93,81,88,102,37 is my special Knapsack Problem. Suppose you have a message which is 38 = 100110. You encode it as:
62  93  81  88  102  37
-----------------------------   ====> 62 + 88 + 102 = 252
1   0   0   1   1    0  
	    
So You send me W = 152 and everyone knows my Knapsack problem is: A1,...,An = 62,93,81,88,102,37. So I can figure out my original message (38) by solving this 0/1 Knapsack problem. Unfortunately, this is tough to do. That makes it secure, because it's hard for anyone else to decode, but it's not practical if I can't decode your message to me.

This scheme would work nicely if my Knapsack problem was easy for me to solve, but not for anyone else. Well, if it was a superincreasing Knapsack problem, it would be easy for me. For example: What if my Knapsack problem was: 2,3,6,13,27,52. You'd send me:


2   3   6   13  27   52
-----------------------------   ====> 2 + 13 + 27 = 42
1   0   0   1   1    0  
	
I could decode W=42 pretty easily, since my Knapsack is superincreasing, and we have a nice Greedy algorithm for solving that problem. However, everyone in the world can see that my Knapsack is superincreasing, so that doesn't help much.

OK. Before I give the world my Knapsack problem, I'm going to hide the fact that it's superincreasing. How? I want to "mix up" my problem by taking each number and multiplying it by 31 (which I chose more or less at random), and taking it all mod 105 (which I chose simply because it's a little bigger than the sum of my A1, ..., An). So:

2  ->  2*31 % 105 = 62
3  ->  3*31 % 105 = 93
6  ->  6*31 % 105 = 81
13 -> 13*31 % 105 = 88
27 -> 27*31 % 105 = 102
52 -> 52*31 % 105 = 37
	    
Coincidently ;-) we've already encoded our message using this sequence, and we got 152. But if you send me 152 using my scrambled problem, how can I tell what I'm supposed to decode using my unscrambled (i.e. superincreasing) problem? Well, 31*61 = 1 mod 105, so I can unscramble by multiplying by 61 and taking the result mod 105.
     62     +      88      +      102        =      152
      |            |               |                 |
      v            v               v                 v

62*61 % 105  88*61 % 105    102*61 % 105      152*61 % 105
    
      |            |               |                 |
      v            v               v                 v
    
      2     +      13      +       27        =       42
	    
In other words, I can get the unscrambled (i.e. superincreasing) Knapsack problem simply by unscrambling W. So: 152 -> 152*61 % 105 = 42. Now I only have to solve my superincreasing Knapsack problem to decode, and that's easy. It's hard for anyone else, though, because they don't know how to unscramble to get to a the easy Knapsack problem. They just know the scrambled problem, and to decode they've got to solve that, which is pretty difficult.

Our First Look At Public Key Cryptography
So let's look at what we've got. I make up my very own Superincreasing Knapsack Problem, A1,A2,...,An. I scramble it using Ai -> r*Ai % M = Si. I then broadcast S1,S2,...,Sn out to the world at large. Anyone who wants to send me a secret message ... go to it. Break it up into n-bit sequences, and encrypt as above. I can decrypt, because I can unscramble the problem and solve as a Superincreasing Knapsack Problem. Nobody else can do that, so they have to solve it as a hard Knapsack problem.

I'll choose my sequence with n > 250, with all my wieghts more than 200 bits long, and multiply by an r of that kind of size as well. With this big of a problem, there are 2^250 possible ways to fill the Knapsack. Checking them all would require more time than our Solar System has left to live.

This kind of scheme is called Public Key Cryptography anyone can encrypt a message for me using my scrambled sequence (my public key), but only I can decrypt messages.

Some practice problems
Consider our 0/1 Knapsack-based public key cryptosystem from class. Recall that messages consist only of the letters A..Z, which are given the numbers 0..25 respectively. In other words:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A B C D E F G H I J K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
A message is broken up into two-letter blocks, which are encrypted and sent separately. The two letter block (L1,L2) is transformed into a sequence of 0's and 1's by concatenating the 5-bit binary representation of the number for L1 with the 5-bit binary representation of the number for L2.
Example:  Encode the block "MY"

      M  Y

      |  |
      V  V

     12  24

     /    \
    /      \
  01100  11000
	    
Thus, each block is represented by 10-bits, and with a 10-element sequence of knapsack weights, you can encode the block.
Public key weights: 1647  1189  546  1550  79  158  316  174  348  2070

Encode 01100 11000 as

1647  1189  546  1550  79  158  316  174  348  2070 ===>  2209
 0      1    1     0   0    1   1     0    0    0
	
So the message "01100 11000" is encoded as "2209". The associated private key is:
[3,4,12,23,46,92,184,369,738,1473]
        N = 3021
d inverse = 1339
	    
so I can decode such a message by "unscrambling" 2209 which gives me 2209*1339 mod 3021 = 292. Now I solve the superincreasing knapsack problem for those weights and get 4 + 12 + 92 + 184 = 292 so the message decodes as:
3   4   12  23  46  92  184 369 738 1473
0   1   1   0   0   1   1   0   0   0
\_______________/   \_______________/
 01100 = 12 = M       11000 = 24 = Y
       \                  /
        \                /
         \______________/

                 MY        
Encoding messages of several characters results in a sequence of numbers, each number encoding two characters at a time.


Christopher W Brown
Last modified: Fri Mar 26 10:40:09 EST 2004