These notes and Unit 6 sections 2 and 3.

## Decision Problems

As we start to look more closely at P and NP, we'll restrict our attention to decision problems, which are problems having yes/no answers. Why that is desireable we'll see later, but intuitively it should be clear that this simplifies things - after all answers are always a single bit this way!

It seems like restricting ourselves to decision problems would be a very big limitation, but in fact that's not really true. Many interesting problems have a decision problem variant. For example, the 0/1 Knapsack problem can be changed from "Which weights can I choose to maximize the weight in the Knapsack without exceeding the capacity?" to "Is it possible to fill the Knapsack to within k pound or less of its capacity (not exceeding the capacity, of course)?" The rephrased problem is a decision problem, and it's clear that the original is at least as hard as the decision problem. Here's another example, the problem "What's the longest path in a given graph G?" can be phrased as "Is there a path of length at least k in G?". Once again it's clear that the decision problem version is easier. After all, if we knew how to find the longest path, we'd know how to answer whether or not there was a path of length k!

## Certificates

Now, with a decision problem the idea of "guess and check" and verifying solutions is a bit unsatisfactory. After all, a "solution" to "Is it possible to fill the Knapsack to within k pound or less of its capacity?" is either yes or no. Such a solution is easy to guess, but checking it is no easier than just plain solving the problem! On the other hand, if you really did ask "Is there a path of length at least k in G?" and I answered "Yes", you'd find it a bit smart alecky (It's like when you say "Do you know the time?" and the guy you asked it of says "Yes I do".). You'd probably say "Well then show me a path of length at least k!" The actual path acts as a "certificate" that you can use to easily verify that the answer to "Is there a path of length at least k in G?" is indeed yes. If you were dealing with the 0/1 Knapsack Problem, a certificate would be a subset of the given Knapsack weights that actually adds up to some number less than or equal to the capacity, but not less by more than k. The key is that we can verify a certificate for these problems very easily.
Example: Consider the problem of finding a minimum hitting set. If S is a set of n subsets of {1,..,k}, a minimum hitting set for S would be a minimum size subset of {1,...,k} that has non-empty intersection with (i.e. "hits") each of the n elements of S.
1. Define a decision-problem variant of this problem. We'll call the decision-problem variant MINHITSET.
2. Define a notion of "certificate" for MINHITSET.

## A formal definition of NP

Formally speaking, a decision problem Prob is in NP if there is an algorithm that takes an instance I of problem Prob (we'll say the bit-size of I is n) and a purported certificate C for I and verifies C in time O(nr) for some constant r (i.e. in polynomial time). Notice that this definition requires that the bit-size of the certificate is polynomial in n.
Definition: A problem is in NP if it is a decision problem for which we have
1. certificates whose length are polynomial in the size of the problem instance, and
2. an algorithm that verifies certificates in time polynomial in the size of the problem instance.

To prove that a particular problem is in NP, we'll have to show that it satisfies the above definition - kind of like a lawyer showing that the defendant's actions satisfy the definition of the crime for which he's on trial.

## Example: Proving that "longest path" is in NP, PROOF OUTLINE

Here's a "proof" that "longest path" is in NP which leaves out the crucial, but somewhat technical, details concerning input size and running times, etc. Recall that the problem LongPath(G,k) is the question of whether graph G has a path of length at least k. Note that "path" means no repeated vertices!
Proposition: The problem LongPath(G,k) is in NP.
Proof: (PARTIAL!)
1. Note that LongPath(G,k) is a decision problem, as the definition of NP requires!

2. Here's my notion of certificate: A certificate is a list of vertices comprising a path of length at least k

3. Here's my algorithm for verifying a certificate:
Verify(G,k,C)
2. Read certificate C into an array
3. if m < k, where m is the length of C, return FALSE
4. for i = 1 to m - 1 do
if G has no edge from vertex C[i-1] to C[i] return FALSE
5. for i = 0 to m - 1 do
for j = i + 1 to m - 1 do
if C[i] == C[j] return FALSE
6. return TRUE

Now, to make this into a technically correct proof we need to address two technical details: (1) We must schow that any valid certificate has length that is polynomial in the length of the problem instance, i.e. in the length of G,k. (2) We must show that Verify(G,k,C) runs in time polynomial in the length of the problem instance.

## Example: Proving that "longest path" is in NP (PAINFULLY COMPLETE)

Here's an example of how to prove that a decision problem is in NP according to our definition. The problem we'll consider is LongPath(G,k), which asks whether or not there is a path of length at least k in the graph G in which there are no repeated vertices.
Proposition: The problem LongPath(G,k) is in NP.
Proof: The problem is a decison problem, of course. We give the vertices the indices 0,1,...,n-1, where n is the number of vertices in the graph, and we will refer to vertices by index. We will encode a problem instance by writing n-1 0's followed by a 1, which encodes n, then k-1 0's followed by a 1, which encodes k, and then listing out the n^2 bits of the adjacency matrix for G one row at a time from top to bottom. Since k ≤ n, the bit size s of an encoded problem instance is thus:
s = Θ(n^2).
Now we are ready to show that LongPath(G,k) satisfies the two parts of our definition of NP:
1. Certificates. A certificate for this problem is simply a list of at most n vertices. If these vertices form a path in G without repeated vertices and of length is at least k, the certificate is valid and it is invalid otherwise.

The certificate could be encoded by writing out the binary representations of each consecutive vertex in the path using ceiling(lg n) bits. Since the number of vertices in the path is at most n, the bit-length of any certificate is at most n*ceiling(lg n), which is O(n^2). So the certificate length is O(s), i.e. polynomial in the input size.

2. Polynomial time algorithm for verifying certificates.
1. Read problem instance, store G in an adjacency matrix, and store value k.
2. Read the r indices in the Certificate into array P, if r < k return false.
3. Let M be an array of n bits, initialized to 0.
4. for i from 0 .. r-1 do
• if M[P[i]] != 0 return false (we have a repeated vertex)
• set M[P[i]] = 1.
• if i < r-1 and there is no edge from vertex P[i] to vertex P[i+1] return false.
5. return true

It should be clear that each step can be done in time O(s), so the whole algorithm takes time O(s), which is indeed polynomial in s, the bit length of the encoded input, as required.

Hopefully you see that proving that a problem is in NP means nothing more than matching that problem to the above definition, just as we did for LongPath.