SI242: SMT-Solver activity

The Problem ...

There is a classic problem in computer science called Subset Sum. The idea is simple: You have $n$ objects, each of which has a weight, and some target total weight $W$. Your job is to choose a subset of these objects whose weights sum to $W$, or determine that it is not possible to do so.

Example: $n = 3$ objects with weights $w_0=9,w_1=6,w_2=11$ and target $W = 20$. Solution: choose objects $o_0$ and $o_2$, since $9 + 11 = 20$.

When there are a lot of objects this can become a difficult problem. In your Computer Algorithms class you will learn just how difficult. But for now, is suffices to observe that with $n$ objects, there are $2^n$ different ways to choose our subset. So with even just 30 objects, there are over a billion ways to choose. So going through all possible solutions one at a time is not a good strategy!

Another difficult, but equally classic, problem is called 0/1 Knapsack. Here each object has a value as well as a weight, and your job is to choose a subset of the objects for which the total weight is at most some threshold $W$ and the total value is at least some threshold $V$, or determine that it is not possible to do so.

Example: $n = 3$ objects with weights $w_0=9,w_1=6,w_2=11$, values $v_0=4, v_1=8, v_2=5$ and thresholds $W = 20$, $V=10$. Solution: choose objects $o_1$ and $o_2$, since $6 + 11 \leq 20$ and $8+5 \geq 10$. (Note: $\{o_0,o_2\}$ doesn't work because the total value is only $9$.)

The Tool ...

Satisfiability Modulo Theory (SMT) solvers are tools that take first-order formulas as input, and return UNSAT if there is no model for the formula, and SAT+model if there is a model. However, instead of inputing arbitrary formulas and asking the tool to find a complete satisfying interpretation from scratch, we choose a predefined theory — which means that the domain of discourse and some predicates, constants and functions are there already with an agreed-upon interpretation. We will be using a theory of the integers, but restricted to only addition and subtraction, so no multiplication or division. (Technically the theory is called UFLIA.) More precisely, the theory starts with the following interpretation:

domain of discourse: integers
predicates: lt(·,·) which is "less than".
constants: the integers in their usual notation, e.g. 0, 42, -12, etc.
functions: plus(·,·) for addition, and minus(·,·) for subtraction

Of course you have equality (=) and disequality (!=) since we are always talking about first order logic with equality, and all the propositional operators. Any new constants you want to use (like w7 or v2) are OK, as are any propositional variables you want (so a => w7=0, for example, is fine). You can also make up new functions and predicates and use quantifiers ("A:" for ∀ and "E:" for ∃), but you will not need that for this activity.

Here is how we can turn the Subset Sum problem example from above into a first order formula for an SMT-solver: for object $i$ we will have a propositional variable $a_i$ that stands for the proposition "choose object $i$", and a constant $w_i$ that stands for object $i$'s contribution to the total weight (which would be zero if the object isn't chosen).

(a0 => w0 = 9) & (~a0 => w0 = 0) & (a1 => w1 = 6) & (~a1 => w1 = 0) & (a2 => w2 = 11) & (~a2 => w2 = 0) & sum = plus(plus(w0,w1),w2) & sum = 20
\______________________________/   \______________________________/   \_______________________________/   \________________________/   \______/
 object 0 contributes 9 or zero     object 1 contributes 6 or zero     object 2 contributes 11 or zero    add up weight contributions  total is 20
 depending on choice a0             depending on choice a1             depending on choice a2

The domain, constants like 9,0,6,11 and the function plus already have an interpretation in our theory. The SMT-solver has to finish off the interpretation by finding values for a0, a1, a2, w0, w1, w2, and sum.

Make sure you understand that the only way an interpretation can satisfy the constraint

(a0 => w0 = 9) & (~a0 => w0 = 0)

is if either a0=true and w0=9, or a0=false and w0=0. There is no other way.

I have an SI242 SMT-solver Tool for you that provides an interface to the Z3 SMT solver (limited to the theory UFLIA).

SI242 SMT-solver Tool

If you copy and paste the above formula into the tool and click on "Run Z3!", you will get the following model

Propositional constants: a0,~a1,a2
Theory constants: sum=20,w0=9,w1=0,w2=11

... which tells us that we choose object 0, do not choose object 1, and choose object 2! So when we have a problem that mixes logical reasoning and integer reasoning, SMT solvers give us a very powerful tool!

Note: I think it is much more productive to write the formula on several lines, like ...

(a0 => w0 = 9) & (~a0 => w0 = 0) &   # object 0 stuff
(a1 => w1 = 6) & (~a1 => w1 = 0) &   # object 1 stuff
(a2 => w2 = 11) & (~a2 => w2 = 0) &  # object 2 stuff
sum = plus(plus(w0,w1),w2) &         # add it all up
sum = 20

The Details ... part 1

Write a C++ program (name it p1.cpp) that reads in a Subset Sum problem in the format shown below, and outputs an equivalent first order logic formula to input to the SI242 SMT-solver tool.
Note: In the examples below you see me running program p1 like this: ./p1 < ina1.txt . If you haven't seen that before, don't panic! It is exactly the same as running ./p1 and copy&pasting the contents of file ina1.txt into the terminal.

ina1.txt

ina2.txt

ina3.txt

$ cat ina1.txt 
n = 3
9 6 11
W = 20
$ ./p1 < ina1.txt 
(a0 => w0 = 9) & (~a0 => w0 = 0) & 
(a1 => w1 = 6) & (~a1 => w1 = 0) & 
(a2 => w2 = 11) & (~a2 => w2 = 0) & 
sum = plus(plus(w0,w1),w2) &
sum = 20

$ cat ina2.txt
n = 8
69 84 80 81 90 99 57 29
W = 200
$ ./p1 < ina2.txt
(a0 => w0 = 69) & (~a0 => w0 = 0) & 
(a1 => w1 = 84) & (~a1 => w1 = 0) & 
(a2 => w2 = 80) & (~a2 => w2 = 0) & 
(a3 => w3 = 81) & (~a3 => w3 = 0) & 
(a4 => w4 = 90) & (~a4 => w4 = 0) & 
(a5 => w5 = 99) & (~a5 => w5 = 0) & 
(a6 => w6 = 57) & (~a6 => w6 = 0) & 
(a7 => w7 = 29) & (~a7 => w7 = 0) & 
sum = plus(plus(plus(plus(plus(plus(plus(w0,w1),w2),w3),w4),w5),w6),w7) & 
sum = 200

$ cat ina3.txt
n = 50
8113 3649 5920 2228 385 3906 2853 9527 9895 4351 9719 1990
2565 7756 5691 6138 7792 9338 4074 7365 9808 141 7137 5197
1528 307 6622 4696 9633 7315 1689 1523 6685 9415 570 918
1505 6169 7922 3268 8983 7371 4805 8376 240 2121 7374 2902
5866 9487
W = 130891
$ ./p1 < ina2.txt

This is on you!

If you copy and paste your outputs into the SI242 SMT solver, you should see satisfying interpretations and be able to read off which weights add together to the target number.

I am gifting you the following function definition:

string var(string pre, int idx) { return pre + to_string(idx); }

This will help you write out variables, so var("w",42) returns the string "w42". You can just copy&paste the above line in between "using namespace std;" and "int main()". For example:

#include <iostream>
using namespace std;

string var(string pre, int idx) { return pre + to_string(idx); }

int main() {
  for(int i = 1; i < 6; i++) {
    cout << "(" << var("x",i-1) << " xor " << var("x",i) << ") & ";
  }
  cout << "x5" << endl;
}

Compile and run this and you get:

(x0 xor x1) & (x1 xor x2) & (x2 xor x3) & (x3 xor x4) & (x4 xor x5) & x5

This is a great chance to practice the iterative refinement technique you learned in IC210. I would start by writing a program to read the whole input, and print the last output line only, e.g. sum = 200. Compile/test/debug this, and only move on to produce more of the final output when it works.
For each weight you read in, there is one line of formula. This means you can write out the formula as you read the weights ... no need to store things in an array if you don't want to.

The only tricky part is writing out the "sum = plus(plus( ...))" line. But it's not as bad as you might think. As an example, suppose n=5.

sum = plus(plus(plus(plus(w0,w1),w2),w3),w4)     
\________________________/\/\______________/
 loop n-1 times printing   |   loop n-1 times priting ",wi)"
 out "plus("             print "w0"

The Details ... part 2

ina2.txt from the previous section has a solution. Copy and paste the formula below to check!

(a0 => w0 = 69) & (~a0 => w0 = 0) & 
(a1 => w1 = 84) & (~a1 => w1 = 0) & 
(a2 => w2 = 80) & (~a2 => w2 = 0) & 
(a3 => w3 = 81) & (~a3 => w3 = 0) & 
(a4 => w4 = 90) & (~a4 => w4 = 0) & 
(a5 => w5 = 99) & (~a5 => w5 = 0) & 
(a6 => w6 = 57) & (~a6 => w6 = 0) & 
(a7 => w7 = 29) & (~a7 => w7 = 0) & 
sum = plus(plus(plus(plus(plus(plus(plus(w0,w1),w2),w3),w4),w5),w6),w7) & 
sum = 200

Answer the below in a plain text file called README. Figure out whether the solution that you got is unique. In other words, is it the only solution or are there others? How can you figure this out?

The Details ... part 3

Imagine we had a program p2.cpp that reads in a 0/1 Knapsack Problem in the format shown below, and outputs an equivalent first order logic formula to input to the SI242 SMT-solver tool. If you look below, you see something that is almost the complete formula for two input examples. However, I have left off the constraints on the weight and value ... you will have to figure that out!

inb1.txt

inb2.txt

$ cat inb1.txt 
n = 3
9 4
6 8
11 5
W = 20
V = 10
$ ./p2 < inb1.txt 
(a0 => w0 = 9 & v0 = 4) & (~a0 => w0 = 0 & v0 = 0) & 
(a1 => w1 = 6 & v1 = 8) & (~a1 => w1 = 0 & v1 = 0) & 
(a2 => w2 = 11 & v2 = 5) & (~a2 => w2 = 0 & v2 = 0) & 
sumw = plus(plus(w0,w1),w2) &
sumv = plus(plus(v0,v1),v2) &
# You write it!
# weight at most 20, value at least 10

$ cat inb2.txt
n = 10
15 52
81 34
61 65
20 86
43 72
78 35
70 51
46 42
95 71
47 55
W = 220
V = 230
$ ./p1 < ina2.txt
(a0 => w0 = 15 & v0 = 52) & (~a0 => w0 = 0 & v0 = 0) & 
(a1 => w1 = 81 & v1 = 34) & (~a1 => w1 = 0 & v1 = 0) & 
(a2 => w2 = 61 & v2 = 65) & (~a2 => w2 = 0 & v2 = 0) & 
(a3 => w3 = 20 & v3 = 86) & (~a3 => w3 = 0 & v3 = 0) & 
(a4 => w4 = 43 & v4 = 72) & (~a4 => w4 = 0 & v4 = 0) & 
(a5 => w5 = 78 & v5 = 35) & (~a5 => w5 = 0 & v5 = 0) & 
(a6 => w6 = 70 & v6 = 51) & (~a6 => w6 = 0 & v6 = 0) & 
(a7 => w7 = 46 & v7 = 42) & (~a7 => w7 = 0 & v7 = 0) & 
(a8 => w8 = 95 & v8 = 71) & (~a8 => w8 = 0 & v8 = 0) & 
(a9 => w9 = 47 & v9 = 55) & (~a9 => w9 = 0 & v9 = 0) & 
sumw = plus(plus(plus(plus(plus(plus(plus(plus(plus(w0,w1),w2),w3),w4),w5),w6),w7),w8),w9) &
sumv = plus(plus(plus(plus(plus(plus(plus(plus(plus(v0,v1),v2),v3),v4),v5),v6),v7),v8),v9) &
# You write it!
# weight at most 220, value at least 230

What are the conditions on sumw and sumv for the inb1 and inb2 formulas that you need to add to complete the formulas?
Add answers to the README file.
For inb2, play around with the tool and tweaking the input in order to discover what the most value is you can get without going over 220 total weight? Which objects do you choose to get that maximum value?
Add answers to the README file.
How did you get that answer? How do you know you found the maximum possible value?
Add answers to the README file.
Going futher: Write program p2.cpp that reads 0/1 Knapsack problems input in the format shown above and produces the complete formula to use with the SI242 SMT solver tool in order to solve the 0/1 Knapsack problem.

The Problem ...

The Tool ...

The Details ... part 1

The Details ... part 2

The Details ... part 3

Submit