SI342: list/set/map comprehension in Python

The Pigeonhole Principle

We are going to do one bit of Discrete Math before moving on to Python. It involves sets and functions, so it will be surprisingly on-topic with our Python work later!

The Pigeonhole Principle: If you have $n$ holes and stick each of $n+1$ pigeons in a hole, there must be a hole that contains more than one pigeon.

Hopefully this makes sense. Hopefully you believe it's true. Hopefully you are not too caught up in the morality of shoving pigeons into holes. But ... this doesn't sound very mathy. So is this really about pigeons, or is there some deeper mathematical (and useful) truth to all this?

Let's think of the pigeons as elements of some set $P$, where $|P|=n+1$. And instead of $n$ holes, let's think of each of the $n$ holes as having labels, which are the elements of some set $H$, where $|H| = n$. And instead of sticking the pigeons into holes, we will simply assign each element of $P$ a label. After all, "assigning pigeon x hole label i" sounds much nicer then "stuffing pigeon x into hole $i$". In any event, hopefully you realize that this mapping of elements of $P$ to elements of $H$ is actually function! Let's call it $h$: $$ h:P \rightarrow H $$ Now, having "two pigeons in the same hole" means having two distinct elements of $P$, let's call them $a$ and $b$, such that $h(a) = h(b)$. That gives us a nice version of the pigeonhole principle with sets and functions rather than pigeons/holes and stuffing.

[Pigeonhole Principle] Let $P$ and $H$ be sets, and $h$ be a function such that $h:P \rightarrow H$. If, for some $n > 0$, $|P| = n+1$ and $|H| = n$, then there are two elements $a$ and $b$ of $P$, such that $a \neq b \wedge h(a) = h(b)$.

Proceed by induction on size $n-1$.

Case $n-1=0$: In this case $n = 1$, so $|P|=2$ and $|H|=1$. Let $a$ and $b$ be the two distinct elements of $P$ and $i$ be the only element of $H$: $h(a) = i$ and $h(b) = i$, since $i$ is the only element of $H$. ✓

Case $n-1>0$: In this case $n > 2$. Choose an element of $H$, any element, and call it $i$. If two or more elements of $P$ get mapped to $i$ by $h$, then the theorem clearly holds. So suppose not, and let $c$ be the element of $P$ that $h$ maps to $i$, if it exists, and a randomly selected element of $P$ otherwise. Let $P'=P-\{c\}$ and $H'=H-\{i\}$. Then $h$ maps the elements of $P'$ to $H'$, $|P'| = |H'| + 1$ and $|H'| > 0$. So, we can apply the inductive hypothesis to deduce that there are $a,b\in P'$ such that $a \neq b \wedge h(a) = h(b)$. Of course $a$ and $b$ are also elements of $P$, so we have shown that the theorem holds. ✓

List Comprehensions

You may remember from IC211 that Java has an interface called Iterable (see the last section of Lecture 15), which is for classes (like ArrayList) that have a method to return an iterator, so you can iterate over the data stored in that class. Python has its own idiom for interfaces, and its own "iterable" interface representing the same idea. In fact, the "for X in Y :" looping construct we've used so much is based on it! The only requirement for the Y part is that it evaluates to an object implementing the iterable interface!

Important! lists [ ], tuples ( ), sets { }, maps { }, strings " ", and range objects (produced by range(·,·)) all implement the iterable interface.

List objects are very commonly used in Python, and being able to create them flexibly and concisely is really useful, especially for data-driven programming. To that end, Python has a special mechanism for building lists called list comprehensions. This mechanism is modeled after the set constructor notation we use so much in theory and discrete math.

Example 1: Suppose we want to express the set of squares of 2-digit numbers that, mod 7, are either 0, 2 or 5. In set constructor notation I might write $\{ i^2 \ \big|\ i\in\{10,11,\ldots,99\} \wedge i \bmod 7 \in \{0,3,5\}\}$. We can generate this as a Python set with: { i**2 for i in range(10,100) if i % 7 in {0,3,5} }.

>>> { i**2 for i in range(10,100) if i % 7 in {0,3,5} }
{6400, 3969, 9216, 4356, 8836, 9604, 3721, 144, 784, ..., 100, 1764, 2916, 361, 2025, 7921, 5625}

Let's set these one above the other to see the similarity:

$\{$	$i^2$	$\ \big\|\ $	$i\in\{10,11,\ldots,99\}$	$\wedge$	$i \bmod 7 \in \{0,3,5\}$	$\}$
`{`	`i**2`	`for`	`i in range(10,100)`	`if`	`i % 7 in {0,3,5}`	`}`

If we do the same thing with [ ]s for a list, then not only do we get a list, but the order of elements corresponds with the order of the elements in range(10,100).

>>> [ i**2 for i in range(10,100) if i % 7 in {0,3,5} ]
[100, 144, 196, 289, 361, 441, 576, 676, 784, 961, 1089, 1225, 1444, 1600, 1764, 2025, ..., 8281, 8836, 9216, 9604]

                 Need to describe the values i takes on
                ________________________________________
               /                                        \
        [ i**2 for i in range(10,100) if i % 7 in {0,3,5} ]
          \__/ \____________________/ \_________________/
           |      defines where i      Optional! boolean
   An expression  values come from     expression defining
   Note: this     Note: any iterable   which values of i
   introduces a   is OK here!          from the iterable
   new variable                        we want to use
   i with scope
   limited to
   the [ ... ]

TODO

Try each of these in the interpreter and make sure you understand why they do what they do!
1. [ x + x for x in [3,1,4,1,5,9] ]
2. [ x + x for x in ["red","fish","blue","fish"] ]
3. [ x + x for x in "beatarmy" ]
4. [ x + x for x in [(1,5),(8,4),(2,2)] ]
5. [ x + x for x in [(1,5),(8,4),(2,2)] if x[0] <= x[1] ]
6. [ x + y for (x,y) in [(1,5),(8,4),(2,2)] if x <= y ]
Make the following list definition:
```
A = ['one','here','comes','the','two','to','the','three','to','the','four','tell',"'em",'bring','another','out,','we','need','plenty','more']
```
Now write a comprehension to create the list of all even-indexed entries of A. Should start with ['one,', 'comes', 'two' ... ].
Hint: instead of "for x in A" you will want something like "for i in range(...". In other words, you will make your comprehension over indices of A rather than over elements of A.

Define the following map of professors to classrooms:

M = { 'brown':108, 'chambers':201, 'crabbe':117, 'gentile':218, 'kenney':318, 'knoll':107 }

Note that "for x in M" gives you the keys of map M, not the values. So:

>>> M = { 'brown':108, 'chambers':201, 'crabbe':117, 'gentile':218, 'kenney':318, 'knoll':107 }
>>> [ x for x in M ]
['brown', 'chambers', 'crabbe', 'gentile', 'kenney', 'knoll']
>>> [ M[x] for x in M ]
[108, 201, 117, 218, 318, 107]

Write a comprehension that gives a list of all the names of professors whose associated classroom is on the first floor.

Continuing from the previous problem, write a comprehension that gives a list of (name,room) pairs for prof/room pairs on the first floor. So the output should be: [('brown', 108), ('crabbe', 117), ('knoll', 107)]

Important! Set comprehensions using { }s instead of [ ]s work exactly the same way, except that there is no concept of order to the elements of a set!

Map comprehensions

Map comprehensions work exactly the same as list comprehensions except that:

they are wrapped in { }s not [ ]s, and
instead of an expression, you get two expressions separated by a ":" - a keyexp and a valexp. I.e. keyexp:valexp

Take a good look at this example and make sure you understand how it works!

>>> mids = {('Olivia', 'Williams', 266496), ('Elijah', 'Taylor', 268596), ('Ava', 'Davis', 267441), ('James', 'Miller', 261273), ('Benjamin', 'Wilson', 261019), ('Sophia', 'Anderson', 261576), ('Liam', 'Johnson', 265362), ('Emma', 'Smith', 265220), ('Noah', 'Brown', 260367), ('Mia', 'Moore', 268987)}
>>> alpha2mid = { X[2]:f'2/C {X[1]}, {X[0]}' for X in mids }
>>> alpha2mid[267441]
'2/C Davis, Ava'
>>> alpha2mid[265362]
'2/C Johnson, Liam'

TODO

Define the string vowels = "aeiou". Use string vowels and map comprehensions to define a map named vcount in which each vowel is a key, and the associated value to each key is zero. If you do it right, vcount should be {'a': 0, 'e': 0, 'i': 0, 'o': 0, 'u': 0}. This kind of thing is super useful because then you could count the number of vowels in a string like this:
```
>>> for x in s:
        if x in vowels:
            vcount[x] += 1    
>>> vcount
{'a': 5, 'e': 4, 'i': 6, 'o': 4, 'u': 1}
```
Suppose that A is a python list containing strings, each one unique in A. For example,
```
A = [ 'hoover', 'aswan', 'beaver', 'fontana', 'three gorges', 'douglass', 'mangla' ]
```
... which are all names of dams. We can think of A as being like a function that maps an index to a name. We often want to do this. But, in many cases we want the reverse as well: we want to map names to indices! In this case, since a string is the key, we need a map. Suppose M is such a map. Then A[3] = 'fontana' and M['fontana'] = 3.
TODO Write a Python map comprehension to create the M map from the A list. If you do it right then, for example, M['hoover'] should be 0, and M['douglass'] should be 5.

Some more problems

For our last problem, we are going to do a modified version of a program you wrote for your SI242 activity on "SMT-Solvers". Ultimately, you will get a list wts of integer "weights" and integer "capacity" W and you will print out input for a subset sum problem in the syntax of our SMT-solver. You don't need to remember any of that, because this is just the problem of printing out stuff in the right format.

Ex 1

Ex 2

$ cat ina1.txt 
wts = [9,6,11]
W = 20
Ultimate output: 
(a0 => w0 = 9) & (~a0 => w0 = 0) & 
(a1 => w1 = 6) & (~a1 => w1 = 0) & 
(a2 => w2 = 11) & (~a2 => w2 = 0) & 
sum = plus(plus(w0,w1),w2) &
sum = 20

$ cat ina2.txt
wts = [69,84,80,81,90,99,57,29]
W = 200
Ultimate output:
(a0 => w0 = 69) & (~a0 => w0 = 0) & 
(a1 => w1 = 84) & (~a1 => w1 = 0) & 
(a2 => w2 = 80) & (~a2 => w2 = 0) & 
(a3 => w3 = 81) & (~a3 => w3 = 0) & 
(a4 => w4 = 90) & (~a4 => w4 = 0) & 
(a5 => w5 = 99) & (~a5 => w5 = 0) & 
(a6 => w6 = 57) & (~a6 => w6 = 0) & 
(a7 => w7 = 29) & (~a7 => w7 = 0) & 
sum = plus(plus(plus(plus(plus(plus(plus(w0,w1),w2),w3),w4),w5),w6),w7) & 
sum = 200

Write function f1(wts,W) that returns a list of strings of the form (~ai => wi = 0) for i from 0 to n-1, where n is the number of weights.
Note1: This should be done with a single list comprehension, no loops!
Note2: You'll want to use a format string, i.e. f'...', in your list comprehension!
```
>>> f1([9,6,11],20)
['(~a0 => w0 = 0)', '(~a1 => w1 = 0)', '(~a2 => w2 = 0)']
>>> f1([69,84,80,81,90,99,57,29],200)
['(~a0 => w0 = 0)', '(~a1 => w1 = 0)', '(~a2 => w2 = 0)', '(~a3 => w3 = 0)', '(~a4 => w4 = 0)', '(~a5 => w5 = 0)', '(~a6 => w6 = 0)', '(~a7 => w7 = 0)']
```

Write function f2(wts,W) that returns a list of strings of the form (ai => wi = wts[i]) for i from 0 to n-1, where n is the number of weights. This should be done with a single list comprehension, no loops!

>>> f2([9,6,11],20)
['(a0 => w0 = 9)', '(a1 => w1 = 6)', '(a2 => w2 = 11)']
>>> f2([69,84,80,81,90,99,57,29],200)
['(a0 => w0 = 69)', '(a1 => w1 = 84)', '(a2 => w2 = 80)', '(a3 => w3 = 81)', '(a4 => w4 = 90)', '(a5 => w5 = 99)', '(a6 => w6 = 57)', '(a7 => w7 = 29)']

Remember that if you have a list L of strings, you can use "".join(L) to concatenate them all together, or ":".join(L) to concatenate them all together :-separated, and so forth. Also remember that a string times an int gives repeated copies of the strings so, for example, "yo"*3 is yoyoyo. Now write a function f3(wts,W) that returns the line "plus(plus(plus(..." line as a string.
Note: This time a list comprehension helps, but you'll need more ... like join, etc.
```
>>> f3([9,6,11],20)
'sum = plus(plus(w0,w1),w2)'
>>> f3([69,84,80,81,90,99,57,29],200)
'sum = plus(plus(plus(plus(plus(plus(plus(w0,w1),w2),w3),w4),w5),w6),w7)'
```

Put it all together! Write function f4(wts,W) that returns the whole thing as a string. If you are clever and take to heart what you've done so far and the power of "join" ... it'll be a snap!

>>> f4([9,6,11],20)
'(~a0 => w0 = 0) & (~a1 => w1 = 0) & (~a2 => w2 = 0) & (a0 => w0 = 9) & (a1 => w1 = 6) & (a2 => w2 = 11) & sum = plus(plus(w0,w1),w2) & sum = 20'
>>> f4([69,84,80,81,90,99,57,29],200)
'(~a0 => w0 = 0) & (~a1 => w1 = 0) & (~a2 => w2 = 0) & (~a3 => w3 = 0) & (~a4 => w4 = 0) & (~a5 => w5 = 0) & (~a6 => w6 = 0) & (~a7 => w7 = 0) & (a0 => w0 = 69) & (a1 => w1 = 84) & (a2 => w2 = 80) & (a3 => w3 = 81) & (a4 => w4 = 90) & (a5 => w5 = 99) & (a6 => w6 = 57) & (a7 => w7 = 29) & sum = plus(plus(plus(plus(plus(plus(plus(w0,w1),w2),w3),w4),w5),w6),w7) & sum = 200'