SI342: Python - list, tuple, set, dictionary

Collections of data

structs and classes don't count in this discussion because you cannot "iterate" over the fields of a struct or class. you have to go through each by name.
[Java does have a standard library for iterating over the fields of a class, but it's pretty arcane and not a standard part of every day programming.

C has only one "builtin" way of collecting multiple pieces that you can iterate over: arrays. Arrays are, of course, builtin types for C++ and for Java as well. Arrays are the most fundamental structure for storing information in computers. In fact, essentially there are two features of digital computers that account for all their power: iteration via loops, and random-access storage and retrieval of information via arrays ... and the fact that those loops are executed insanely fast, and the memory space available for those arrays are insanely large.

The key features of an array are:

their size is fixed from the time they are created, and
the number of bytes required to store an array element is the same for every element in the array. This second point is why the builtin array types of C/C++/Java are homogeneous, meaning that each array element has the same type.

All these languages have special facilities for strings, i.e. homogeneous, sequential collections of characters, and there are standard libraries providing functions, structs and classes for other kinds of collections. But as far as "builtin" features are concerned, it's pretty much just arrays.

The point: In this class we will give a brief overview of Python's builtin facilities for collections of data.

The go-to collection: Python "lists"

Python aims to make it easy for programmers to express what they want rather than focusing on being efficient. In practice, things that need to be efficient are written in C/C++ and connected to Python via "wrapper functions". Functions you call in your Python programs send the computationally intensive work off to the C/C++ code behind the scenes. In any event, the fact that arrays need a fixed length determined when they are created and the fact that they must be homogeneous is at odds with the goal of making things easy for programmers. Therefore, Python does not have true builtin arrays.

Wow ... no arrays? So what do we do?

Getting around the fixed-size constraint
In general, we can get around the size constraints of arrays by providing some kind of add-to-back and remove-from-back operations and accepting that they may sometimes have to resize by creating a new array with more or less space and copying information from the old array into the new one. Important! This kind of data structure is generically called an extensible array, and providing the add-to-back / remove-from-back functionality efficiently is a big topic in your data structures class! So pay attention!

Getting around the homogeneous-elements constraint
We can get around this in two ways: we can allow different sized objects to be stored in the memory space allocated for the array, but then we lose "random access", i.e. efficient access by index. That's something we don't usually want to lose, so this is basically never done. The other way to allow different sized objects is not to store the objects in the array but, instead, store pointers or references to objects. This can be a serious performance hit when you are talking about arrays of ints or doubles or chars, which are really common cases. But that's what we have to do.

If you do these two things you get ... Java's ArrayList! And, as you know, the ArrayList makes your programming life a whole lot easier. In Python, there are no true arrays, all you get is essentially ArrayList<Object>, only in Python we call this a "list".

The skinny on Python lists (which you use like arrays)

literals: A list literal is a comma-separated sequence of values in [ ]s, e.g.

A = [56,32,9,10]   # this has length 4
B = ['foo','bar',] # the specious comma at the end is OK in python!	      
C = [ ]            # this is an empty list
D = list()         # this is also an empty list

length of a list: The function len(·) returns the length of a list ... and of most anything in Python for which "length" is a meaningful concept.
```
>>> len([4,8,15,16,23,42])
6
>>> len([])
0	    
```

subscripting: Python lists are subscriptable with [ ]s just like you expect.

>>> A = [56,32,9,10]
>>> A[0]
56
>>> A[3]
10
>>> A[99]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> ['foo','bar','rat'][1]
'bar'

Extensible array operations: append (aka add-to-back), pop (aka remove-from-back)

>>> A = [4,5,6]
>>> A.append(7) # this is "add-to-back"
>>> A
[4, 5, 6, 7]
>>> A.pop() # this is "remove-from-back", note that it returns the value it pops off
7
>>> A
[4, 5, 6]

Heterogeneous arrays in action!

>>> args = [ 10.2, "+", 3.5, "-", 5.0, "=" ] # Look, floats and strings living together in the same array!
>>> def eval(A):
...     res = 0.0
...     op = '+'
...     for i in range(0,len(A),2): # gives 0,2,4,6,8,....,len(A)-1
...         res += (A[i] if op == '+' else -A[i])
...         op = A[i+1]
...     return res
... 
>>> eval(args)
8.7
>>> eval([9,"+",2,"+",19,"-",7,"-",4,"+",3,"="])
22.0

ACTIVITY
Write a function writish(·) that takes a python list A containing strings and non-negative integers and creates a new list B that is a copy of A except that wherever string $w$ and integer $n$ appear consecutively in A, we get $n$ copies of $w$ appearing consecutively in B i.e.: \[ \ldots,w,n,\ldots \text{ in $A$ becomes } \ldots,\underbrace{w,w,\ldots,w}_{\text{$n$ copies}},\ldots \text{ in $B$} \] So, for example:

>>> writish(["a",3,"b"])
['a', 'a', 'a', 'b']
>>> writish(["a",0,"b"])
['b']
>>> writish(["a","b",3,"c","d",4,"e"])
['a', 'b', 'b', 'b', 'c', 'd', 'd', 'd', 'd', 'e']
>>> writish([" ",5,"foo","bar","ha",7," ","funny"])
[' ', ' ', ' ', ' ', ' ', 'foo', 'bar', 'ha', 'ha', 'ha', 'ha', 'ha', 'ha', 'ha', ' ', 'funny']

Hint 1: This is easier if you use append and pop!
Hint 2: isinstance(x,int) returns True if the value of x is an int and false otherwise.

Python tuples

Python lists are not immutable. You can add and remove elements, and you can change values at existing indices. Python also has tuples (use ( )s instead of [ ]s), which are immutable versions of lists: you can't add or remove and you can't change values at existing indices. Why would you want them? We will see later. Just showing you for completeness.

>>> A = (4.5,"foo")
>>> A[0]
4.5
>>> A[1]
'foo'
>>> A[0] = 5.0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> A.pop()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'pop'

Python sets

Python sets are like lists, except that you cannot access elements by index because ... there is no concept of "order" to an element of a set nor, of course, are duplicates allowed. (Note: set literals use { }s instead of [ ]s, e.g. {42,'foo',12.6}). Oh, append and pop don't exist for sets either, because there is no concept of "the back". So what can you do with a set if you don't have indices?
Important! The expression { } is *not* the literal for an empty set (it is actually an empty, "map", see below). Use set() to create an empty set.

Well, you can use the in operator to check whether a value is an element of a set ... but, actually, the in operator exists for lists too.
```
>>> S = {3,2,8}
>>> 3 in S
True
>>> 4 in S
False
>>> 3 in [2,7,3,4,3,7]
True 
```

You can iterate over the elements of a set in a for loop, like this:

>>> S = {42,-5,8}
>>> for x in S: # this "in" is different than the "in" operator from "4 in S"
...     print("x = " + str(x))
x = 8
x = 42
x = -5

Although, now that I think of it, I can do the same thing with lists:

>>> A = [42,-5,8]
>>> for x in A:
...     print("x = " + str(x))
x = 42
x = -5
x = 8

You can put a new item into set S with S.add(·) and remove them with S.add():
```
>>> S = { 3, "foo", 2.5 }
>>> S
{2.5, 3, 'foo'}
>>> S.add(42)
>>> S
{2.5, 3, 'foo', 42}
>>> S.remove('foo')
>>> S
{2.5, 3, 42}	  
```
Though now that I think about it, you have append and pop for lists, so that's not really that special.

Oh, and Danger! you can only store immutable values in sets, which is not a limitation of lists.

>>> foo = [3,4,5]
>>> S = { 42, foo, "hello" }
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> A = [ 42, foo, "hello" ]
>>> A
[42, [3, 4, 5], 'hello']

Important! So why the heck would anyone want to use sets! Three reasons: (1) no duplicates, (2) testing elements with the "in" operator is *much* more efficient than for lists, and (3) handy set operations like S.union(·), S.intersection(·), S.difference(·).

In your data structures class you will look at a few data structures that provide membership test functionality (i.e. "in") efficiently. One of these is a "hashtable", and that's how sets are implemented "under the hood" in Python.

ACTIVITY
If $N \gt 1$ is a positive integer, we can define an interesting sequence $X_N$ by \[ \begin{array}{l} x_0 = 1\\ x_{i+1} = (x_i^2 + x_i + 1) \bmod N \end{array} \] ... but at some point the sequence is bound to hit a repeat value. We define $\text{modsetsize}(N)$ to be the number of steps until you hit your first repetition (or, equivalently, the number of elements in the set of values in the sequence).

Example 1: for $N = 11$ sequence $X_{11} = 1,3,2,7,2,7,2,7,2,7,\ldots$. So modsetsize(11) = 4.

Example 2: for $N = 35$ sequence $X_{35} = 1,3,13,8,3,13,8,3,13,8,\ldots$. So modsetsize(35) = 4.

Define a function modsetsize(N) that calculates this value. Important! Use sets. You'll regret it if you don't. Also, use the in operator. You'll regret it if you don't.

>>> modsetsize(31)
8
>>> modsetsize(32)
16
>>> modsetsize(33)
4
>>> modsetsize(19873791)
292
>>> modsetsize(19873792)
335876

Python dictionaries (also called "maps")

The last of our basic builtin structures for collections of data is the python "dictionary". Generally speaking, it is a key/value store. The "value" is the data you want to store, and the "key" is an identifier you use to refer to that value. Bancroft Hall is a key/value store where "values" are mids and the "keys" are alpha codes. You will spend a whole lot of time in your data structures class learning how to implement efficient key/value stores (which we often call "map" in CS, or "dictionaries" when, as is often the case in Python, the key is a string.

Important! We will focus right now on dictionaries where the keys are strings.

Dictionary literals: Dictionary literals are wrapped in { }s like a set, but its elements are key:value pairs. For example:
```
>>> D = { "foo":42, "bar":19 }
>>> D
{'foo': 42, 'bar': 19}
```

Fetching and assigning values to keys: In practice, we think of a Dictionary as an array that is subscripted by strings rather than by integers.

>>> D = { "foo":42, "bar":19 }
>>> D["foo"]
42
>>> D["bar"] = 99
>>> D
{'foo': 42, 'bar': 99}
>>> D["cat"] = "purrrrrr"
>>> D
{'foo': 42, 'bar': 99, 'cat': 'purrrrrr'}

Checking whether a key exists: When you use the "in" operator like k in D, where D is a dictionary, you are checking whether k is used as a key in D.
```
>>> D = { "foo":42, "bar":19 }
>>> "foo" in D
True
>>> "rat" in D
False
```
Iterating over a dictionary: If you try to iterate over a dictionary the way you do with lists, tuples or sets, you'll find that what you and up doing is iterating over the set of keys for the dictionary.
```
>>> D = { "foo":42, "bar":19 }
>>> for X in D:
...     print(X)
foo
bar
```

Important! It is impossible to overstate how universal the use of dictionaries are in Python. They pop up everywhere!

ACTIVITY

FROM IC210/SI204 LAB

Part 1: Converting Currencies
Write a program (note: name the source code file part1.cpp) that converts between Dollars, Euros and Pounds. The program reads input from the user in the following format: Convert amount currency_1 to currency_2 and prints results in the obvious way. Here are a couple of sample runs:

~$ ./part1
Convert 3.50 Euros to Dollars

~$ ./part1
Convert 3.50 Euros to Pounds

Here are the conversion rates you'll need: 1.00 Dollar is Euros and 1.00 Dollar is Pounds.

Part 2: More Currencies
Extend your program from Part 1 (note: name the source code file part2.cpp) to allow for Canadian dollars as well (1.00 Dollars US is Dollars Canadian). Now the user can't simply put "Dollars" in the input, it must be either "Dollars US" or "Dollars Canadian". Here are a couple of sample runs:

~$ ./part2
Convert 3.50 Euros to Dollars Canadian

~$ ./part2
Convert 6.72 Dollars US to Dollars Canadian

To the right is part of a lab from IC210/SI204. You probably solved this problem with massive if-then-else statements. Let's see if we can find a better data-driven solution to this in Python.

Today, your job is to define a function convert(amt,curFrom,CurTo) that takes the amount (as a float) and the from currency and to currency (as strings) and returns the converted amount based on the currency names and conversion rates from the IC210/SI204 assignment. Here are some sample runs

>>> convert(26.75,'Euros','Dollars US')
29.662896429363492
>>> convert(300.50,'Dollars Canadian','Pounds')
173.91357555487988
>>> convert(25.75,'Pounds','Pounds')
25.750000000000004
>>> convert(84.50,'Dollars US','Euros')
76.2021

Hint! I strongly suggest you make the following definition, which should make this function trivial to write:

cfactors = { 'Dollars US':1.0, 'Dollars Canadian':1.3156, 'Euros':0.9018, 'Pounds':0.7614 }

Question: How hard would it be to augment your Python function by adding three more currencies ... like 'Krone', 'Yen' and 'Dollars Australian' along with the appropriate conversion rates? Think back to your IC210/SI204-style solution with if-else's ... how hard would it be to make the same additions to that code?