Collections of data

structs and classes don't count in this discussion because you cannot "iterate" over the fields of a struct or class. you have to go through each by name.
[Java does have a standard library for iterating over the fields of a class, but it's pretty arcane and not a standard part of every day programming.
C has only one "builtin" way of collecting multiple pieces that you can iterate over: arrays. Arrays are, of course, builtin types for C++ and for Java as well. Arrays are the most fundamental structure for storing information in computers. In fact, essentially there are two features of digital computers that account for all their power: iteration via loops, and random-access storage and retrieval of information via arrays ... and the fact that those loops are executed insanely fast, and the memory space available for those arrays are insanely large.

The key features of an array are:

  1. their size is fixed from the time they are created, and
  2. the number of bytes required to store an array element is the same for every element in the array. This second point is why the builtin array types of C/C++/Java are homogeneous, meaning that each array element has the same type.

All these languages have special facilities for strings, i.e. homogeneous, sequential collections of characters, and there are standard libraries providing functions, structs and classes for other kinds of collections. But as far as "builtin" features are concerned, it's pretty much just arrays.

The point: In this class we will give a brief overview of Python's builtin facilities for collections of data.

The go-to collection: Python "lists"

Python aims to make it easy for programmers to express what they want rather than focusing on being efficient. In practice, things that need to be efficient are written in C/C++ and connected to Python via "wrapper functions". Functions you call in your Python programs send the computationally intensive work off to the C/C++ code behind the scenes. In any event, the fact that arrays need a fixed length determined when they are created and the fact that they must be homogeneous is at odds with the goal of making things easy for programmers. Therefore, Python does not have true builtin arrays.

Wow ... no arrays? So what do we do?

Getting around the fixed-size constraint
In general, we can get around the size constraints of arrays by providing some kind of add-to-back and remove-from-back operations and accepting that they may sometimes have to resize by creating a new array with more or less space and copying information from the old array into the new one. Important! This kind of data structure is generically called an extensible array, and providing the add-to-back / remove-from-back functionality efficiently is a big topic in your data structures class! So pay attention!

Getting around the homogeneous-elements constraint
We can get around this in two ways: we can allow different sized objects to be stored in the memory space allocated for the array, but then we lose "random access", i.e. efficient access by index. That's something we don't usually want to lose, so this is basically never done. The other way to allow different sized objects is not to store the objects in the array but, instead, store pointers or references to objects. This can be a serious performance hit when you are talking about arrays of ints or doubles or chars, which are really common cases. But that's what we have to do.

If you do these two things you get ... Java's ArrayList! And, as you know, the ArrayList makes your programming life a whole lot easier. In Python, there are no true arrays, all you get is essentially ArrayList<Object>, only in Python we call this a "list".

The skinny on Python lists (which you use like arrays)

ACTIVITY
Write a function writish(·) that takes a python list A containing strings and non-negative integers and creates a new list B that is a copy of A except that wherever string $w$ and integer $n$ appear consecutively in A, we get $n$ copies of $w$ appearing consecutively in B i.e.: \[ \ldots,w,n,\ldots \text{ in $A$ becomes } \ldots,\underbrace{w,w,\ldots,w}_{\text{$n$ copies}},\ldots \text{ in $B$} \] So, for example:
>>> writish(["a",3,"b"])
['a', 'a', 'a', 'b']
>>> writish(["a",0,"b"])
['b']
>>> writish(["a","b",3,"c","d",4,"e"])
['a', 'b', 'b', 'b', 'c', 'd', 'd', 'd', 'd', 'e']
>>> writish([" ",5,"foo","bar","ha",7," ","funny"])
[' ', ' ', ' ', ' ', ' ', 'foo', 'bar', 'ha', 'ha', 'ha', 'ha', 'ha', 'ha', 'ha', ' ', 'funny']
      
Hint 1: This is easier if you use append and pop!
Hint 2: isinstance(x,int) returns True if the value of x is an int and false otherwise.

Python tuples

Python lists are not immutable. You can add and remove elements, and you can change values at existing indices. Python also has tuples (use ( )s instead of [ ]s), which are immutable versions of lists: you can't add or remove and you can't change values at existing indices. Why would you want them? We will see later. Just showing you for completeness.
>>> A = (4.5,"foo")
>>> A[0]
4.5
>>> A[1]
'foo'
>>> A[0] = 5.0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> A.pop()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'pop'

Python sets

Python sets are like lists, except that you cannot access elements by index because ... there is no concept of "order" to an element of a set nor, of course, are duplicates allowed. (Note: set literals use { }s instead of [ ]s, e.g. {42,'foo',12.6}). Oh, append and pop don't exist for sets either, because there is no concept of "the back". So what can you do with a set if you don't have indices?
Important! The expression { } is *not* the literal for an empty set (it is actually an empty, "map", see below). Use set() to create an empty set. Important! So why the heck would anyone want to use sets! Three reasons: (1) no duplicates, (2) testing elements with the "in" operator is *much* more efficient than for lists, and (3) handy set operations like S.union(·), S.intersection(·), S.difference(·).

In your data structures class you will look at a few data structures that provide membership test functionality (i.e. "in") efficiently. One of these is a "hashtable", and that's how sets are implemented "under the hood" in Python.

ACTIVITY
If $N \gt 1$ is a positive integer, we can define an interesting sequence $X_N$ by \[ \begin{array}{l} x_0 = 1\\ x_{i+1} = (x_i^2 + x_i + 1) \bmod N \end{array} \] ... but at some point the sequence is bound to hit a repeat value. We define $\text{modsetsize}(N)$ to be the number of steps until you hit your first repetition (or, equivalently, the number of elements in the set of values in the sequence).

Example 1: for $N = 11$ sequence $X_{11} = 1,3,2,7,2,7,2,7,2,7,\ldots$. So modsetsize(11) = 4.

Example 2: for $N = 35$ sequence $X_{35} = 1,3,13,8,3,13,8,3,13,8,\ldots$. So modsetsize(35) = 4.

Define a function modsetsize(N) that calculates this value. Important! Use sets. You'll regret it if you don't. Also, use the in operator. You'll regret it if you don't.
>>> modsetsize(31)
8
>>> modsetsize(32)
16
>>> modsetsize(33)
4
>>> modsetsize(19873791)
292
>>> modsetsize(19873792)
335876

Python dictionaries (also called "maps")

The last of our basic builtin structures for collections of data is the python "dictionary". Generally speaking, it is a key/value store. The "value" is the data you want to store, and the "key" is an identifier you use to refer to that value. Bancroft Hall is a key/value store where "values" are mids and the "keys" are alpha codes. You will spend a whole lot of time in your data structures class learning how to implement efficient key/value stores (which we often call "map" in CS, or "dictionaries" when, as is often the case in Python, the key is a string.

Important! We will focus right now on dictionaries where the keys are strings.

Important! It is impossible to overstate how universal the use of dictionaries are in Python. They pop up everywhere!

ACTIVITY

FROM IC210/SI204 LAB
Part 1: Converting Currencies
Write a program (note: name the source code file part1.cpp) that converts between Dollars, Euros and Pounds. The program reads input from the user in the following format: Convert amount currency_1 to currency_2 and prints results in the obvious way. Here are a couple of sample runs:
~$ ./part1
Convert 3.50 Euros to Dollars

~$ ./part1
Convert 3.50 Euros to Pounds
Here are the conversion rates you'll need: 1.00 Dollar is Euros and 1.00 Dollar is Pounds.
Part 2: More Currencies
Extend your program from Part 1 (note: name the source code file part2.cpp) to allow for Canadian dollars as well (1.00 Dollars US is Dollars Canadian). Now the user can't simply put "Dollars" in the input, it must be either "Dollars US" or "Dollars Canadian". Here are a couple of sample runs:
~$ ./part2
Convert 3.50 Euros to Dollars Canadian

~$ ./part2
Convert 6.72 Dollars US to Dollars Canadian
To the right is part of a lab from IC210/SI204. You probably solved this problem with massive if-then-else statements. Let's see if we can find a better data-driven solution to this in Python.

Today, your job is to define a function convert(amt,curFrom,CurTo) that takes the amount (as a float) and the from currency and to currency (as strings) and returns the converted amount based on the currency names and conversion rates from the IC210/SI204 assignment. Here are some sample runs

>>> convert(26.75,'Euros','Dollars US')
29.662896429363492
>>> convert(300.50,'Dollars Canadian','Pounds')
173.91357555487988
>>> convert(25.75,'Pounds','Pounds')
25.750000000000004
>>> convert(84.50,'Dollars US','Euros')
76.2021 
Hint! I strongly suggest you make the following definition, which should make this function trivial to write:
cfactors = { 'Dollars US':1.0, 'Dollars Canadian':1.3156, 'Euros':0.9018, 'Pounds':0.7614 }    
Question: How hard would it be to augment your Python function by adding three more currencies ... like 'Krone', 'Yen' and 'Dollars Australian' along with the appropriate conversion rates? Think back to your IC210/SI204-style solution with if-else's ... how hard would it be to make the same additions to that code?

Christopher W Brown