Tutorial 11: Harvesting E-Mail Addresses and Other Evil Things (November 21-23, 2007)
Today's tutorial is all about generative recursion. We'll work with strings, graphs, and numbers. The starter code is here. All the material is from lecture module 11.
A spam bot is a computer program that crawls the internet looking for email addresses to add to a spam list. One crucial step is extracting (or 'harvesting') email addresses from some text with lots of other stuff in it (usually an html file). Note that most spam bots choose speed over thoroughness, because there are so many email addresses published freely that even a very stupid harvester can be quite effective. This is nice because it means that spam bots are usually easy to fool.
Write a simple email harvester called
To make this problem a bit shorter, you've been provided with two helper functions in the starter code.
Directed acyclic graphs (DAGs) can be used as a very simplistic model for the flow of information in a network. Each node in the graph represents a single computer in the network, and each line represents a one-way communication connection. The following questions focus on what happens when a single computer in the network is infected with a virus.
The code for the
The most insecure computers in a network are almost always the client machines. These can be recognized by having no out neighbors. In graph theory, these kind of nodes are called 'sink nodes'.
So a virus might be more likely to take hold if it is close to some client nodes. Write a function
A virus will have a greater chance of spreading to a given node if there are multiple (different) paths from the source to the target node. We'll say two paths are distinct as long as they are not completely identical (so they can share some nodes).
Write a function
We have seen in class that, when dealing with generative recursion, sometimes it is very difficult to determine if the algorithm always terminates (in fact, this is a famous 'undecidable' problem --- Google 'undecidable', 'Alan Turing', or 'halting problem' for more). A program that never terminates can also very easily become a security risk in a computer.
For the following programs, detemine whether they terminate on all valid imput. If so, give a brief justification or proof. If not, give an example input that will make the program run forever, and try to devise conditions on the input that will guarantee termination.