Name: ____________________________________________________ Alpha: _____________________

Describe help received: _________________________________________________________________

To submit written work: take a clear picture of your work with your phone, and email it to me at wcbrown@usna.edu with the title: si413hw 08. The space between "si413hw" and "08" is really important!

Problem 1

The DFA we use for scanning essentially gets called over and over again. Suppose we have started processing from state 0 at character $c_i$ in the input stream. When we are in state $p$ having just read character $c_n$, and there is no transition out of $p$ for $c_n$, we have to back up to the last character we read in that brought us to an accepting state in the DFA, call it $c_j$. We emit $c_ic_{i+1}\cdots c_j$ as the next token, and when we resume processing, we will start at character $c_{j+1}$. This means $c_{j+1} c_{j+2}\cdots c_n$ must somehow be buffered somewhere as "putback". An important question is: How big do we need that buffer to be?

Theorem: Assuming we are operating under the principle of "maximal munch" ... If the underlying automaton has no loops that consist solely of non-accepting states, then there is some constant K such that no program requires more than K putback.

a. Prove this theorem.
b. Describe how to determine from the DFA the constant K that is the maximum amount of putback required.

Problem 2

Look at the file hwtok.cpp. Download it and compile and run like this:
> g++ -o hwtok hwtok.cpp
> ./tok
3*(-4 + 12/3);
NUM[3] OPM[*] LP[(] OPA[-] NUM[4] OPA[+] NUM[12] OPM[/] NUM[3] RP[)] STOP[;]
Some changes to a language can be handled completely by the scanner. Your job is to modify hwtok.cpp so it can accept integers input in either decimal or binary. Specifically, a string of 0's and 1's followed immediately by a "b", no whitespace or anything, is an integer literal in binary. So, for example:
> ./hwtok
10 + 3;
NUM[10] OPA[+] NUM[3] STOP[;]
101b + 3;
NUM[5] OPA[+] NUM[3] STOP[;]
a. Draw a modified version of the finite automaton from the lecture notes that takes binary integer constants into account. (You only need to show the parts that change.)
b. Modify hwtok.cpp to implement this feature. You may use this function binary2int defined below:
// Input:  val - a string consisting of the form (0|1)+b , e.g. 1011b 
// Output: the integer value equal to val (note: no error checking is done!)
int binary2int(const string &val)
{
  int x = 0;
  for(int i = 0; val[i] != 'b'; ++i)
    x = 2*x + (val[i]-'0');
  return x;
}
Submit as: submit-external -c=SI413 -p=hw08 hwtok.cpp