Homework 5

This is the archived website of SI 413 from the Fall 2013 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

# 1 Homographs and Synonyms

Pick a pair of two programming languages that you know, and come up with an example of each of the following. As always, you can work together, but everyone must turn in unique examples.

1. A homograph is a code fragment that is the same syntactically between the two languages, but has different semantics in each.

2. A synonym is a code fragment that is the same semantically between the two languages, but has different syntax.

# 2 Scanner DFA

C++ and Java support a few different kinds of numerical constants, or "literals". The most basic are regular ints that you know and love like 15, 256, or 32. There are also floating-point numbers like 3.7 or .0684.

These can be described with the following regular expressions:

  INT: [0-9]+
FLOAT: [0-9]*"."[0-9]*

(Remember, a + means "one or more of the previous thing", and a * means "zero or more of the previous thing". Square brackets [] indicate a range of characters (or multiple ranges), and anything in quotes "" means that literal character.)

Draw the DFA for the scanner that accepts FLOAT and INT tokens. Be sure to label each accepting state with the type of token, and put characters or character ranges on each transition.

# 3 Bigger Scanner DFA

Besides int and float literals, C++ and Java also support two kinds of numbers that you may not know about are different bases: to start an octal (base-8) number, you put a leading zero, so 034 and 0702 are octal numbers. And to write a number in hexidecmal (base 16), you start with "0x", like 0x3472 or 0xf3a.

Modify the DFA from the previous problem to incorporate these two new types of tokens. So all in all you should support:

Draw the DFA for the scanner that accepts the following tokens (as specified by regular expressions). Label accepting states with the name of the token.

OCTAL: 0[0-7]*
HEX: 0x[0-9a-f]+
INT: [0-9]+
FLOAT: [0-9]*"."[0-9]*

Observe that every octal literal like 034 could also count as an INT according to this specification. Give octal literals higher priority in your scanner, so that whenever there is a conflict, the token type will be OCTAL.

# 4 Modifying a Grammar

The following grammar is ambiguous (in terms of parsing), but it still defines a language:

Sexp
expexp OPA exp
expNUM

Write a new grammar for the same language which is unambiguous and would work well for top-down parsing. (In particular, your new grammar should be LL(1).)

# 5 Top-Down Parsing

Using your modified grammar from the previous problem, draw the parse tree that would be generated by a top-down parser for the following string:

53 + 89 - 103

# 6 PREDICT and FOLLOW

1. Give the PREDICT and FOLLOW sets for your modified grammar from two problems ago.

2. What part(s) of the sets above would be used to create a recursive-descent LL(1) parser?