Reading
Section 3.1 of Theory of Computing, A Gentle Introduction. Please read it!
Homework
Then printout the homework and answer the questions on that paper.

Moving on from regular languages
We saw in the preceding classes that regular languages are surprisingly important, despite the simplicity of DFAs/NDFAs/RegExps. There are many languages that are clearly important to us (like properly formatted decimal numbers) that in fact regular. However, we also proved that there are many languages that are fundamental to computing that are not regular --- i.e. DFAs/NDFAs/RegExps are not powerful enough to capture. For instance, we proved that with regular languages we can't even require that parentheses match up in a simple language for arithmetic expressions. So finite automata (and regular expressions) are not powerful enough even for the core activities of compiling and interpreting programs. This motivates us to consider new, more powerful formalisms. First this will mean "context free grammars", and a bit later this will mean pushdown automata.
Context Free Grammars: Informally
A context free grammar is a set of "rewrite rules" that say that a particular string can be "rewritten" by substituting particular expressions in place of particular symbols. A simple example: We may have a rule stating that the character A can be replaced with the string aaA (which we'll write as A → aaA). With this rule, we can "rewrite" the string babAb as babaaAb. We distinguish between symbols for which there are no substitution rules (called terminals) and those for which there are subsitution rules (called nonterminals). The goal in using these rules is to produce a string of terminals - after all, at that point no further substitutions are possible.
Example: Given the rules A → aaA and A → b we can produce the string aaaab from the string A as follows:
  1. start with A
  2. apply the first rule and get aaA
  3. apply the first rule and get aaaaA
  4. apply the second rule and get aaaab
We say that we have derived string aaaab from the string A.
A collection of these rules along with a nonterminal designated as the string with which derivations begin is a grammar. When all rules have the form X → w, where X is a single nonterminal, the grammar is called Context Free.
Example: The following rules, with nonterminal S designated as the start symbol, define the language of all even-length strings over the alphabet {a,b}.
  • S → A
  • A → aaA
  • A → abA
  • A → baA
  • A → bbA
  • A → λ
A derivation of abaabb, for example, would be
S ⇒ A ⇒ abA ⇒ abaaA ⇒ abaabbA ⇒ abaabb
where the "⇒" denotes the application of a single grammar rule.
Life gets substantially more interesting when grammar derivations produce strings containing several nonterminals, and during the derivation process you have many choices as to which rules to apply.
Example: Consider the following grammar:
  • S → A
  • A → A,A
  • A → c
In derivations for this grammar we have decisions - decisions about which non-terminal to replace as well as which rule to use in the replacement. For example, the string c,c,c can be derived in several different ways:
S ⇒ A ⇒ A,A ⇒ c,A ⇒ c,A,A ⇒ c,c,A ⇒ c,c,c
S ⇒ A ⇒ A,A ⇒ A,A,A ⇒ c,A,A ⇒ c,c,A ⇒ c,c,c
S ⇒ A ⇒ A,A ⇒ A,c ⇒ A,A,c ⇒ c,A,c ⇒ c,c,c
Problem: Consider the following grammar:
  • S → cAc
  • A → AcA
  • A → aa
  • A → bb
Find derivations for the strings caacbbcbbc and cbbcaacaacaac.
Languages defined by Context Free Grammars
CFGs are kind of like regular expressions, in the sense that they are a nice way to specify a language, as opposed to testing whether a string belongs to a language. So CFGs are similar to regular expressions, but how do they compare? Can they express everything that regular exrpessions can?
Excercise: Here's a list of some of the languages defined by regular expressions. See if you can write a CFG generating any of them.
  1. (a|b)*, i.e. any string over {a,b}
  2. a(a|b)*b, i.e. any string that starts with a and ends with b
  3. abb|bba, i.e. {abb,bba}
Can they express languages that cannot be expressed by regular expressions?
Excercise: Here's a list of some of the languages we proved are not regular, i.e. cannot be accepted by finite automata. See if you can write a CFG generating any of them.
  1. {anbn | n ≥ 0}
  2. {anbncn | n ≥ 0}
  3. The language of properly parenthesized lists of a's
  4. The language of palindromes over {a,b,c}
  5. The language of all strings of a's whose length is prime
A formal definition of a CFG
Just as with DFAs and NDFAs we start with an informal definition of CFG, but we eventually need a formal definition so that we may begin to write algorithms and ultimately proofs. Formally:
Definition: A Context Free Grammar is a 4-tuple G = (Σ,NT,R,S), where
  • Σ is the alphabet, i.e. the set of terminal symbols
  • NT is the set of nonterminal symbols
  • R ⊆ NT × (Σ ∪ NT)*, i.e. the elements of R are pairs whose first componant is an element of NT and whose second componant is a string over the alphabet Σ ∪ NT
  • S ∈ NT is the start symbol


Christopher W Brown
Last modified: Fri Oct 23 09:18:44 EDT 2009