**Reading**-
Section 3.1 of
*Theory of Computing, A Gentle Introduction*. Please read it! **Homework**- Then printout the homework and answer the questions on that paper.

**Moving on from regular languages**- We saw in the preceding classes that regular languages are surprisingly important, despite the simplicity of DFAs/NDFAs/RegExps. There are many languages that are clearly important to us (like properly formatted decimal numbers) that in fact regular. However, we also proved that there are many languages that are fundamental to computing that are not regular --- i.e. DFAs/NDFAs/RegExps are not powerful enough to capture. For instance, we proved that with regular languages we can't even require that parentheses match up in a simple language for arithmetic expressions. So finite automata (and regular expressions) are not powerful enough even for the core activities of compiling and interpreting programs. This motivates us to consider new, more powerful formalisms. First this will mean "context free grammars", and a bit later this will mean pushdown automata.
**Context Free Grammars: Informally**-
A
*context free grammar*is a set of "rewrite rules" that say that a particular string can be "rewritten" by substituting particular expressions in place of particular symbols. A simple example: We may have a*rule*stating that the character*A*can be replaced with the string*aaA*(which we'll write as*A → aaA*). With this rule, we can "rewrite" the string*babAb*as*babaaAb*. We distinguish between symbols for which there are no substitution rules (called*terminals*) and those for which there are subsitution rules (called*nonterminals*). The goal in using these rules is to produce a string of terminals - after all, at that point no further substitutions are possible.**Example:**Given the rules*A → aaA*and*A → b*we can produce the string*aaaab*from the string*A*as follows:- start with
*A* - apply the first rule and get
*aaA* - apply the first rule and get
*aaaaA* - apply the second rule and get
*aaaab*

*derived*string*aaaab*from the string*A*.*grammar*. When all rules have the form*X → w*, where*X*is a single nonterminal, the grammar is called*Context Free*.**Example:**The following rules, with nonterminal*S*designated as the start symbol, define the language of all even-length strings over the alphabet*{a,b}*.*S → A**A → aaA**A → abA**A → baA**A → bbA**A → λ*

*abaabb*, for example, would be*S ⇒ A ⇒ abA ⇒ abaaA ⇒ abaabbA ⇒ abaabb***Example:**Consider the following grammar:*S → A**A → A,A**A → c*

*c,c,c*can be derived in several different ways:*S ⇒ A ⇒ A,A ⇒ c,A ⇒ c,A,A ⇒ c,c,A ⇒ c,c,c*

*S ⇒ A ⇒ A,A ⇒ A,A,A ⇒ c,A,A ⇒ c,c,A ⇒ c,c,c*

*S ⇒ A ⇒ A,A ⇒ A,c ⇒ A,A,c ⇒ c,A,c ⇒ c,c,c*

**Problem:**Consider the following grammar:*S → cAc**A → AcA**A → aa**A → bb*

*caacbbcbbc*and*cbbcaacaacaac*. - start with
**Languages defined by Context Free Grammars**-
CFGs are kind of like regular expressions, in the sense that
they are a nice way to specify a language, as opposed to
testing whether a string belongs to a language. So CFGs are
similar to regular expressions, but how do they compare?
Can they express everything that regular exrpessions can?
**Excercise:**Here's a list of some of the languages defined by regular expressions. See if you can write a CFG generating any of them.*(a|b)**, i.e. any string over*{a,b}**a(a|b)*b*, i.e. any string that starts with*a*and ends with*b**abb|bba*, i.e.*{abb,bba}*

**Excercise:**Here's a list of some of the languages we proved are not regular, i.e. cannot be accepted by finite automata. See if you can write a CFG generating any of them.*{a*^{n}b^{n}| n ≥ 0}*{a*^{n}b^{n}c^{n}| n ≥ 0}- The language of properly parenthesized lists of
*a*'s - The language of palindromes over
*{a,b,c}* - The language of all strings of
*a*'s whose length is prime

**A formal definition of a CFG**-
Just as with DFAs and NDFAs we start with an informal
definition of CFG, but we eventually need a formal definition
so that we may begin to write algorithms and ultimately proofs.
Formally:
**Definition:**A*Context Free Grammar*is a 4-tuple*G = (Σ,NT,R,S)*, where- Σ is the alphabet, i.e. the set of terminal symbols
*NT*is the set of nonterminal symbols*R ⊆ NT × (Σ ∪ NT)**, i.e. the elements of*R*are pairs whose first componant is an element of*NT*and whose second componant is a string over the alphabet*Σ ∪ NT**S ∈ NT*is the start symbol

Christopher W Brown Last modified: Fri Oct 23 09:18:44 EDT 2009