Context Free Grammars: Informally
A
context free grammar is a set of "rewrite rules"
that say that a particular string can be "rewritten" by
substituting particular expressions in place of particular
symbols. A simple example: We may have a
rule
stating that the character
A can be replaced with the
string
aaA (which we'll write as
A → aaA).
With this rule, we can "rewrite" the string
babAb as
babaaAb. We distinguish between symbols for which
there are no substitution rules (called
terminals) and
those for which there are subsitution rules (called
nonterminals). The goal in using these rules is to
produce a string of terminals - after all, at that point no
further substitutions are possible.
Example:
Given the rules A → aaA and A → b
we can produce the string aaaab from the string
A as follows:
- start with A
- apply the first rule and get aaA
- apply the first rule and get aaaaA
- apply the second rule and get aaaab
We say that we have derived string aaaab from the string
A.
A collection of these rules along with a nonterminal
designated as the string with which derivations begin is a
grammar.
When all rules have the form
X → w, where
X is a single nonterminal, the grammar is called
Context Free.
Example: The following rules, with nonterminal
S designated as the start symbol, define the
language of all even-length strings over the alphabet
{a,b}.
- S → A
- A → aaA
- A → abA
- A → baA
- A → bbA
- A → λ
A derivation of abaabb, for example, would be
S ⇒ A ⇒ abA ⇒ abaaA ⇒ abaabbA ⇒ abaabb
where the "⇒" denotes the application of a single
grammar rule.
Life gets substantially more interesting when grammar
derivations produce strings containing several nonterminals,
and during the derivation process you have many choices as to
which rules to apply.
Example: Consider the following grammar:
In derivations for this grammar we have decisions - decisions
about which non-terminal to replace as well as which rule to
use in the replacement. For example, the string c,c,c
can be derived in several different ways:
S ⇒ A ⇒ A,A ⇒ c,A ⇒ c,A,A ⇒ c,c,A ⇒ c,c,c
S ⇒ A ⇒ A,A ⇒ A,A,A ⇒ c,A,A ⇒ c,c,A ⇒ c,c,c
S ⇒ A ⇒ A,A ⇒ A,c ⇒ A,A,c ⇒ c,A,c ⇒ c,c,c
Activity: Consider the following grammar:
- S → cAc
- A → AcA
- A → aa
- A → bb
Find derivations for the strings caacbbcbbc
and cbbcaacaacaac.
Languages defined by Context Free Grammars
CFGs are kind of like regular expressions, in the sense that
they are a nice way to specify a language, as opposed to
testing whether a string belongs to a language. So CFGs are
similar to regular expressions, but how do they compare?
Can they express everything that regular exrpessions can?
Excercise:
Here's a list of some of the languages defined by regular
expressions.
See if you can write a CFG generating any of them.
- (a|b)*, i.e. any string over {a,b}
- a(a|b)*b, i.e. any string that starts with
a and ends with b
- abb|bba, i.e. {abb,bba}
Can
they express languages that cannot be expressed by regular
expressions?
Excercise:
Here's a list of some of the languages we proved are not
regular, i.e. cannot be accepted by finite automata.
See if you can write a CFG generating any of them.
- {anbn | n ≥ 0}
- {anbncn | n ≥ 0}
- The language of properly parenthesized lists of a's
- The language of palindromes over {a,b,c}
- The language of all strings of a's whose
length is prime