Section 2.2 of Theory of Computing, A Gentle Introduction
Print out the homework and answer the questions on that paper.

A Mystery Algorithm on Nondeterministic Finite Automata
Consider the Mystery Algorithm below:
Input: NDFAs M1 = (Q1,Σ1,Δ1,s1,W1) and M2 = (Q2,Σ2,Δ2,s2,W2), where Q1 ∩ Q2 = ∅
Output: NDFA M such that ... <it's a mystery!>
M = (Q1 ∪ Q2 ∪ {s}, Σ1 ∪ Σ2, Δ1 ∪ Δ2 ∪ {(s,λ,s1),(s,λ,s2)}, s, W1 ∪ W2), where s ∉ Q1 and s ∉ Q2 (i.e. s is a new state)
Here are a couple of questions to answer about this mystery algorithm:
  1. What 5-tuple does this algorithm produce for the input machines given below?

    M1 = ({p0,p1}, {a,b}, {(p0,a,p1),(p1,a,p0)}, p0, {p1}) and M2 = ({q0,q1}, {a,b}, {(q0,b,q1),(q1,b,q0)}, q0, {q0})

  2. Draw the diagram for your answer to part 1.

  3. Give a concise English description of what the Mystery Algorithm accomplishes in general (i.e. for any two input machines).

From the above exercise you (hopefully!) figured out that the Mystery Algorithm is actually the Union Algorithm, in other words: Given input machines M1 and M2 it produces a machine M such that L(M) = L(M1) ∪ L(M2). So, now we've proved something about languages accepted by nondeterministic finite automata.

Theorem: If L1 and L2 are languages accepted by nondeterministic finite automata, then L1 ∪ L2 is also accepted by a nondeterministic finite automaton.
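The construction behind this theorem is easy to carry out mechanically. Here is a sketch in Python, with an NDFA represented as a 5-tuple of Python values, transitions written as triples (q, a, r), and the empty string "" standing for λ. The representation and the fresh-state name "s_new" are my own choices, not from the text.

```python
def union(m1, m2):
    """Given NDFAs m1, m2 with disjoint state sets, return an NDFA
    accepting L(m1) ∪ L(m2), per the Union (Mystery) Algorithm."""
    q1, sig1, d1, s1, w1 = m1
    q2, sig2, d2, s2, w2 = m2
    s = "s_new"                      # a fresh state; assumed not in Q1 or Q2
    assert s not in q1 and s not in q2
    return (q1 | q2 | {s},
            sig1 | sig2,
            d1 | d2 | {(s, "", s1), (s, "", s2)},  # lambda-moves to both starts
            s,
            w1 | w2)
```

The only work is bookkeeping: take the unions of the components and wire the new start state to both old start states with λ-transitions.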

Concatenation of Languages and a review of set constructor notation
Often we're interested in strings that can be decomposed in certain ways. For example, verbs in English can often be broken up into a root and an ending, like talks or warned or leans. Given the set of roots R = {talk, warn, lean} and the set of endings E = {s, ed}, our possibilities are P = {talks, talked, warns, warned, leans, leaned}. When we form a new set of strings this way, we say that P is the concatenation of R and E. Languages, remember, are simply sets of strings, so we talk about concatenating two languages to form a new language. Each string in the new language is the concatenation of a string from the first language with a string from the second language.

To give a formal definition of the concatenation of two languages we need to recall our set constructor notation, which looks like { A | B }, where the { means "the set of all" and the | means "such that".

Definition: For languages L1, L2 ⊆ Σ*, the concatenation of L1 and L2 (denoted L1L2) is { xy ∈ Σ* | x ∈ L1 and y ∈ L2}. In other words, L1L2 is the set of all strings of the form xy such that x ∈ L1 and y ∈ L2.
So, for example, if L1 = {aa, ba, c} and L2 = {cc, cb, c}, L1L2 = {aacc, aacb, aac, bacc, bacb, bac, ccc, ccb, cc}. Concatenation of infinite languages also makes sense with our definition. For example, if L1 is the language of strings of a's of even length and L2 = {bb, cc}, then:
L1L2 = {bb, cc, aabb, aacc, aaaabb, aaaacc, aaaaaabb, aaaaaacc, ... }
Can you see how bb and cc ended up in the concatenation of L1 and L2? Well, λ (the empty string) is in L1, since it is a string of a's of even length (namely length zero). So λ concatenated with bb gives bb, and λ concatenated with cc gives cc.
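For finite languages the definition translates directly into a set comprehension; a sketch (infinite languages, of course, can't be enumerated this way):

```python
def concat_langs(l1, l2):
    """Return the concatenation {xy | x in l1 and y in l2}
    of two finite languages given as Python sets of strings."""
    return {x + y for x in l1 for y in l2}
```

Running it on the finite example above reproduces the nine strings listed, and the λ phenomenon shows up as `"" + "bb" == "bb"`.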

A Concatenation Algorithm
Without having defined the concatenation of languages, we looked at examples of tying two machines M1 and M2 together to produce a machine for L(M1)L(M2). Now, can you define an algorithm that does this in general, i.e. for any two machines? So, I want an algorithm that matches the following specification:
Input: NDFAs M1 = (Q1,Σ1,Δ1,s1,W1) and M2 = (Q2,Σ2,Δ2,s2,W2), where Q1 ∩ Q2 = ∅
Output: NDFA M such that L(M) = L(M1)L(M2)
If you think back to specific examples from last lecture, you hopefully see the general strategy. Add λ-transitions from the accepting states of M1 to the start state of M2. The start state of the new machine will be M1's start state, and the accepting states are the accepting states of M2. We can describe this formally as:
Algorithm: Concatenate
Input: NDFAs M1 = (Q1,Σ1,Δ1,s1,W1) and M2 = (Q2,Σ2,Δ2,s2,W2), where Q1 ∩ Q2 = ∅
Output: NDFA M such that L(M) = L(M1)L(M2)
M = (Q1 ∪ Q2, Σ1 ∪ Σ2, Δ, s1,W2), where Δ = Δ1 ∪ Δ2 ∪ {(q,λ,s2) | q ∈ W1}
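This construction can also be sketched in Python, again representing an NDFA as a 5-tuple with transitions as triples (q, a, r) and "" for λ (a representation of my own choosing, not from the text):

```python
def concatenate(m1, m2):
    """Algorithm Concatenate: add lambda-transitions from the accepting
    states of m1 to the start state of m2; keep m1's start state and
    m2's accepting states.  Assumes the state sets are disjoint."""
    q1, sig1, d1, s1, w1 = m1
    q2, sig2, d2, s2, w2 = m2
    delta = d1 | d2 | {(q, "", s2) for q in w1}   # the glue transitions
    return (q1 | q2, sig1 | sig2, delta, s1, w2)
```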
My claim is that L(M) = L(M1)L(M2). Following our previous discussion, we'll assume this claim is self-evident enough from our experience with this algorithm. To be strictly, formally correct, however, we'll have to prove it, which I do below. If you accept my claim, then we have the following theorem:
Theorem: If L1 and L2 are languages that are accepted by nondeterministic finite automata, then L1L2 is also accepted by a nondeterministic finite automaton.
Proof that Algorithm Concatenate meets its specification, i.e. L(M) = L(M1)L(M2).
[Note: this is for completeness. I suspect that the correctness of the algorithm is sufficiently self-evident for most of us.]
  1. Show if w ∈ L(M) then w ∈ L(M1)L(M2)
    Since w ∈ L(M), we know that $(s1,w) \Rightarrow_{M}^* (q,\lambda)$ where $q \in W2$. We note that in $M$, the only transitions that involve states from both $Q1$ and $Q2$ are from $\{(q,\lambda,s2)| q \in W1\}$. This means that \[ (s1,w) \Rightarrow_{M1}^* (p,v) \Rightarrow_M (s2,v) \Rightarrow_{M2}^* (q,\lambda) \] where $(p,v) \Rightarrow_M (s2,v)$ represents one of the $\lambda$-transitions from $\{(q,\lambda,s2)| q \in W1\}$, which means that $p \in W1$. So we have \[ (s1,w) \Rightarrow_{M1}^* (p,v) \text{, where $p \in W1$, and } (s2,v) \Rightarrow_{M2}^* (q,\lambda) \text{, where $q \in W2$} \] So, $w = uv$ for some $u \in \Sigma_1^*$, and $(s1,uv) \Rightarrow_{M1}^* (p,v)$ implies $(s1,u) \Rightarrow_{M1}^* (p,\lambda)$. Therefore, $u \in L(M1)$ and $v \in L(M2)$ which proves that $w \in L(M1)L(M2)$.
  2. Show if w ∈ L(M1)L(M2) then w ∈ L(M)
    Since w ∈ L(M1)L(M2) we have that $w = uv$ such that $u \in \Sigma_1^*$ and $(s1,u) \Rightarrow_{M1}^* (p,\lambda)$, where $p \in W1$, and $(s2,v) \Rightarrow_{M2}^* (q,\lambda)$, where $q \in W2$. Note that: \[ (s1,u) \Rightarrow_{M1}^* (p,\lambda) \text{ means } (s1,uv) \Rightarrow_{M}^* (p,v) \text{, and that } (s2,v) \Rightarrow_{M2}^* (q,\lambda) \text{ means } (s2,v) \Rightarrow_{M}^* (q,\lambda). \] These follow because $M$ inherits all the same transitions that $M1$ and $M2$ had. Glue these together with transition $(p,\lambda,s2)$ and we get \[ (s1,uv) \Rightarrow_{M}^* (p,v) \Rightarrow_M (s2,v) \Rightarrow_{M}^* (q,\lambda) \] where $q \in W2$, and thus $w \in L(M)$.
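The relation $\Rightarrow^*$ used in the proof is just reachability between configurations, and it can be computed. Here is a sketch of an acceptance test for NDFAs with λ-transitions, using the usual subset simulation with λ-closure (the representation, with "" for λ, is my own):

```python
def accepts(m, w):
    """Decide whether NDFA m accepts string w.  Transitions are triples
    (q, a, r), with a == "" standing for a lambda-transition."""
    states, sigma, delta, start, accepting = m

    def closure(S):
        # all states reachable from S using lambda-transitions alone
        S = set(S)
        frontier = list(S)
        while frontier:
            q = frontier.pop()
            for (p, a, r) in delta:
                if p == q and a == "" and r not in S:
                    S.add(r)
                    frontier.append(r)
        return S

    current = closure({start})
    for c in w:
        current = closure({r for (p, a, r) in delta if p in current and a == c})
    return bool(current & accepting)
```

With a tester like this in hand, you can sanity-check any of the constructions in this section on small examples.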
Kleene Star *
One is often interested in strings that are formed by repeatedly concatenating strings from a given language. For example, if you wanted a string of doubled characters, e.g. aabbbbaacc, what you're looking for is a string that's made up of repeated copies of elements of the language {aa,bb,cc}. The operation of "repeated concatenation" from a given language is called the Kleene Star of a language. Somewhat informally, we'll say that the Kleene Star of language L (denoted L*) is the set of all strings of the form x1x2 ... xn-1xn, where n ≥ 0 and each of the xi's is in L. Typically, you see this defined in a very different way ... a way that is particularly conducive to proof by induction.
1) λ ∈ L*
2) if x ∈ L* then xy ∈ L* for every y ∈ L
A definition like this is called a recursive definition, and it is basically a set of rules generating all the elements of L*. An example will, hopefully, make Kleene Star clear to you: if L = {aa, bb, cc}, then L* contains λ (by rule 1), along with aa, bb, cc, aaaa, aabb, aacc, bbaa, and so on (by repeated applications of rule 2), including our earlier string aabbbbaacc.

So, the Kleene Star of a language is the set of all strings consisting of zero or more elements of the language concatenated together.
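The recursive definition also suggests how to enumerate L* for a finite language, at least up to a length bound; a sketch (the function name and the cutoff parameter are my own):

```python
def star_up_to(lang, max_len):
    """All strings in lang* of length <= max_len, generated directly
    from the recursive definition: start with the empty string (rule 1)
    and repeatedly append elements of lang (rule 2)."""
    result = {""}
    frontier = {""}
    while frontier:
        frontier = {x + y for x in frontier for y in lang
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result
```

For lang = {aa, bb, cc} and max_len = 4 this produces λ, the three one-block strings, and the nine two-block strings.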

An algorithm for Kleene star
Hopefully you can see that an algorithm that takes a machine M and produces a machine accepting L(M)* is not too tricky ... although it might be a tad trickier than you thought, because you need to accept the empty string, even if the original machine doesn't. One way to do this is to have λ-transitions from the final states of the original machine back to its start state. This way, we can run the machine repeatedly while processing a string. Of course we won't necessarily accept the empty string with this scheme. Just making the initial state accepting doesn't work, though, because it might lead you to accept some strings you shouldn't. Instead, we add a new state that serves as both the start state and the sole accepting state: it has a λ-transition to the original start state, and the λ-transitions from the original accepting states lead back to this new state. See if you can produce this algorithm on your own before looking at my solution.
Algorithm Kleene Star
Input: NDFA M = (Q, Σ, Δ, s, W)
Output: NDFA M' such that L(M') = L(M)* \[ M' = (Q \cup \{\$\}, \Sigma, \Delta \cup \{(\$,\lambda,s)\} \cup \{(q,\lambda,\$) | q \in W\}, \$, \{\$\}), \] where $ ∉ Q, i.e. $ is a new state
Now, I claim that L(M') = L(M)*. I really ought to prove it, of course. However, for the moment why don't you just try it on an example or two and convince yourself that it really works.
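As with the other constructions, this is straightforward to sketch in Python (same 5-tuple representation as before, "" for λ; the choice of "$" as the literal name of the new state follows the algorithm above, and the function name is my own):

```python
def kleene_star(m):
    """Algorithm Kleene Star: add a new state '$' that is both the start
    state and the only accepting state, with a lambda-transition into the
    old start state and lambda-transitions from the old accepting states
    back to '$'.  Assumes '$' is not already a state of m."""
    q, sigma, delta, s, w = m
    new = "$"
    assert new not in q
    delta2 = delta | {(new, "", s)} | {(p, "", new) for p in w}
    return (q | {new}, sigma, delta2, new, {new})
```

Note that the new machine accepts λ immediately, since its start state is accepting, even when the original machine did not.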

What do we know about Nondeterministic Finite Automata?
So what do we know about languages accepted by nondeterministic finite automata? We know that:
  1. the union of two languages accepted by nondeterministic finite automata is accepted by a nondeterministic finite automaton
  2. the concatenation of two languages accepted by nondeterministic finite automata is accepted by a nondeterministic finite automaton
  3. the Kleene Star of a language that's accepted by an automaton is accepted by an automaton
How does this compare to deterministic finite automata? We don't know about concatenation or Kleene star for deterministic finite automata. And while for deterministic machines we know about intersection and complement, we have no idea about those for nondeterministic machines. [Question: Can you prove that the complement algorithm we proposed for deterministic machines doesn't work for non-deterministic machines? I.e. if you take NDFA M and create a new machine M' by making all accepting states non-accepting and all non-accepting states accepting, can you see why it's not necessarily true that L(M') = L(M)?] So, clearly there are some gaps in our understanding of both kinds of machines. The one thing I know for sure, however, is that if a language can be accepted by a deterministic finite automaton it can also be accepted by a nondeterministic finite automaton. Can you see why?

Christopher W Brown
Last modified: Fri Sep 18 09:42:55 EDT 2009