definition of regular expression definition of language defined by a regular expression Definition: A regular expression over the alphabet Σ is an expression derived from the following rules:
- λ, ∅, and a, where a ∈ Σ, are all regular expressions
- if E is a regular expression then (E)* is a regular expression
- if E1 and E2 are regular expressions then (E1)(E2) and (E1)|(E2) are regular expressions
- nothing else is a regular expression!
Definition: If R is a regular expression over alphabet Σ, then L(R), the language defined by regular expression R, is
- {λ} if R = λ, { } if R = ∅, and {a} if R = a, where a ∈ Σ
- L(E)* if R = (E)* for some regular expression E
- L(E1)L(E2) (respectively L(E1) ∪ L(E2)) if R = (E1)(E2) (respectively R = (E1)|(E2)) whereE1 and E2 are regular expressions
| a ∈ Σ Machine | λ Machine | ∅ Machine |
|
Input: a ∈ Σ Output: M = ({q0,q1},Σ,{(q0,a,q1)},q0,{q1}) |
Input: nothing Output: M = ({q0},Σ,{ },q0,{q0}) |
Input: nothing Output: M = ({q0},Σ,{ },q0,{ }) |
![]() |
![]() |
![]() |
Now we really have everything we need for a recursive algorithm to convert a regular expression λ into a finite automaton.
Algorithm: RE2FASo now we have an algorithm for converting regular expressions into finite automata. So what? Why do we care? Well we care because this is precisely what a utility like grep or Perl's regular expression facilities has to do in order to do pattern matching. They start with a user specified regular expression, convert to an NDFA, convert to a DFA, and convert that into a program. This program is what gets run on the input string.
Input: regular expression E
Output: NDFA M such that L(M) = L(E).
if E is a, λ or ∅, return the a ∈ Σ Machine, λ Machine or ∅ Machine from above
else if E = E1 | E2 return Union(RE2FA(E1),RE2FA(E2))
else if E = E1E2 return Concatenation(RE2FA(E1),RE2FA(E2))
else if E = E1* return KleeneStar(RE2FA(E1))
The algorithm for this conversion is a little ugly. I'm not going to cover it in any great depth. The point is that it is possible ... it's not the kind of thing there's a great need to do in practice.
Algorithm: TableUpdate
Input: NDFA $M = (Q,\Sigma,\Delta,s,W)$ containing no λ-transitions, $(q_a,x,q_b) \in Q\times\Sigma\times Q - \Delta$, and a table $E$ such that $E_{ij}$ is a regular expression for $\{w \in \Sigma\ |\ (q_i,w) \Rightarrow_M^* (q_j,\lambda)\}$. [Note: for convenience call the states $\{q_0,q_1,\ldots,q_{n-1}\}$]
Output: A table $E^{\text{new}}$ such that $E^{\text{new}}_{ij}$ is a regular expression for $\{w \in \Sigma\ |\ (q_i,w) \Rightarrow_{M^{\text{new}}}^* (q_j,\lambda)\}$, where $M^{\text{new}} = (Q,\Sigma,\Delta\cup\{(q_a,x,q_b)\},s,W)$
- $E^{\text{new}}_{ab} := (E_{aa})(x(E_{ba}))^*(E_{ab}|x)(E_{bb})$
- for all $(r,s) \neq (a,b)$ do
$\ \ \ \ E^{\text{new}}_{rs} := E_{rs} | (E_{ra})(E^{\text{new}}_{ab})(E_{bs})$
We use this algorithm to build a table T such that T[p,q] is a regular expression defining the set of all strings that take you from state p to state q in the machine, i.e. all strings w such that (p,w)⇒*(q,λ). We build this table incrementally, by starting with a machine that is the same as the input, except all the transitions are missing. The table for this initial machine is clearly λs along the diagonal, i.e. T[p,p] = λ for each state p, and ∅ everywhere else, since there are no transitions to take you from one state to another. We add transitions one-by-one, updating the table along the way by the algorithm above. When all the transitions have been added, we create the final regular expression by taking the "or" of all table entries T[p,q] where p is the start state and q is an accepting state.
This algorithm works, but it creates horribly complicated regular expressions. I've implemented it, adding a little bit of simplification on the regular expressions that appear in the table. Even with that, the results are ugly. But, you do really get a regular expression at the end of the day that is equivalent to the input machine. The figure below shows the algorithm operating on input machine:
![]() |
|
|||||||||
![]() |
|
|||||||||
![]() |
|
|||||||||
![]() |
|
|||||||||
![]() |
|
To get our final answer, we note that since state 1 is the only accepting state, T[0,1] is our answer, i.e.
(a|b)(a(a|b))*a(a|b)|a|b|(((a|b)(a(a|b))*a(a|b)|a|b)(((a(a|b))*a(a|b)|λ)(b((a|b)(a(a|b))*a(a|b)|a|b))*(((a(a|b))*a)|b)((a|b)(a(a|b))*a|λ))((a|b)(a(a|b))*a(a|b)|a|b))When you consider that (a|b)*(a|b) is also an equivalent regular expression, that looks pretty ugly. It is, however, equivalent. I actually checked!
Now here is finally a case where you're looking at an algorithm and probably thinking "why would I believe this works?" It's not clear that the table has the desired property, i.e. that T[p,q] define { w ∈ Σ*| (p,w) ⇒ (q,λ)}. Really we need to prove that in the algorithm described above the following is true:
if E is a table such that Eij is a regular expression for {w ∈ Σ* | (qi,w) ⇒*M (qj,λ)} then Enew is a table such that Enewij is a regular expression for {w ∈ Σ* | (qi,w)⇒*Mnew(qj,λ)} where Mnew = (Q,Σ,Δ∪{(pa,x,pb)},s,W)}.This is not too bad to prove, but it requires the concept of inductive proof, which not all of you have seen at this point.
To wrap this all up, we need to show how to use TableUpdate to produce a regular expression that is equivalent to a given NDFA. The idea is simple: strip away all the transitions from the machine, then add them back one-by-one, calling TableUpdate at each step in order to keep the table in line with the machine. When we're done, we have a table such that $E_{ij}$ is a reg. ex. for the language of all strings that get me from state $q_i$ to $q_j$. Of course the accepted strings are those that get me from the start state $q_s$ to any accepting state. I need to "or" together these REs for each of the accepting states.
Algorithm: NDFA-to-RE
Input: NDFA $M_{in} = (\{q_1,\ldots,q_n\},\Sigma,\{t_1,\ldots,t_m\},q_s,\{q_{a1},\ldots,q_{ak}\})$ containing no λ-transitions
Output: regular expression $exp$ such that $L(M) = L(exp)$
- $E := $ the $n\times n$ table with $\lambda$'s along the main diagonal, and $\emptyset$'s everywhere else
- $M := (\{q_1,\ldots,q_n\},\Sigma,\{\ \},q_s,\{q_{a1},\ldots,q_{ak}\})$
- for $i$ from $1$ to $m$ do
- $E := \text{TableUpdate}(M,t_i,E)$
- $M := (\{q_1,\ldots,q_n\},\Sigma,\{t_1,\ldots,t_i\},q_s,\{q_{a1},\ldots,q_{ak}\})$
- $exp := E_{q_s q_{a1}}\ |\ E_{q_s q_{a2}}\ |\ \cdots \ |\ E_{q_s q_{ak}}$