SI342: What if we did FAs differently?

Reading: Just these notes.
Homework: Have a little pre-Thanksgiving break!

An example of why the concept of "configuration" makes TMs easier to talk about (optional)

We had an in-class problem that asked you to create a TM that "swaps all the a's and b's in the input and leaves the read/write head pointing to the first character in the string". I had a mid (Afable, 2019) once send me the (very clever) solution for this problem shown to the right, and ask the question "Is this machine the simplest machine that solves the problem?"

Let's take "simplest" to mean fewest states. What would it take to prove that this is indeed the simplest machine solving the problem? We need to show that any machine with fewer states fails! This is not so dissimilar to the task that faced us with our proofs that various languages are not regular. We need an algorithm that takes a Turing machine with fewer states, and produces a counter example, i.e. an input for which that machine fails to perform as required.

Note that our formal definition of a TM requires that every machine have at least two states: the halt state, and a distinct start state. Since Mr. Afable's machine has three states, the only machines with fewer states are those with two states: the halt state "h" and another state, which we'll call q. So here's my algorithm:

Input: TM $M = (Q,\Sigma,\Gamma,\delta,s)$, where $|Q| = 1$
Output: string $w \in \Sigma$ such that $M$ does not meet the above specification on input $w$
if $M$ fails to meet the above specification on input $aa$
then $w = aa$
else $w = aaa$

Of course now I need to prove that this algorithm works, i.e. that it always produces our counter example. Clearly, if the machine fails on input $aa$, we output $w = aa$ and we've found our counter example. What about when the machine operates successfully on $aa$? This means: \[ (q,\lambda,a,a) \Rightarrow_M^* (h,\lambda,b,b) \] Now, notice that at some point we must move the tape head from the first cell to the second cell (in order to change it to b), and if we look at the configuration the very first time the r/w head is pointing to the second cell, it must look like $(q,x,a,\lambda)$, where $x$ is element of $\Gamma$ or □. Since this is the first time the r/w head is pointing to that cell, we know that it contains a, and we know there is nothing after it on the tape. Thus, we get: \[ (q,\lambda,a,a) \Rightarrow_M^* (q,x,a,\lambda) \Rightarrow_M^* (h,\lambda,b,b) \] Crucially, we get that $(q,x,a,\lambda) \Rightarrow_M^* (h,\lambda,b,b)$ and that during this process the machine never moves left from the square with x, since that would cause a crash. So if we have input $aaa$, then we must have \[ (q,\lambda,a,aa) \Rightarrow_M^* (q,x,a,a) \] because the $M$ can't operate any differently on input $aaa$ than on $aa$ until it moves the tape head to the third cell. However, since $(q,\lambda,a,a) \Rightarrow_M^* (h,\lambda,b,b)$ we must have $(q,x,a,a) \Rightarrow_M^* (h,x,b,b)$. Put this all together, and we get \[ (q,\lambda,a,aa) \Rightarrow_M^* (q,x,a,a) \Rightarrow_M^* (h,x,b,b) \] where $x \neq \lambda$, which means that $M$ halts with the r/w head on the wrong square (not the first), and thus hasn't operated as required.

Here's what you should take away from this: We really require the notion of configuration in order to reason about the computation of Turing machines.

Looking back at FAs: what if we'd done it differently?

Suppose we defined NDFAs slightly differently. In particular, what if we allowed arbitrary strings as labels on transitions, instead of allowing only single characters or $\lambda$? This way, instead of the NDFA

We could construct the new machine (we'll call it an NDFA' )

Would that model be more powerful? Well, first we would have to redefine all of our usual NDFA notions for the new NDFA' model:

The defition of an NDFA' would be $(Q,\Sigma,\Delta,s,W)$, like with NDFAs, but we'd have $\Delta \subset Q\times\Sigma^*\times Q$.
The definition of configuration would stay the same, e.g. $(q,w)$ where $q$ is the state the machine is in, and $w$ is what remains of the input left to read.
Yields in one step (aka $\Rightarrow_M$) would change slightly to $(q,w) \Rightarrow_M (q',w')$ if $w = u w'$, where $u \in \Sigma^*$, and $(q,u,q') \in \Delta$.
Yields in many steps (aka $\Rightarrow_M^*$) would stay the same.
The definition of accept would stay the same.

Clearly any language that is accepted by an NDFA is also accepted by an NDFA', since any NDFA already fits the definition of an NDFA'. But what about the other way around? How could we prove that any language accepted by an NDFA' prime is also accepted by a regular old NDFA? This would show that the new NDFA' is no more powerful than the original NDFA.

We need an algorithm that maps a NDFA$'$ to an equivalent NDFA. In some sense, we will "simulate" the NDFA$'$ with a regular old NDFA!

Algorithm I:
Input: NDFA$'$ $M' = (Q',\Sigma,\Delta',s,W)$
Output: NDFA $M$ such that $L(M') = L(M)$

let $(p_{0},u_{0},q_{0}),\ldots (p_{r},u_{r},q_{r})$ be the transitions in $\Delta'$ for which the length of the string label on the transition is greater than one.

$M = (Q,\Sigma,\Delta,s,W)$, where

$Q = Q' \cup \{q_{i,j} \mid q_i \in Q' \text{ and } 1 \leq j \leq |u_i| - 1\}$

$\Delta = \Delta' - \{(p_{0},u_{0},q_{0}),\ldots (p_{r},u_{r},q_{r})\}$

for each transition $(p_i,u_i,q_i)$ from the above list, let $x_1 x_2 \cdots x_k$ be $u_i$ written out character by character \[\Delta = \Delta \cup \{(p_i,x_1,q_{i,1}), (q_{i,1},x_2,q_{i,2}), \ldots (q_{i,k-1},x_{k-1},q_{i,k-1}), (q_{i,k-1},x_k,q_i)\}\]

Of course now we have to prove that this algorithm works! The idea behind this proof (and indeed behind any proof that one kind of machine "simulates" another) is that accepting computations for $M'$ and accepting computations for $M$ basically "match up", so that one gives the other. We'll look at an example of how this works rather than going through the proof. Suppose the input machine NDFA$'$ $M'$ is

Then the transitions from step one are $(q_0,aa,q_0), (q_0,bb,q_0)$. Running the algorithm we get NDFA $M$ is

Now consider the two machines operating on input string $aabbc$. Below you see how the accepting computations in the two machines map back and forth to one another.

The point is that we can always do this: any sequence of configurations representing an accepting computation in $M'$ can be mapped to a sequence of configurations representing an accepting computation $M$ and vice versa. Thus the two machines accept exactly the same strings.

The Takeaway: All of these computational models are human inventions — we made them up, and we could just as easily have done things differently. So we should worry about whether those choices really matter? If we are looking at choices that don't change the computational power ... well then it doesn't matter. But how to show that one model is just as powerful as another model that made different choices? When we want to show that computational model $A$ is just as powerful as computational model $B$, we will often show that $A$ can simulate $B$. This means that within a concrete computation of the model $A$ simulator machine, we find the computation of the model $B$ machine it is simulating. So the idea of our NDFA'/NDFA equivalence proof is much bigger than just today's lesson. We'll use it a lot. And of course, what's the tool we use to reason about computations? Configurations!

Another example looking back at FAs: what if we'd done it differently?

Suppose we defined DFAs slightly differently. In particular, what if we added one cell of memory that transitions could read from and write to? Would that model (we'll call it DFA* ) be more powerful?

The defition of an DFA* would be $(Q,\Sigma,\Gamma,\delta,s,W)$, where \[ \delta: Q\times(\Gamma \cup \{\Box\})\times\Sigma \rightarrow Q\times(\Gamma \cup \{\Box\}) \]
The definition of configuration would be $(q,x,w)$ where $q \in Q$ is the state the machine is in, $x \in (\Gamma \cup \{\Box\})$ is what is in the memory cell, and $w$ is what remains of the input left to read.
Yields in one step (aka $\Rightarrow_M$) would be slightly to $(q,x,w) \Rightarrow_M (q',x',w')$ if $w = z w'$, where $z \in \Sigma$, and $\delta(q,x,z) = (q',x')$.
Yields in many steps (aka $\Rightarrow_M^*$) would stay the same, i.e. a sequence of $\Rightarrow_M$'s
The definition of accept would be: $M$ accepts $w$ if \[ (s,\Box,w) \Rightarrow_M^* (q,x,\lambda) \] where $q\in W$.

We'd like to prove that DFA*s are no more powerful than DFAs, i.e. a language is accepted by a DFA* if and only if it is accepted by a DFA. Now it's clear that any language that can be accepted by a DFA can be accepted by a DFA* (just don't use the memory). So how could we prove that any language that is accepted by a DFA* can be accepted by a DFA?

We need an algorithm that maps a DFA* to an equivalent DFA. In some sense, we will "simulate" the DFA* with a regular old DFA.

So how could we do this?