Let's take "simplest" to mean fewest states. What would it take to prove that this is indeed the simplest machine solving the problem? We need to show that any machine with fewer states fails! This is not so dissimilar to the task that faced us with our proofs that various languages are not regular. We need an algorithm that takes a Turing machine with fewer states, and produces a counter example, i.e. an input for which that machine fails to perform as required.
Note that our formal definition of a TM requires that every machine have at least two states: the halt state, and a distinct start state. Since Mr. Afable's machine has three states, the only machines with fewer states are those with two states: the halt state "h" and another state, which we'll call q. So here's my algorithm:
Input: TM $M = (Q,\Sigma,\Gamma,\delta,s)$, where $|Q| = 1$
Output: string $w \in \Sigma$ such that $M$ does not meet the above specification on input $w$
if $M$ fails to meet the above specification on input $aa$
then $w = aa$
else $w = aaa$
Of course now I need to prove that this algorithm works, i.e. that it always produces our counter example. Clearly, if the machine fails on input $aa$, we output $w = aa$ and we've found our counter example. What about when the machine operates successfully on $aa$? This means: \[ (q,\lambda,a,a) \Rightarrow_M^* (h,\lambda,b,b) \] Now, notice that at some point we must move the tape head from the first cell to the second cell (in order to change it to b), and if we look at the configuration the very first time the r/w head is pointing to the second cell, it must look like $(q,x,a,\lambda)$, where $x$ is element of $\Gamma$ or □. Since this is the first time the r/w head is pointing to that cell, we know that it contains a, and we know there is nothing after it on the tape. Thus, we get: \[ (q,\lambda,a,a) \Rightarrow_M^* (q,x,a,\lambda) \Rightarrow_M^* (h,\lambda,b,b) \] Crucially, we get that $(q,x,a,\lambda) \Rightarrow_M^* (h,\lambda,b,b)$ and that during this process the machine never moves left from the square with x, since that would cause a crash. So if we have input $aaa$, then we must have \[ (q,\lambda,a,aa) \Rightarrow_M^* (q,x,a,a) \] because the $M$ can't operate any differently on input $aaa$ than on $aa$ until it moves the tape head to the third cell. However, since $(q,\lambda,a,a) \Rightarrow_M^* (h,\lambda,b,b)$ we must have $(q,x,a,a) \Rightarrow_M^* (h,x,b,b)$. Put this all together, and we get \[ (q,\lambda,a,aa) \Rightarrow_M^* (q,x,a,a) \Rightarrow_M^* (h,x,b,b) \] where $x \neq \lambda$, which means that $M$ halts with the r/w head on the wrong square (not the first), and thus hasn't operated as required.
Here's what you should take away from this: We really require the notion of configuration in order to reason about the computation of Turing machines.
We could construct the new machine (we'll call it an NDFA' )
Would that model be more powerful? Well, first we would have to redefine all of our usual NDFA notions for the new NDFA' model:
Clearly any language that is accepted by an NDFA is also accepted by an NDFA', since any NDFA already fits the definition of an NDFA'. But what about the other way around? How could we prove that any language accepted by an NDFA' prime is also accepted by a regular old NDFA? This would show that the new NDFA' is no more powerful than the original NDFA.
We need an algorithm that maps a NDFA$'$ to an equivalent NDFA. In some sense, we will "simulate" the NDFA$'$ with a regular old NDFA!
Algorithm I:
Input: NDFA$'$ $M' = (Q',\Sigma,\Delta',s,W)$
Output: NDFA $M$ such that $L(M') = L(M)$
- let $(p_{0},u_{0},q_{0}),\ldots (p_{r},u_{r},q_{r})$ be the transitions in $\Delta'$ for which the length of the string label on the transition is greater than one.
- $M = (Q,\Sigma,\Delta,s,W)$, where
- $Q = Q' \cup \{q_{i,j} \mid q_i \in Q' \text{ and } 1 \leq j \leq |u_i| - 1\}$
- $\Delta = \Delta' - \{(p_{0},u_{0},q_{0}),\ldots (p_{r},u_{r},q_{r})\}$
- for each transition $(p_i,u_i,q_i)$ from the above list, let $x_1 x_2 \cdots x_k$ be $u_i$ written out character by character \[\Delta = \Delta \cup \{(p_i,x_1,q_{i,1}), (q_{i,1},x_2,q_{i,2}), \ldots (q_{i,k-1},x_{k-1},q_{i,k-1}), (q_{i,k-1},x_k,q_i)\}\]
Of course now we have to prove that this algorithm works! The idea behind this proof (and indeed behind any proof that one kind of machine "simulates" another) is that accepting computations for $M'$ and accepting computations for $M$ basically "match up", so that one gives the other. We'll look at an example of how this works rather than going through the proof. Suppose the input machine NDFA$'$ $M'$ is
Then the transitions from step one are $(q_0,aa,q_0), (q_0,bb,q_0)$. Running the algorithm we get NDFA $M$ is
Now consider the two machines operating on input string $aabbc$. Below you see how the accepting computations in the two machines map back and forth to one another.

The Takeaway: All of these computational models are human inventions — we made them up, and we could just as easily have done things differently. So we should worry about whether those choices really matter? If we are looking at choices that don't change the computational power ... well then it doesn't matter. But how to show that one model is just as powerful as another model that made different choices? When we want to show that computational model $A$ is just as powerful as computational model $B$, we will often show that $A$ can simulate $B$. This means that within a concrete computation of the model $A$ simulator machine, we find the computation of the model $B$ machine it is simulating. So the idea of our NDFA'/NDFA equivalence proof is much bigger than just today's lesson. We'll use it a lot. And of course, what's the tool we use to reason about computations? Configurations!
We'd like to prove that DFA*s are no more powerful than DFAs, i.e. a language is accepted by a DFA* if and only if it is accepted by a DFA. Now it's clear that any language that can be accepted by a DFA can be accepted by a DFA* (just don't use the memory). So how could we prove that any language that is accepted by a DFA* can be accepted by a DFA?
We need an algorithm that maps a DFA* to an equivalent DFA. In some sense, we will "simulate" the DFA* with a regular old DFA.So how could we do this?