Reading
Sections 5.3 and 5.4 of Theory of Computing, A Gentle Introduction.
Homework
Study for your Exam!

A Universal Turing Machine
As we hinted at last class, our goal is to create a "Universal" Turing machine: a machine whose input string would be an encoded Turing machine e(M) concatenated with an encoded input string e(w), and which would simulate the machine M computing on input string w. This would be just like a computer running a program (that's e(M)'s role) on data (that's e(w)'s role).

Before we ponder actually making this happen, we need to deal with one detail. What if this Universal Turing machine is given an input string v ∈ {0,1}* that is not a valid encoding of a Turing machine concatenated with a valid encoding of a string? Also, remember that for a special-purpose machine M = (Q,Σ,Γ,δ,S) we assumed we'd only ever be given input over the alphabet Σ. What if v is a valid encoding of a string, but the string contains characters not in the machine's input alphabet? What do we want to happen? We'll say that the proper thing for the Universal Turing machine to do in these cases is either to crash or to go into an infinite loop.

With that detail out of the way ... We're going to construct a 3-tape Turing machine that solves our problem. Ultimately, of course, we need a regular old 1-tape machine to act as our Universal Turing machine. But that's no problem, because we can automatically convert a multiple-tape machine to a single-tape machine. We went over how such a machine would operate in class; I'm not going to repeat it here unless I find a lot of time. We sort of did a demo, and that's hard to reproduce in text. What we hopefully concluded is that while it'd be a big pain to construct a Universal TM, it can indeed be done.
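The heart of the Universal TM, the simulation loop itself, is simple once the encoded machine has been decoded. Here's a minimal Python sketch where the machine is already decoded into a transition dictionary; the dictionary layout and the names `simulate`, `delta`, etc. are illustrative, not the e(M) encoding from class, and the step cap is just so the sketch always returns.

```python
# Minimal sketch of simulating a decoded one-tape TM.
# A machine is a dict mapping (state, symbol) -> (new_state, new_symbol, move),
# with move in {-1, +1}.  This layout is illustrative, not the e(M) encoding.

def simulate(delta, start, halt_states, w, blank="_", max_steps=10_000):
    tape = dict(enumerate(w))          # sparse tape: position -> symbol
    state, head = start, 0
    for _ in range(max_steps):         # cap steps so this sketch always returns
        if state in halt_states:
            return state, tape
        sym = tape.get(head, blank)
        state, tape[head], move = delta[(state, sym)]
        head += move
    raise RuntimeError("step limit reached (machine may loop forever)")

# Example: a tiny machine that flips 0s and 1s, then halts at the first blank.
delta = {("s", "0"): ("s", "1", +1),
         ("s", "1"): ("s", "0", +1),
         ("s", "_"): ("h", "_", +1)}
state, tape = simulate(delta, "s", {"h"}, "0110")
print("".join(tape[i] for i in sorted(tape) if tape[i] != "_"))  # -> 1001
```

The real work in a Universal TM is exactly the part this sketch skips: parsing e(M) off the tape and representing the current state and head position using only tape symbols, which is what the 3-tape construction organizes.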

A Language that cannot be semi-decided
Since we can encode machines, we can consider some fairly interesting languages. For example, what about the language:
LM = { e(M) | M is a deterministic Turing machine }
i.e. the language of all strings of 0's and 1's that are valid encodings of Turing machines. This language can be decided by Turing machines (and therefore semi-decided as well), it turns out. Basically, we can figure out whether the string conforms to the encoding rules using finite automata (remember the regular expression from last class?). This automaton can be simulated by a Turing machine. Then we need to go through and collect a list of all the states and symbols encoded in the string, and make sure that each state/symbol pair has exactly one transition. While tedious, this can be done with a Turing machine.
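To make this concrete, here's a sketch in Python. It assumes a common textbook encoding, which may not be exactly the one from last class: a transition (q_i, a_j) → (q_k, a_l, d_m) is written 0^i 1 0^j 1 0^k 1 0^l 1 0^m, and transitions are separated by 11. The regex handles the "shape" check, and a dictionary scan handles the at-most-one-transition-per-pair check (the sketch omits the other half of "exactly one": verifying that every state/symbol pair that appears has a rule).

```python
import re

# ASSUMED encoding (not necessarily the one from class): each transition
# (q_i, a_j) -> (q_k, a_l, d_m) is written 0^i 1 0^j 1 0^k 1 0^l 1 0^m,
# and transitions are separated by 11.
TRANS = r"0+10+10+10+10+"
VALID = re.compile(rf"{TRANS}(?:11{TRANS})*")

def is_valid_encoding(s):
    if not VALID.fullmatch(s):         # the finite-automaton "shape" check
        return False
    seen = set()                       # determinism: at most one rule per pair
    for t in s.split("11"):            # safe: valid strings have 11 only between transitions
        state, symbol = t.split("1")[:2]
        if (state, symbol) in seen:
            return False
        seen.add((state, symbol))
    return True

print(is_valid_encoding("0101001010"))                      # -> True
print(is_valid_encoding("0101001010" + "11" + "0101001010"))  # -> False (duplicate pair)
print(is_valid_encoding("0110"))                            # -> False (wrong shape)
```

Everything here is a finite scan over the input, which is why a Turing machine can carry it out, tediously but mechanically.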

Now let's consider a slight variation of LM. Consider the language

L = { e(M) | M is a deterministic Turing machine and M does not halt on input e(M) }
i.e. the language of all strings of 0's and 1's that are valid encodings of Turing machines such that the encoded Turing machine does not halt when given its own encoded string as input. This is a weird language, it's true. But the idea of giving a program to itself as input really isn't that odd. For example, the Unix program wc counts the number of lines, words, and bytes in a file. Since a Unix program is simply a file (a file with execute permissions), we might wonder how many bytes are in the executable file wc. So naturally we do that as follows:
valiant[1] [/usr/bin/]> wc wc
      24     272    7272 wc
valiant[2] [/usr/bin/]>
where wc tells us there are 7272 bytes in the file wc. Now, the question for us is this: Can L be semi-decided? Turns out that it cannot!

Theorem: No Turing machine semi-decides L.
Proof: Suppose there were a Turing machine semi-deciding L. Let T be such a machine. For any machine M such that e(M) ∈ L, we now have two conclusions: T halts on input e(M) (since T semi-decides L) and M does not halt on e(M) (by the definition of L). For any machine M such that e(M) ∉ L, we also have two conclusions: T does not halt on input e(M) (since T semi-decides L) and M does halt on e(M) (by the definition of L). To summarize:

For any machine M:
e(M) ∈ L ⇒ T halts on input e(M) and M does not halt on input e(M)
and
e(M) ∉ L ⇒ T does not halt on input e(M) and M halts on input e(M)
What about e(T)? Is it in L or not? I.e. let's see what happens when we consider the case M = T:

For any machine M:
e(M) ∈ L ⇒ T halts on input e(M) and M does not halt on input e(M)
and
e(M) ∉ L ⇒ T does not halt on input e(M) and M halts on input e(M)
Then for machine M = T we get:
e(T) ∈ L ⇒ T halts on input e(T) and T does not halt on input e(T)
and
e(T) ∉ L ⇒ T does not halt on input e(T) and T halts on input e(T)

But in both cases (e(T) ∈ L and e(T) ∉ L) we contradict ourselves by saying that T both halts and does not halt on input e(T). Since we arrive at a contradiction in every case, our assumption that there was a Turing machine semi-deciding L must have been false!
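The two bad cases can even be checked mechanically. This little Python loop (purely illustrative) encodes the two constraints from the proof, the definition of L and the definition of semi-deciding, and verifies that neither possible behavior of T on e(T) satisfies both:

```python
# The two constraints from the proof, checked for each possible behavior of T:
#   (1) e(T) is in L     iff  T does not halt on e(T)   (definition of L)
#   (2) T halts on e(T)  iff  e(T) is in L              (T semi-decides L)
consistent_cases = []
for T_halts_on_eT in (False, True):
    in_L = not T_halts_on_eT            # constraint (1)
    if T_halts_on_eT == in_L:           # constraint (2)
        consistent_cases.append(T_halts_on_eT)
print(consistent_cases)                 # -> []  (no consistent case: contradiction)
```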

Now we've proved that there is a problem that cannot be solved by a Turing machine and thus, by the Church-Turing hypothesis, cannot be solved by any computational device!

A Language that cannot be decided: The halting problem
The language L from above is not a particularly natural language, and the fact that it can't be semi-decided lacks a bit of punch because it seems so obscure. In fact here's what we proved in layman's terms: There is no program that can read in a program and halt (as opposed to crashing or running forever) if and only if that program fails to halt when given itself as input. Most people at a cocktail party would have already wandered off or be completely confused by this. So we'd like something more "natural".

Consider the language

LH = {e(M)e(w) | M halts on input w}
We showed in class that this is undecidable, since a TM deciding this language could be used to construct a TM semi-deciding L, and we know no such machine exists. (Given e(M), run the supposed decider on e(M)e(e(M)): if it answers "halts", loop forever; if it answers "does not halt", halt.) Deciding LH is what's known as the halting problem, and we showed that it cannot be solved by a computer. This result has a nice characterization, though. It says that you cannot create a system that reads in a program and input for that program and determines whether or not the given program will halt on the given input. In particular, this means that a fully automated IC210 grading program is impossible!
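The reduction fits in a few lines of Python. Suppose, hypothetically, we had a total function `decides_halting(e_M, w)` that always returns True or False according to whether the machine encoded by e_M halts on input w (the function names are illustrative, and the encoding details are elided). Then a semi-decider for L falls right out:

```python
def semi_decide_L(e_M, decides_halting):
    """Halts (returns) iff e(M) is in L, i.e. iff M does not halt on e(M).
    `decides_halting` is the hypothetical total decider for LH --
    precisely the machine the halting-problem theorem rules out."""
    if decides_halting(e_M, e_M):   # does M halt on its own encoding?
        while True:                 # yes: e(M) not in L, so reject by looping
            pass
    return "accept"                 # no: e(M) in L, so halt

# A stub "decider" just to exercise the accepting branch of the sketch:
print(semi_decide_L("0101001010", lambda m, w: False))  # -> accept
```

Since the theorem from the previous section says no machine semi-decides L, no total `decides_halting` can exist, which is exactly the undecidability of LH.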


Christopher W Brown
Last modified: Sun Dec 6 22:49:55 EST 2009