Dedicated to the memory of Ivan Niven.

**Abstract:**
Suppose two people are playing a guessing game with numbers.
One person thinks of a number between 1 and 1,000,000, and the other person
must determine the unknown number by asking only yes-or-no questions. Clearly,
this game will not prove too difficult for the Questioner. He only needs
to halve the set containing the unknown number with each question, e.g.,
Is the number between 1 and 500,000? Now let us suppose the second person
may lie to one of the questions asked him. The problem is to determine
a strategy which yields the least number of questions needed to determine
the unknown number.

Error-correcting codes - specifically Hamming Codes -
will be used to construct such a strategy. The strategy given does not
depend on feedback from the Responder and lends itself to quick decoding.
While the solution using codes for the most part agrees with answers previously
calculated, it sometimes requires one extra question. This discrepancy,
however, is outweighed by the simplicity and speed of the solution.

This paper constitutes part of my honors thesis written in spring 1999 at the USNA, advisor Assoc Prof. David Joyner.

Introduction

In his 1976 autobiography *Adventures of a Mathematician*, Stanislaw
Ulam [U] poses the following problem:

Someone thinks of a number between one and one million
(which is just less than 2^{20}). Another person is allowed to
ask up to twenty questions, to which the first person is supposed to answer
only yes or no. Obviously, the number can be guessed by asking first: Is
the number in the first half-million? and then again reduce the reservoir
of numbers in the next question by one-half, and so on. Finally, the number
is obtained in less than log_{2}(1000000). Now suppose one were
allowed to lie once or twice, then how many questions would one need to
get the right answer?

A problem of a similar sort can also be found in Elwyn Berlekamp's 1964 PhD thesis for MIT [B]. Since then, several papers have been published containing solutions to this problem and some of its variations. These solutions, however, were laden with complicated accounting and required feedback in between questions.

This paper approaches Ulam's problem from a different field of mathematics - Coding Theory. In general, we are concerned with transmitting information (which may or may not be erroneous) between two persons. If it is erroneous, we need an efficient method of detection and correction. The late Ivan Niven authored a paper in 1988 [N] that gave a solution to Ulam's Problem using error-correcting codes. While his idea was sound, the paper contained a few errors, and thus the problem remains unsolved. (In [N], the decoding algorithm at the bottom of page 277 and the corresponding argument on page 279 (below Equation 15) are both incorrect.)

Now we describe briefly the contents of this thesis. The solution contained
in this paper utilizes Hamming codes along with the necessary corrections
to Niven's idea to solve Ulam's problem with one lie. This work is not
found in the literature. The strategy is simple, requires no feedback from
the other player, and is easily generalized to the case where the number
may be chosen from the set 1 to **M**. Furthermore, some simple examples,
a discussion of problem variations, and a comparison to another ``feedback''
solution are included. In section 4, we define a new class of codes called
circle codes which have a very simple decoding algorithm, although they
are not as efficient as Hamming Codes. In section 8, we give a new result
concerning the case of the Pathological Liar (where the number of lies
is variable). The remaining sections contain background material. The paper
closes with two appendices, one a table of known A(n,d) values, the other
a table of Maple calculations comparing our formula for g( **M**,1)
with Pelc's formula for

f( **M**,1).

Definitions and Notation

For Ulam's game, let Player 1 choose the number from the set of integers
1 to **M**and Player 2 ask the questions to deduce the number. Player
1 and Player 2 are sometimes referred to as the Responder and the Questioner,
respectively. If Player 2 structures his questions based on previous answers,
we say he is using feedback. In this case, let f(**M**,e) denote the
minimum number of ``yes/no'' questions needed to deduce the unknown number
if Player 1 can lie at most e times. If feedback is not allowed, then the
minimum number of questions is g(**M**, e). This case is equivalent
to saying that all questions are determined before the game begins. Obviously,
f(**M**,e)<= g(**M**,e).

For a given integer n>0 , a **binary code** is a set **C**
of n-tuples of 0's and 1's. The set **C** can also be defined
as a subset of F_{2}^{n}, the n-dimensional vector space
over the integers mod 2. A **codeword** is an element of **C**.

The **weight** of a vector or n-tuple v , denoted by **wt**(v),
is the number of non-zero components in v. The **minimum distance**
of a code **C**, denoted by d, is the minimum weight of

v- v', where v, v' in **C**.

The **length** of a code is the number of binary bits in a
codeword. The **dimension** of a linear code is the dimension
as a vector space over F_{2} , which is equal to the number of
information carrying bits. A code **C** of length n and minimum
distance d is **perfect **if all the vectors in F_{2}^{n}
are contained in spheres of radius t = [ (d-1)/2] about the codewords
of **C**. A sphere of radius t about a codeword would contain
all vectors of equal length that differ from the codeword in at most
t places. In other

words, the spheres of radius t completely cover the space. Perfect
codes can detect and correct patterns of t or fewer errors.

The Hamming Code is a perfect, single error-correcting binary code with minimum weight 3. The Hamming Code has length n and dimension k, where

Let A( n, d) denote the size of the largest possible binary code of
minimum distance d and length n. If we fix **M** and e, the two parameters
of Ulam's problem, then g(**M**,e) is the smallest n such that {\it
A}(n,2e+1) >= **M**. It is a known fact that if a code **C**
is e-error correcting, then | **C**| <= A(n,2e+1). Finding
A(n,d) for every n and d is perhaps the central problem in Coding
Theory today, and is mostly unsolved (see table in Appendix A).

A Simple Problem

Let us first look at a simpler version of Ulam's game. As before, we
havetwo players, and Player 1 must choose a number between 0 and 15, inclusive.
Player 2 must discern which number Player 1 picked with a minimum number
of

questions, and Player 1 is allowed to lie at most once. To solve this
problem - and thus win the game - we will use a code generated by a three-circle
Venn Diagram. The diagram should be drawn with regions labeled as in Figure
1.

From Figure 1, we can see that this code has length 7. The spaces labeled 1 through 4 will contain the information bits. This coincides with the fact that it takes a binary 4-tuple to represent all numbers from 0 to 15. Spaces 5, 6, and 7 will contain the parity check bits. The code generated by this diagram has codewords of the form

Next, Player 1 has to convert his number to binary and encode it as a three-circle codeword. A table containing all 16 codewords is included at the end of this section for reference. Player 2 then asks seven questions of the form, ``Is the i-th bit of your 7-tuple a 1?'' Player 1's answers form the new 7-tuple,

If Player 1 was completely truthful, then the codeword's parity conditions hold. This means that the 7-tuple generated by Player 2's questions will match a 7-tuple from Table 1. At this point, Player 2 just has to decode his 7-tuple to win the game. A single lie, however, will yield a 7-tuple unlike any in Table 1. If this is the case, Player 2 must error-check the received codeword using the three parity check bits. In other words, Player 2 needs to find all existing parity failures in the received codeword. A parity failure occurs when a parity check bit does not agree with its corresponding sum. Once Player 2 determines the position of the erroneous bit, he corrects it by changing its value. Decoding the corrected codeword will yield Player 1's original number.

An example here will help to illustrate what is actually happening.
Suppose the answers to Player 2's questions produced the 7-tuple 0111101.
Since this codeword is not in Table 1, we may deduce that Player 1 lied
in answering one of the questions. Next we notice that the 5th and 7th
bits fail their respective parity checks. Since c_{3} is the only
bit common to the parity sums for c_{5} and c_{7}, it must
be the erroneous bit. This means that Player 1 lied to the third question,
and his actual number is the 7-tuple 0101101 which decodes to 5.

The algorithm for error-detection and correction is not difficult. The
erroneous bit is determined by which check bit - or combination of check
bits - fails parity. If a single check bit fails parity, then that check
bit is the erroneous bit. If two check bits fail parity, then the erroneous
bit is the one common to both parity sums. If all three check bits fail
parity, then the erroneous bit is b_{1} since it is the only one
common to all three parity sums. Table 2 shows the correspondence between
erroneous bits and sets of parity failures.

It is interesting to note that the code generated by three circles is equivalent to the (7,4,3)-Hamming Code. This means it is perfect, single-error correcting, and has a minimum weight of three.

decimal | binary | codeword |

0 | 0000 | 0000000 |

1 | 0001 | 0001110 |

2 | 0010 | 0010101 |

3 | 0011 | 0011011 |

4 | 0100 | 0100011 |

5 | 0101 | 0101101 |

6 | 0110 | 0110110 |

7 | 0111 | 0111000 |

8 | 1000 | 1000111 |

9 | 1001 | 1001001 |

10 | 1010 | 1010010 |

11 | 1011 | 1011100 |

12 | 1100 | 1100100 |

13 | 1101 | 1101010 |

14 | 1110 | 1110001 |

15 | 1111 | 1111111 |

parity failure region(s) | error position |

none | none |

A, B, and C | 1 |

B,C | 2 |

A,C | 3 |

A,B | 4 |

A | 5 |

B | 6 |

C | 7 |

Larger circle codes can be generated in the above manner by using Venn-like
diagrams with more circles. Suppose we have n circles of diameter 1.5,
centered at the n -th roots of unity on the unit circle in the complex
plane. Here is how to label the intersecting regions of this Venn-like
diagram. Beginning with 1 in the center and spiraling outward counterclockwise,
label the (n-1)^{2}+ n intersecting regions as in Figure 1. The
n outer regions will serve as check bits for our circle code defined as
follows. Write the integers 0,...,2^{(n-1)2}-1 in binary, put the
i -th bit in the i-th region, fill in the remaining n outer regions with
check digits. These ((n-1)^{2}+n)-tuples are the codewords. This
defines our **circle code**. In fact, a circle code generated by n circles
will have a length of (n-1)^{2}+n and dimension of (n-1)^{2}.
This can be easily shown.

**Theorem**: A circle code generated by n circles will have a length
of (n-1)^{2}+n and dimension of (n-1)^{2}.

Proof: By the nature of our construction, each circle has one section
that is not shared with the others, such as spaces 5, 6, and 7 in Figure
1. For an n-circle diagram, this amounts to n spaces. The inner regions
of an n-circle diagram are those that are shared by 2 or more circles.
There is only one region shared by all n circles. Due to the geometry of
these diagrams, there are always n regions shared among j circles for 2<=
j<= n-1. The dimension of an n-circle code is the number of inner regions
of the diagram since the inner regions correspond to the information bits
of the codeword. Thus, the dimension of an n-circle code is n(n-2)+1=(n-1)^{2}.
The length is then the sum of the dimension and the number of parity check
regions.

Each parity check bit of an n-circle code equals the mod 2 sum of the values of the inner regions of that specific circle. This is an extension of the parity sum construction of the previous section. The generalized circle code can correct one error since the number of combinations of parity failures is always greater than or equal to the number of bits in a codeword.

Because of its properties, we could consider using the circle code to solve Ulam's problem. However, a single, as yet unmentioned property will prevent us from doing this. This property is the rate of the circle code. The rate, R, of any code is defined as the ratio of information carrying bits to total bits of a codeword. Thus,

This comparison simply reiterates the fact that the Hamming Code is the best single-error correcting code. Accordingly, it will be used to solve Ulam's Problem.

Original Problem

We will now solve Ulam's original problem as stated in the beginning of this paper assuming one lie is allowed. We must first determine how many binary digits are needed to encode all numbers from 0 to 1,000,000. This is equal to the number of binary digits needed to represent 1,000,000. If we let b denote the number of binary digits needed, then

As a result of the definition, the first six bits of every codeword in our subcode are zeros. Only the last 25 bits of our subcode will carry information needed by Player 2, the first 20 being leftover information bits while the last 5 are parity check bits. In order to determine Player 1's number, Player 2 asks questions of the form, ``Is the {\it i}-th bit of the (31,26,3)-Hamming Code representation of your number a 1?'' for i =7...31. After 25 questions, Player 2 will have a binary 31-tuple whose first 6 bits are zero.

Now Player 2 must error-check his 31-tuple. He does this using the Hamming Check Matrix for the (31,26,3)-Hamming Code. The dot product (where addition is mod 2) of Player 2's possible codeword with each row of the 5x31 check matrix yields a binary 5-tuple. Converting this 5-tuple to decimal yields a number from 0 to 31, which corresponds to the index of the erroneous bit. A zero implies an accurate codeword meaning Player 1 never lied.

Finally, Player 2 corrects the erroneous bit - if necessary - by inserting the opposite binary value. For example, if the erroneous bit is a 1, Player 2 changes it to a 0. Converting the corrected codeword to decimal will yield Player 1's number.

The use of the Hamming Code gave us a solution that does not depend
on feedback from Player 1. All 25 questions were constructed before the
game actually began, and Player 1's number was determined using a quick
and efficient decoding algorithm. Thus, g(**M**,1)\leq 25 , when **M**=10^{6}
. It will follow from a result in section 7 that, in fact, g(**M**,1)=25.

Generalized Problem

Generalizing Ulam's original problem gives us the following question.

Question: Player 1 chooses a number between 0 and **M**-1. Player
2 must determine this number by asking yes or no questions. What is the
minimum number of questions needed to correctly determine Player 1's number
if he can lie at most e times (e >= 0)?

The solution from the previous section, applied to the set {0,1,2,...,**M**},
can be used to determine g(**M**,1) for all **M**.

As before, we must first calculate how many binary digits are needed
to represent every number in the set { 0,1,2,...,**M** } . Let b denote
the number of binary digits needed. Therefore,

Player 2 then asks b+r questions of the form, ``Is the i-th bit of the
( 2^{r}-1 , 2^{r}-r-1 ,3)-Hamming Code representation of
your number a 1?'', where {\it i}= 2^{r}-r-b , 2^{r}-r-b+1
,..., 2^{r}-1 . When finished, Player 2 will have a ( 2^{r}-1
)-tuple to error-check using the corresponding Hamming Check Matrix.

If the ( 2^{r}-1 )-tuple has an erroneous bit, Player 2 corrects
it by inserting the opposite binary value. Converting the corrected codeword
into decimal will yield Player 1's number. Again, no feedback was required,
all questions were pre-determined, and an efficient decoding algorithm
exists for the code used.

Comparison to Pelc's Solution

In 1987, Andrzej Pelc [P] solved Ulam's original problem with one lie
using a rigorous accounting process. The solution was complicated and required
feedback from the Responder on every question. Pelc went on to solve the

generalized one-lie problem giving the following conditions for the
minimum number of questions (Pelc's paper dealt with the set {1, 2,...,
**M**}).

**Theorem**:

If **M** is even, f(**M**,1)=min {n:**M**(n+1)<=2^{n}}
,

If **M** is odd, f(**M**,1)=min(n:**M**(n+1)+n-1<= 2^{n}}
.

Thus, f(10^{6},1)=25.

Where Pelc deals with the set {1,2,...,**M**}, we will use the set
{0,1,...,**M**-1}. In either case, Player 1 has an equal number of possibilities
to choose from. See Appendix B for the tabular comparison of g(**M**,1)
and f(**M**,1).

In his paper, Niven [N] states that g(**M**,1)=f(**M**,1) or f(**M**,1)+1
. Though earlier parts of his solution contained errors, this statement
and its argument are correct. Therefore we will consider the proof for
this statement already done. This is also shown experimentally in the comparison
contained in Appendix B.

Case of the Pathological Liar

An n x n Hadamard Matrix H is a matrix of \pm 1's satisfying

Conclusions

Approaching Ulam's problem from the field of Coding Theory yields a solution that is easy to understand and implement. The use of Hamming Codes gives a very efficient algorithm for constructing Player 2's questions, not to mention an efficient method of decoding. Also, we do not require feedback from Player 1 in order to advance the game. This is a definite improvement over Pelc' s solution. Although the Hamming Codes sometime require one more question than Pelc's solution, it is a small sacrifice for our solution's simplicity.

Although Ulam's problem has been solved for one lie, the solution is only a small piece of a larger problem for Coding Theorists. What happens when Player 1 can lie twice? This problem has been solved directly - again by Pelc - but with the same feedback and heavy accounting as before. Coding Theory has no answer. What we do know, thanks to Ray Hill \cite{H}, is that the codes necessary for the solution are not linear. The difficulty of this problem has increased drastically with the addition of only one lie. However, when the number of lies increases suitably along with the length of the code, thanks to Hadamard Codes, one can also solve Ulam's problem. The middle ground remains untouched. Coding Theory has much work ahead of it.

Appendix A

Minimum distance d of best binary (n,k) codes by A. Brouwer \cite{Br}, compiled by Luis Goddyn.

Derived from the tables of aeb@cwi.nl (A. Brouwer) found at ftp.win.tue.nl:/pub/math/codes/table2.gz Rows: n = 1..10 (Appendix BBrouwer's tables go to n=256) Columns: n-k = 0..min(n-1,31) Legend: n+ means n <= d >= n+1 n# means n <= d <= n+2 10 9 8 7 6 5 4 3 2 1 0 n-k/ ----------------------------------------- / n 1 1 2 1 2 3 2 1 3 4 2 2 1 4 5 3 2 2 1 5 6 4 3 2 2 1 6 7 4 4 3 2 2 1 7 8 5 4 4 2 2 2 1 8 9 6 4 4 3 2 2 2 1 9 10 6 5 4 4 3 2 2 2 1 10

Maple tables comparing Pelc's formula given above and our formula

M |
f(M,1) |
g(M,1) |

11 | 7 | 7 |

13 | 7 | 7 |

15 | 7 | 7 |

17 | 8 | 9 |

19 | 8 | 9 |

21 | 8 | 9 |

23 | 8 | 9 |

25 | 8 | 9 |

27 | 8 | 9 |

29 | 9 | 9 |

31 | 9 | 9 |

33 | 9 | 9 |

35 | 9 | 10 |

37 | 9 | 10 |

39 | 9 | 10 |

41 | 9 | 10 |

43 | 9 | 10 |

45 | 9 | 10 |

47 | 9 | 10 |

49 | 9 | 10 |

M |
f(M,1) |
g(M,1) |

10 | 7 | 7 |

12 | 7 | 7 |

14 | 7 | 7 |

16 | 7 | 7 |

18 | 8 | 9 |

20 | 8 | 9 |

22 | 8 | 9 |

24 | 8 | 9 |

26 | 8 | 9 |

28 | 8 | 9 |

30 | 9 | 9 |

32 | 9 | 9 |

34 | 9 | 9 |

36 | 9 | 10 |

38 | 9 | 10 |

40 | 9 | 10 |

42 | 9 | 10 |

44 | 9 | 10 |

46 | 9 | 10 |

48 | 9 | 10 |

50 | 9 | 10 |

[B] E. Berlekamp, ``Block coding with noiseless feedback,'' PhD Thesis, Dept EE, MIT, 1964

[Br] M. Grassl's best linear codes site, http://www.codetables.de

[H] R. Hill, ``Searching with lies,'' in **Surveys in Combinatorics**,
ed. by P. Rowlinson, London Math Soc, Lecture Notes Series # 218

[MS] F. MacWilliams and N. Sloane, **The theory of Error correcting
codes**, North-Holland Pub. Co., 1977

[N] I. Niven, ``Coding theory applied to a problem of Ulam,'' Math Mag 61(1988)275-281

[P] A. Pelc, ``Solution to Ulam's problem on searching with a lie,'' J Combin Theory A 44(1987)129-140

[U] S. Ulam, **Adventures of a mathematician**, Scribner and Sons,
New York, 1976

Typed into html 7-16-99 by wdj. Last updated 1-29-2008.