Posted by permission
A Survey of the Theory of Error-Correcting Codes
Let us begin with a quick example. Since computers are great users of binary information (1's and 0's), we shall send our messages in binary. Suppose we want to send a message which is either 1 (meaning "yes," for example) or 0 (meaning "no"), across a "channel" (e.g., a telephone line, a radio transmitter, or a CD laser). The problem is that the channel may be noisy (because of lightning, scratches, etc.); in other words, there is a certain probability that a 1 will be flipped to a 0 and vice versa. What should we do?
The first idea is just to send our message twice; if we want the other end to know that we meant 1, we send the "codeword" 11. If there is some noise, and the other end receives 01 instead, then the other end knows that an error occurred, and (if the circumstances permit) can ask for a retransmission. So this "repetition code," which takes 1 and encodes it as 11, and 0 as 00, allows us to detect one error. For some applications this is good enough.
However, if we get 01, we know that one error occurred, but was the first entry garbled from a 1, or the second entry garbled from a 0? There is no way of telling, since 01 is "as close to" 00 as it is to 11 (in the sense that it differs in the same number of places from each codeword).
Now instead of sending just two 1's, let us send three! Then the two codewords are 111 and 000. This time, if just one error occurs and we receive 101, we can still decode the received string (or "vector") by a "majority vote" and recover the correct 1. Therefore this code can not only detect, but also correct, one error, because 101 is closer to 111 than to any other codeword (in this case, 000 is the only other candidate). This notion of distance is the key to error correction; in constructing a code, we want to add our redundancy in such a way as to make codewords as far apart from each other as possible, without lengthening the message more than necessary.
Error-correcting codes were first discovered abstractly in 1945 when Claude Shannon proved a theorem (described later) which states that, even with a noisy channel, there exist ways to encode messages in such a way that they have an arbitrarily good chance of being transmitted safely, provided that one does not exceed the "capacity" of the channel by trying to transmit too much information too quickly.
The first constructions of error-correcting codes, however, come from rather curious sources. In 1947, at Bell Labs, a mathematician named Richard Hamming became fed up with a computer which could detect errors in his input during weekend runs, but would then just dump the program, wasting the entire run. He devised ways to encode the input so that the computer could correct isolated errors and continue running, and his inquiries led him to discover what are now called Hamming codes.
Soon after, Marcel Golay generalized Hamming's construction, and constructed codes that corrected one error per word which used p symbols (rather than just 0 and 1) for p prime. He also constructed two very remarkable codes that correct multiple errors, and that now bear his name.
However, perhaps most curiously, one of the very same Golay codes appeared a few years earlier, in a Finnish magazine dedicated to betting on soccer games! We shall elaborate on this connection later. For more on this interesting history (and on the material to follow), see the introductory book by Thompson [7], and also Conway-Sloane [2], van Lint [4], Pless [5], and the references in these sources.
For notation, the number of places in which two vectors differ is usually referred to as the
Hamming distance between the two vectors. The weight of a vector is the number
of non0 entries in it (that is, its Hamming distance from the all 0's vector). Using mod 2
arithmetic (that is, 1+1=0) to add two vectors coordinate by coordinate (no carrying), we
see that the Hamming distance between two vectors is equal to the weight of their difference.
For example, 100110101 has weight 5, and
So if a codeword gets garbled, then the number of errors is exactly the Hamming distance of the received vector from the original codeword. In order that we detect the errors, the received vector must not land on another codeword. This is why a large minimum distance is so important; if there are fewer than d errors then we can at least detect the error. Also, if there are there are strictly fewer than d/2 errors, then the received vector is still closer to the original codeword than to any other.
Now we can formally define "error-correcting;" a code is said to correct e errors if, whenever at most e errors occur when transmitting one codeword, this decoding process w ill always yield the correct message. So by what we said in the previous section,
error-correcting.This method of comparing our received vector to every codeword and selecting the closest one is theoretically the most reliable way of decoding. It is of course not very efficient, especially for big codes; a large part of coding theory, not explored any further here, deals with finding codes that can be decoded efficiently and implementing decoding schemes for them.

In this manner, we impose parity on the codeword. If any single error occurs (say 1001 becomes 1101), it will change the number of 1's by one and thus throw off the parity. So when we perform a "parity check" on the received vector by adding all the coordinates, we will detect the error. But we cannot tell which digit is wrong, since all single errors have the same effect.
Notice that, given a message of m symbols, this process creates a (m+1,2^m,2) code. This idea of making "parity checks" is the precursor to more sophisticated codes.
A generalization of this process yields the Hamming codes. Hamming studied the problem of trying to find a way to correct a single error efficiently, using as few extra bits as possible. Hamming's strategy was to impose parity on parts of the codeword interleaved in such a way as to cover the maximum amount of ground.
He looked at this situation. Suppose you have a message of m binary symbols, and you want to construct a code of length n by adding k=n-m "parity bits." Every parity bit will make a certain part of the codeword, plus itself, sum to 0; checking that these positions do in fact sum to 0 will be called a "parity check." How many parity bits do we need to ensure that our code corrects single errors?
To answer this, suppose we transmit our codeword. Then when we receive a length n vector, we perform k parity checks. If we received our codeword error-free, then by design all the parity checks sum to 0. For our code to detect an error, some parity check must fail whenever an error occurs. For our code to correct an error, each single error, which could occur in any of the n positions, must disturb the set of values of the parity checks in a different way, and they must all be distinct from the result of checking the correct codeword.
All in all, the total number of distinct sets of parity check values, 2^k, must be at least n+1. So Hamming searched for codes of length 2^k-1, which would require k check digits.
We demonstrate with an example of encoding, which takes any 4-bit message and encodes it into a 7-digit codeword.
Suppose we want to encode the word 1100. Then we construct the codeword (with symbols called X1, ..., X7) by putting the four "message bits" into the 3rd, 5th, 6th, and 7th slots, like so:

Now to fill in the rest of the positions with "parity checks," we fill in the 1st slot with the digit that will make the sum of the 1st, 3rd, 5th, and 7th entries equal to 0. This implies

The 2nd slot ensures that the sum of the 2nd, 3rd, 6th, and 7th entries is 0, that is,

Finally the 4th slot is chosen to make the 4th, 5th, 6th, and 7th entries add up to 0, so

Thus our codeword for 1100 is 0111100.
Hamming's ingenious selection of which coordinates to add together comes from the fact that binary expansions of 1, 3, 5, and 7 (i.e., 001, 011, 101, and 111) all have a 1 in the rightmost position. Similarly, 2, 3, 6, and 7 all have a 1 in the next-to-last position in their binary expansions. Using this list, we put our message bits into those positions which are not powers of 2, and then fill in the other positions by checking parity. The reason for this ordering comes from the fact that if a single error occurs, say in the 6th position, then that one error will upset just those parity checks that contain X6 in them, so

and the values of the parity checks give the binary expansion of 6 (110), the error position! Theoretically, of course, we could arrange that the first four positions contained the message, and the last three were the check digits (or any other ordering); this is one example of how one ordering can possess certain practical advantages over another.
We now derive some basic facts about this code. The procedure above generates a length 7 codeword for each length 4 message, so there are 2^4 = 16 codewords out of 2^7 = 128 vectors of length 7. Also, we find that the minimum distance between any two codewords is 3. If you change exactly one message bit, that affects at least two parity checks (because message bits are not located at powers of 2); if you change two, then each one affects at least two parity checks, and not exactly the same ones because the position numbers have different binary expansions. If you change more than two message bits, the codewords are already at least at distance 3!
We have constructed the (7,2^4,3) Hamming code. In general, a binary Hamming code of
length 2^k-1 will have k check digits and thus 2^k-1-k
message positions left over, so it will be a
code.
We now prove that these codes (and all of the codes above) have the property that the sum of any two codewords is another codeword, that is, they are linear. This extra structure allows us to use the tools of linear algebra to study such codes.
First note that repetition codes are linear, since the sum of any two messages is another valid message when we add codewords coordinatewise.
One can also see this easily for the parity codes, because the equations "even + even = even" and "even + odd = odd" still hold for the weights of the codewords. For instance,
The only time you lose any 1's is when two 1's in the same position cancel, and this does not affect parity.
A little tweaking of this argument shows that the Hamming codes (which just come from overlapping parity checks, after all) are also linear.
As we said, this linearity condition is quite useful. For instance, one can show that in a linear (n1,M,d) binary code, M is always of the form 2^m, and we can arrange that m of the positions are message bits. In fact, we can even arrange that the code is obtained by specifying some sequence of parity checks on the messages, just as in the Hamming codes. We will not dwell on these or the many other advantages of using linear algebra to study codes, except to say that many properties of codes become quite transparent by the application of this basic tool of mathematics.

This rough statement can be extended to nonlinear codes where the number of codewords M is not always of the form 2^m by defining the rate as

There is a fundamental theorem of Shannon which guarantees the existence of codes that handle a given noisy channel. Given a channel with probability p of sending each bit correctly, we can concoct a quantity called the capacity C(p) of the channel, which is roughly the maximum amount of information that can be sent across it in a unit amount of time. In fact,

Then Shannon's theorem states that for any rate R strictly less than the capacity C(p) and any e>0, there exists some code with a sufficiently large length and with rate at least R, such that the probability that an error will occur in the message is less than e. This means that we can achieve, with long enough codes, communication that is as safe as we like. These codes do not have to be linear, and the proof does not construct them; all we know is that they exist. One challenge of coding theory is to construct such codes.
vectors in it, because there are
vectors at distance i from a given vector. Certainly if r is less than or
equal to the number of errors e the code corrects, then any two balls are disjoint.
Now, we ask, what codes have the property that the balls of radius e not only are disjoint, but pack the space completely? This question has been satisfactorily answered. All possible parameters (n, M, d) of perfect codes are known. They correspond to the linear perfect codes; that is, certain trivial cases, the Hamming codes (and analogues over other finite fields), and two remarkable Golay codes. The only other perfect codes are certan nonlinear codes with the same parameters as the Hamming codes.
First, notice that every repetition code of odd length is perfect, because any vector either has more 1's than 0's (in which case it is strictly closer to the all 1's vector), or vice versa---no ties! At the other extreme, the code consisting of all possible vectors of a given length is also perfect (with e=0). Finally the code with only one codeword is of course perfect.
One can show that the Hamming codes we constructed above are also perfect.
In fact, in a Hamming code of length 2^k-1, a ball of radius 1 around a
codeword contains 1+(2^k-1) vectors, 1 for the codeword itself and one for
each position. But there
are
codewords, so these balls contain

vectors; thus the balls of radius 1 exactly exhaust all vectors of length 2^k-1 and the code is perfect.
In addition, the special binary code on 23 symbols, with parameters (23, 2^12, 7) discovered by Golay, is perfect (and it corrects 3 errors!). These, coupled with certain nonlinear codes that have the same parameters as the Hamming codes, are all of the binary perfect codes. If we allow more symbols in our alphabet than just 0 and 1, then we get analogues of the Hamming codes, and another Golay code of length 11, this time on three letters (say 0, +, and -) and with parameters (11, 3^6, 5). This completes the list of all linear perfect codes.
It was in this vein that the ternary Golay code was first constructed; its discover, Juhani Virtakallio, exhibited it merely as a good betting system for football-pools, and its 729 codewords appeared in the football-pool magazine Veikkaaja. For more on this, see Barg's article [1].
The real-world applications go far beyond the CD player to all of computing and transmission technology, including hard disk drives, satellite communications, and digital television. There is much more to be said about the powerful and important theory of error-correcting codes.
[2] Conway, J. H., and N. J. A. S. Sloane. Sphere Packings, Lattices, and Groups, 2nd ed., Springer-Verlag, 1993.
[3] Hoeve, H., J. Timmerman, and L. B. Vries. "Error Correction and Concealment in the Compact Disc System," Philips Technical Review, Vol. 40 (1980), No. 6, pp. 166-172.
[4] van Lint, J. H. Introduction to Coding Theory, 2nd ed., Springer-Verlag, 1992.
[5] Pless, Vera. Introduction to Error-Correcting Codes, 2nd ed., Wiley, 1989.
[6] Sloane, N. J. A. S. A Short Course on Error-Correcting Codes, Springer-Verlag, 1975.
[7] Thompson, Thomas M. From Error-Correcting Codes Through Sphere Packings To Simple Groups, Mathematical Association of America, 1983.
Appeared in volume 1 in 1994 Tangents: Harvard-Radcliff Math Bulletin.