Reed-Solomon Codes
and CD Encoding

Stan Hanley

April 24^{th},
2002

SM486 4001

Error Correcting Codes

Professor David Joyner

In the digital age, all information whether it is sound, video, graphic, or text is 0?s and 1?s. Just as a picture is subject to a scratch or sound is subject to noise, digital information is susceptible to error. For data to be usable, there is a method of encoding information, which is organizing the 0?s and 1?s, such that errors can be corrected. Digital errors are most often in the form of a 0 being received as a 1 or vice versa. One major applications of digital encoding is the audio compact disk, or CD. CDs use a modified form of the Reed-Solomon code called the Cross Interleaved Reed-Solomon Code, or CIRC. Before this can be explained in more detail, general error correcting codes will be introduced along with the Reed-Solomon code.

Codes have
three primary characteristics. These
characteristics are the length, dimension, and minimum distance of a code. The length of a code, or *n*, is the
amount of bits (or information pieces) per codeword. The dimension, or *k*, is the amount of actual information
bits contained within each codeword.
Minimum distance, or *d*, is the minimum number of information
differences between each codeword. A
code is generally written as a [*n*,*k*,*d*] code. Coding algorithms take information and
translate it into a codeword. While
there is theoretically any number of coding algorithms, only a few have been
implemented. This is because information
must be able to be encoded and decoded relatively quickly. Also, some codes may be able to correct a
huge number of errors, but that means much of the transmitted information is
check bits. If check bits fill up a
bandwidth, then the system is slowed down.

Minimum distance is the most important factor for determining a codes ability to correct errors. The following example is the list of codewords for the [7,4,3] Hamming code:

[ 0 0 0 0 0 0 0 ], [ 0 1 0 1 0 1 0 ], [ 0 1 1 0 0 1 1 ], [ 0 1 1 1 1 0 0 ], [ 1 0 0 0 0 1 1 ], [ 1 0 0 1 1 0 0 ] [ 1 0 1 0 1 0 1 ],[ 0 0 0 1 1 1 1 ], [ 0 0 1 0 1 1 0 ]

[ 0 0 1 1 0 0 1 ], [ 0 1 0 0 1 0 1 ], [ 0 1 0 1 0 1 0 ], [ 0 1 1 0 0 1 1 ], [ 0 1 1 1 1 0 0 ], [ 1 0 0 0 0 1 1 ], [ 1 0 0 1 1 0 0 ], [ 1 0 1 0 1 0 1 ]

[ 1 0 1 1 0 1 0 ], [ 1 1 0 0 1 1 0 ], [ 1 1 0 1 0 0 1 ], [ 1 1 1 0 0 0 0 ], [ 1 1 1 1 1 1 1 ]

Looking at the codewords, one sees that each differs from
the others in at least three places.
Using *d*, one determines how many errors a code can correct. Let C be a Hamming Code with minimum
distance *d*. Then C is [(*d*-1)/2]
error correcting. For example, the
[7,4,3] Hamming code is 1 error correcting.
If one received [1 1 0 1 1 1 0], he would notice it is not a code word. However, it would quickly be corrected to [1
1 0 0 1 1 0]. If one received [1 1 1 1
1 1 0], he would not know if it was [1 0 1 1 0 1 0] with errors in the 2^{nd}
and 5^{th} bit, or [1 1 0 0 1 1 0] with errors in the 3^{rd}
and 4^{th} bit.

Reed-Solomon
codes are linear block codes. They are
often denoted RS(*n*,*k*) with *s*-bit symbols. The encoder takes *k* data symbols and
adds check symbols to make an *n* symbol codeword. Reed-Solomon codes correct up to *t*
errors in a codeword where 2*t*=*n*-*k*. For a symbol size *s*, the maximum
codeword length (*n*) is *n*=2* ^{s}*-1.
Because a Reed-Solomon codes correct
byte errors, they can potentially correct many bit errors. A symbol error occurs when 1 to all bits in
a symbol are wrong. For example,
RS(255,223) can correct 16 symbol errors. In the worst case, 16 bit errors may
occur, each in a separate symbol so that the decoder corrects 16 bit errors. In
the best case, 16 complete symbol errors occur so that the decoder corrects 16
x 8 bit errors. This makes a
Reed-Solomon code very good at correcting large clusters of errors.

Reed-Solomon
codes are based on finite field arithmetic.
Each codeword is generated using a generator polynomial. All valid codewords are exactly divisible by
the generator polynomial. The general
form of the generator polynomial is *g*(*x*)=(*x*-a^{i})(*x*-a^{i+1})?(*
x*-a^{i+2t}). The codeword is generated such that *c*(*x*)=*g*(*x*)*i*(*x*)
where *g*(*x*) is the generator polynomial, *i*(*x*) is the
information block, and *c*(*x*) is a valid codeword.

A received
codeword is a valid codeword, *c*(*x*), plus any errors,* e*(*x*). The Reed-Solomon decoder will identify the
position and magnitude of up to *t* errors and correct them. Any Reed-Solomon code has 2*t*
syndromes that depend only on errors.
If the roots of the generator polynomial are substituted into *r*(*x*)
where *r*(*x*)=*c*(*x*)+*e*(*x*), then the
syndrome is given.

The symbol
error location is found by solving a simultaneous equation with *t*
unknowns. Algorithms that do this take
advantage of the matrix structure of Reed-Solomon codes.
Two steps are required to find the error
location. First an error locator
polynomial is found. The two algorithms
that find this special polynomial are the Berlekamp-Massey algorithm and
Euclid's algorithm. By analyzing the
errors as the elements of a finite field, the Berlekamp-Massey algorithm finds
the shortest linear recurrence that will produce those elements. This algorithm usually leads to more
efficient software and hardware, but Euclid's algorithm is most often used
because it is easier to implement. The
roots of the error location polynomial are found using the Chien search
algorithm. Once the errors are located,
the proper symbol is found by solving the equation with *t* unknowns using
the Forney algorithm.

Now that Reed-Solomon codes have been introduced, we will discuss how they are used to decode errors on a CD. Audio CD's use the Red Book Standard, which was developed in 1980 by Sony and Philips to standardize how information on a disk would be stored. The standard was printed in a red binder, hence its name. The encoding of an audio CD is best understood by tracing audio information through the encoding process.

For sound to be "CD quality" it must be sampled at 44.1 kHz at 16 bits by two channels. This is a data rate of 1,411,200 bits/sec. 16 bits provides 65,536 (2CD players uses Cross-Interleaved Reed-Solomon Coding, or CIRC. This begins by taking the 24 8 bit words in encoding them in a RS(28,24) code. With 4 parity check symbols, it is 2 error correcting. The data is then interleaved. This process distributes the information from this frame over 109 frames. This allows errors on a large portion of the disk to be distributed over many small parts of the disk. This allows more errors to be corrected and prevents ruining an entire block of information. This is demonstrated in the following simplistic example:

Three length three code words must be transmitted, [010][001][011]. Allow the code to be 1 error correcting. If the information is not decoded successfully, something bad will happen. The data is interleaved by this pattern:

After interleaving, the data is encoded in a RS(32,28) code. This is also a 2 error correcting code. The data is once again interleaved with a new pattern. Interleaving works well for a CD because 1 frame is spread over 109 frames. Each frame is 32 bytes long. Each frame corrects up to 4 symbol errors. 109 frames can correct 436 errors. With interleaving, 13.625 frames can be corrected, where it would be impossible to do this without the interleaving.

The first code, RS(28,24) is called the C2 level of encoding. It corrects errors due to the physical condition of the CD and the way that the CD was recorded. The RS(32,28) code is the C1 level and it corrects errors due to fingerprints and scratches. This completes the encoding of the audio information, but control information must be added to the CD.

This is done by the addition of an 8 bit subcode to each frame. The subcode bits are designated P,Q,R,S,T,U,V,W. Only the P and Q bits are used on audio CDs. Bits R through W are used for CD-Graphics like karaoke CDs. The P bit is used to designate the start of a track. The Q bit provides timing information. The subcode bits from 98 frames are collected to form 8 98 bit words. Because the subcode is located with the music, the exact location and time on the disk is always known.

With the addition of a subcode block, the information is 33 bytes long. However, before it is encoded to the CD it must be modified for use on a CD. The information undergoes eight to fourteen modulation, or EFM. With this, each 8 bit word is assigned a 14 bit word from a ROM (read-only memory) dictionary. These words are designed to have a low number of transitions from 0's to 1's. Also, this increases the minimum distance of the code therefore adding an extra layer of error protection. Each 14 bit word is joined by 3 merging bits to aid in timing synchronization. Now the frame is 561 bits long. Each frame is assigned a unique 24 bit synchronization pattern to proceed it along with three merging bits. This makes one frame 588 bits long.

A block is made up of 98 frames. The rotational velocity of the CD changes so that the linear velocity over the laser is a consistent 1.3 m/s. At this rate, 75 blocks are read per second. 1 block with 98 frames with 24 sound data bytes at 8 bits each by 75 blocks per second is a data rate of 1,411,200 bits/second. This reproduces the original sound information rate. However, the actual data rate is 98 frames/block times 75 blocks/second by 488 bits/frame is 4,321,800 bits/second. CD-ROM format is slightly different in the amount of error correction it provides. The standard data rate for a CD-ROM is 150 kB/sec, or 1,233,600 bits/second.

The CIRC code used on audio CDs can correct burst errors of up to 3500 bits (2.4mm) and can interpolate error bursts of up to 12,000 bits (8.5 mm). Advances in technology in the past 20 years have lead to even more applications for CD technology including DVDs. The error correction on a CD guarantees that high quality music can be enjoyed consistently and reliably.

Coding and Modulation Schemes for Digital Audio www.stanford.edu/courses/192b/lectures/5/5.html

Our Guide CD Recording Methods & Error Correction www.a1-electronics.co.uk/PcHardware/CD-RW/RecordMethods.shtml

Frequently Asked Questions About Compact Discs www.mscience.com/faq28.html

Compact Discs

www.cs.tut.fi/~ypsilon/80545/CD.htmlReed-Solomon Codes http://www.4i2i.com/reed_solomon_codes.htm

Algebraic Decoding http://math.berkeley.edu/~berlek/alg.html

Subcode On the Compact Disk http://www.ee.washington.edu/conselec/CE/reports/Group.1/matt_page_individual/subcode.html

CD-ROM Technical Summary From Plastic Pits to Fantasia pauillac.inria.fr/~lang/hotlist/cdrom/Documents/tech-summary.html

CD Data Coding www.disctronics.co.uk/technology/cdbasics/cd_frames.htm

CD Physical Specification www.disctronics.co.uk/technology/cdbasics/cd_specs.htm

Compact Disc Creation Services , CD-ROM Information