Reed-Solomon Codes and CD Encoding

Stan Hanley

April 24th, 2002

SM486 4001

Error Correcting Codes

Professor David Joyner

 


In the digital age, all information whether it is sound, video, graphic, or text is 0?s and 1?s. Just as a picture is subject to a scratch or sound is subject to noise, digital information is susceptible to error. For data to be usable, there is a method of encoding information, which is organizing the 0?s and 1?s, such that errors can be corrected. Digital errors are most often in the form of a 0 being received as a 1 or vice versa. One major applications of digital encoding is the audio compact disk, or CD. CDs use a modified form of the Reed-Solomon code called the Cross Interleaved Reed-Solomon Code, or CIRC. Before this can be explained in more detail, general error correcting codes will be introduced along with the Reed-Solomon code.

Codes have three primary characteristics. These characteristics are the length, dimension, and minimum distance of a code. The length of a code, or n, is the amount of bits (or information pieces) per codeword. The dimension, or k, is the amount of actual information bits contained within each codeword. Minimum distance, or d, is the minimum number of information differences between each codeword. A code is generally written as a [n,k,d] code. Coding algorithms take information and translate it into a codeword. While there is theoretically any number of coding algorithms, only a few have been implemented. This is because information must be able to be encoded and decoded relatively quickly. Also, some codes may be able to correct a huge number of errors, but that means much of the transmitted information is check bits. If check bits fill up a bandwidth, then the system is slowed down.

Minimum distance is the most important factor for determining a codes ability to correct errors. The following example is the list of codewords for the [7,4,3] Hamming code:

[ 0 0 0 0 0 0 0 ], [ 0 1 0 1 0 1 0 ], [ 0 1 1 0 0 1 1 ], [ 0 1 1 1 1 0 0 ], [ 1 0 0 0 0 1 1 ], [ 1 0 0 1 1 0 0 ] [ 1 0 1 0 1 0 1 ],[ 0 0 0 1 1 1 1 ], [ 0 0 1 0 1 1 0 ]

[ 0 0 1 1 0 0 1 ], [ 0 1 0 0 1 0 1 ], [ 0 1 0 1 0 1 0 ], [ 0 1 1 0 0 1 1 ], [ 0 1 1 1 1 0 0 ], [ 1 0 0 0 0 1 1 ], [ 1 0 0 1 1 0 0 ], [ 1 0 1 0 1 0 1 ]

[ 1 0 1 1 0 1 0 ], [ 1 1 0 0 1 1 0 ], [ 1 1 0 1 0 0 1 ], [ 1 1 1 0 0 0 0 ], [ 1 1 1 1 1 1 1 ]

Looking at the codewords, one sees that each differs from the others in at least three places. Using d, one determines how many errors a code can correct. Let C be a Hamming Code with minimum distance d. Then C is [(d-1)/2] error correcting. For example, the [7,4,3] Hamming code is 1 error correcting. If one received [1 1 0 1 1 1 0], he would notice it is not a code word. However, it would quickly be corrected to [1 1 0 0 1 1 0]. If one received [1 1 1 1 1 1 0], he would not know if it was [1 0 1 1 0 1 0] with errors in the 2nd and 5th bit, or [1 1 0 0 1 1 0] with errors in the 3rd and 4th bit.

Reed-Solomon codes are linear block codes. They are often denoted RS(n,k) with s-bit symbols. The encoder takes k data symbols and adds check symbols to make an n symbol codeword. Reed-Solomon codes correct up to t errors in a codeword where 2t=n-k. For a symbol size s, the maximum codeword length (n) is n=2s-1. Because a Reed-Solomon codes correct byte errors, they can potentially correct many bit errors. A symbol error occurs when 1 to all bits in a symbol are wrong. For example, RS(255,223) can correct 16 symbol errors. In the worst case, 16 bit errors may occur, each in a separate symbol so that the decoder corrects 16 bit errors. In the best case, 16 complete symbol errors occur so that the decoder corrects 16 x 8 bit errors. This makes a Reed-Solomon code very good at correcting large clusters of errors.

Reed-Solomon codes are based on finite field arithmetic. Each codeword is generated using a generator polynomial. All valid codewords are exactly divisible by the generator polynomial. The general form of the generator polynomial is g(x)=(x-ai)(x-ai+1)?( x-ai+2t). The codeword is generated such that c(x)=g(x)i(x) where g(x) is the generator polynomial, i(x) is the information block, and c(x) is a valid codeword.

A received codeword is a valid codeword, c(x), plus any errors, e(x). The Reed-Solomon decoder will identify the position and magnitude of up to t errors and correct them. Any Reed-Solomon code has 2t syndromes that depend only on errors. If the roots of the generator polynomial are substituted into r(x) where r(x)=c(x)+e(x), then the syndrome is given.

The symbol error location is found by solving a simultaneous equation with t unknowns. Algorithms that do this take advantage of the matrix structure of Reed-Solomon codes. Two steps are required to find the error location. First an error locator polynomial is found. The two algorithms that find this special polynomial are the Berlekamp-Massey algorithm and Euclid's algorithm. By analyzing the errors as the elements of a finite field, the Berlekamp-Massey algorithm finds the shortest linear recurrence that will produce those elements. This algorithm usually leads to more efficient software and hardware, but Euclid's algorithm is most often used because it is easier to implement. The roots of the error location polynomial are found using the Chien search algorithm. Once the errors are located, the proper symbol is found by solving the equation with t unknowns using the Forney algorithm.

Now that Reed-Solomon codes have been introduced, we will discuss how they are used to decode errors on a CD. Audio CD's use the Red Book Standard, which was developed in 1980 by Sony and Philips to standardize how information on a disk would be stored. The standard was printed in a red binder, hence its name. The encoding of an audio CD is best understood by tracing audio information through the encoding process.

For sound to be "CD quality" it must be sampled at 44.1 kHz at 16 bits by two channels. This is a data rate of 1,411,200 bits/sec. 16 bits provides 65,536 (216) levels of sound information every 2.268x10-5 seconds. The audio data is broken into frames of 6 16bit words/channel. This data is then converted to 24 8 bit (1 byte) words. Each frame contains information for 1.36 ten thousandths of a second.

interleaving pattern

CD players uses Cross-Interleaved Reed-Solomon Coding, or CIRC. This begins by taking the 24 8 bit words in encoding them in a RS(28,24) code. With 4 parity check symbols, it is 2 error correcting. The data is then interleaved. This process distributes the information from this frame over 109 frames. This allows errors on a large portion of the disk to be distributed over many small parts of the disk. This allows more errors to be corrected and prevents ruining an entire block of information. This is demonstrated in the following simplistic example:

Three length three code words must be transmitted, [010][001][011]. Allow the code to be 1 error correcting. If the information is not decoded successfully, something bad will happen. The data is interleaved by this pattern:

interleaving pattern
If the first word is incorrectly received as [110], when the information is de-interleaved, the received words would be [110][101][111]. Through error correction, the words would be decoded as [010][001][011]. If the data were not interleaved, it would have been corrupted and unreadable.

After interleaving, the data is encoded in a RS(32,28) code. This is also a 2 error correcting code. The data is once again interleaved with a new pattern. Interleaving works well for a CD because 1 frame is spread over 109 frames. Each frame is 32 bytes long. Each frame corrects up to 4 symbol errors. 109 frames can correct 436 errors. With interleaving, 13.625 frames can be corrected, where it would be impossible to do this without the interleaving.

The first code, RS(28,24) is called the C2 level of encoding. It corrects errors due to the physical condition of the CD and the way that the CD was recorded. The RS(32,28) code is the C1 level and it corrects errors due to fingerprints and scratches. This completes the encoding of the audio information, but control information must be added to the CD.

This is done by the addition of an 8 bit subcode to each frame. The subcode bits are designated P,Q,R,S,T,U,V,W. Only the P and Q bits are used on audio CDs. Bits R through W are used for CD-Graphics like karaoke CDs. The P bit is used to designate the start of a track. The Q bit provides timing information. The subcode bits from 98 frames are collected to form 8 98 bit words. Because the subcode is located with the music, the exact location and time on the disk is always known.

With the addition of a subcode block, the information is 33 bytes long. However, before it is encoded to the CD it must be modified for use on a CD. The information undergoes eight to fourteen modulation, or EFM. With this, each 8 bit word is assigned a 14 bit word from a ROM (read-only memory) dictionary. These words are designed to have a low number of transitions from 0's to 1's. Also, this increases the minimum distance of the code therefore adding an extra layer of error protection. Each 14 bit word is joined by 3 merging bits to aid in timing synchronization. Now the frame is 561 bits long. Each frame is assigned a unique 24 bit synchronization pattern to proceed it along with three merging bits. This makes one frame 588 bits long.

A block is made up of 98 frames. The rotational velocity of the CD changes so that the linear velocity over the laser is a consistent 1.3 m/s. At this rate, 75 blocks are read per second. 1 block with 98 frames with 24 sound data bytes at 8 bits each by 75 blocks per second is a data rate of 1,411,200 bits/second. This reproduces the original sound information rate. However, the actual data rate is 98 frames/block times 75 blocks/second by 488 bits/frame is 4,321,800 bits/second. CD-ROM format is slightly different in the amount of error correction it provides. The standard data rate for a CD-ROM is 150 kB/sec, or 1,233,600 bits/second.

The CIRC code used on audio CDs can correct burst errors of up to 3500 bits (2.4mm) and can interpolate error bursts of up to 12,000 bits (8.5 mm). Advances in technology in the past 20 years have lead to even more applications for CD technology including DVDs. The error correction on a CD guarantees that high quality music can be enjoyed consistently and reliably.

 


Bibliography

Coding and Modulation Schemes for Digital Audio www.stanford.edu/courses/192b/lectures/5/5.html

Our Guide CD Recording Methods & Error Correction www.a1-electronics.co.uk/PcHardware/CD-RW/RecordMethods.shtml

Frequently Asked Questions About Compact Discs www.mscience.com/faq28.html

Compact Discs

www.cs.tut.fi/~ypsilon/80545/CD.html

Reed-Solomon Codes http://www.4i2i.com/reed_solomon_codes.htm

Algebraic Decoding http://math.berkeley.edu/~berlek/alg.html

Subcode On the Compact Disk http://www.ee.washington.edu/conselec/CE/reports/Group.1/matt_page_individual/subcode.html

CD-ROM Technical Summary From Plastic Pits to Fantasia pauillac.inria.fr/~lang/hotlist/cdrom/Documents/tech-summary.html

CD Data Coding www.disctronics.co.uk/technology/cdbasics/cd_frames.htm

CD Physical Specification www.disctronics.co.uk/technology/cdbasics/cd_specs.htm

Compact Disc Creation Services , CD-ROM Information