Coding Theory and Error Correction

Zusammenfassung

If data compression removes redundancy to make messages smaller, error-correcting codes add redundancy back — carefully — so a message survives a noisy channel intact. Both fields descend from Claude Shannon’s 1948 theory, which proved a startling fact: reliable communication over an unreliable channel is possible, right up to a hard limit. In 1950 Richard Hamming, sick of a weekend computer halting on every error, built the first code that could locate and fix a bad bit. Reed–Solomon codes (1960) later let CDs survive scratches, QR codes survive smudges, and Voyager phone home from interstellar space. Coding theory is the reason digital data is, against all physical odds, exact.

Shannon’s Promise: Reliability Is Possible

Every real channel corrupts data — cosmic rays flip bits in memory, scratches blind a CD, static garbles a radio signal. The intuitive expectation is that pushing data faster over a noisy line must mean more errors. Claude Shannon’s 1948 A Mathematical Theory of Communication (see Claude Shannon and Information Theory) proved the opposite. His noisy-channel coding theorem showed that every channel has a capacity — a maximum rate below which you can drive the error probability arbitrarily close to zero by clever coding, and above which you cannot.

This was a promise without a recipe. Shannon proved good codes exist but did not say how to build practical ones. The history of coding theory is the sixty-year hunt to actually reach the limit Shannon had drawn on the map — a hunt only finally won in the 1990s.

Hamming’s Weekend Frustration (1950)

Richard Hamming worked at Bell Labs alongside Shannon. He ran computations on the relay-based Bell Model V, fed by punched paper tape. The machine could detect an error and halt — fine on weekdays, when an operator restarted it, but on unattended weekends a detected error simply killed the whole job. After losing yet another weekend’s work, Hamming snapped: “Damn it, if the machine can detect an error, why can’t it locate the position of the error and correct it?”

In 1950 he answered his own question. The Hamming code adds parity bits at clever positions so that the pattern of failed parity checks spells out, in binary, the address of the flipped bit — which you then simply invert. His (7,4) code protects 4 data bits with 3 parity bits and corrects any single-bit error. This introduced the foundational idea of Hamming distance — the number of bit positions in which two codewords differ — and the principle that spacing valid codewords far apart in this distance is what makes errors correctable. Hamming received the Turing Award in 1968 largely for this work. ECC memory in servers still uses his descendants today.

Reed–Solomon: Surviving Bursts (1960)

Hamming codes fix scattered single-bit errors. Real-world damage is rarely so polite: a scratch, a dust speck, or a fading radio signal destroys a whole run of consecutive bits — a burst error. The answer came in 1960 from Irving Reed and Gustave Solomon at MIT Lincoln Laboratory, in a five-page paper titled Polynomial Codes over Certain Finite Fields.

Reed–Solomon codes treat data as the coefficients of a polynomial over a finite field and transmit extra evaluation points. Because any k points determine a degree-(k−1) polynomial, the receiver can reconstruct the original even if a bounded number of points are wrong or missing. Crucially, RS works on symbols (bytes), not bits, so a single corrupted byte counts as one error no matter how many of its bits are wrong — exactly the right tool for burst errors. Reed–Solomon went on to protect an astonishing range of everyday technology:

Compact discs use CIRC (Cross-Interleaved Reed–Solomon Code): data is shuffled across the disc so a physical scratch becomes scattered, recoverable symbol errors rather than one fatal gap. This is why a fingerprinted, lightly scratched CD still plays perfectly.
QR codes embed Reed–Solomon so a code still scans with up to ~30% of it obscured by a logo or dirt.
DVDs, Blu-ray, DSL, digital TV, and RAID 6 storage all lean on RS or close relatives.

Reaching the Edge of the Solar System

The most dramatic proof of coding theory is Voyager. After the Saturn flyby, NASA uploaded new software so the probes used Reed–Solomon coding (concatenated with a convolutional code). Across billions of kilometers, with a transmitter weaker than a refrigerator bulb, the scheme drove the error rate from roughly 1 bad bit in 100,000 down to about 1 in a million, at only ~20% coding overhead. Half a century after launch, Voyager 1 still returns usable data from interstellar space — a signal so faint it is essentially noise, pulled back to exactness by the redundancy wrapped around it. Without error-correcting codes, deep-space exploration would be impossible.

Touching the Shannon Limit (1993–today)

For four decades, practical codes stayed a frustrating gap below Shannon’s theoretical capacity. Two breakthroughs finally closed it:

Turbo codes (1993), introduced by Claude Berrou, Alain Glavieux, and Punya Thitimajshima, used two simple codes feeding probability estimates back and forth in an iterative decoder. They came within a fraction of a decibel of the Shannon limit — the first practical codes to do so — and were adopted across 3G and 4G mobile networks.
LDPC codes (low-density parity-check), invented by Robert Gallager in his 1962 MIT thesis, were too computationally demanding for the era and were forgotten for thirty years — then rediscovered in the late 1990s once hardware caught up. LDPC now protects Wi-Fi, 10G Ethernet, SSDs, and the data channels of 5G, where it sits essentially on the Shannon limit. Polar codes (Erdal Arıkan, 2009), the first codes provably reaching capacity, handle 5G’s control channels.

Shannon’s 1948 promise — error-free communication up to the channel limit — was finally, fully kept by engineering that arrived half a century after the theorem.

⚠️ Dead End: Gallager’s Codes, Thirty Years Early

LDPC codes are a clean case of an idea too far ahead of its hardware. Gallager described them, and proved them excellent, in 1962. They then vanished from practice for roughly three decades. The iterative decoding they require was hopelessly expensive on the computers of the 1960s and 70s, so the field built its careers on algebraic codes (Reed–Solomon, BCH) instead, and Gallager’s thesis became a half-forgotten citation.

It took the rediscovery of iterative decoding via turbo codes in 1993, plus Moore’s-Law silicon, for David MacKay and others to dig LDPC back up in the mid-1990s and realize it was sitting at the Shannon limit all along. The lesson recurs throughout this encyclopedia: an algorithm can be correct, optimal, and published — and still be a dead end until the hardware to run it economically exists. Gallager’s “failure” simply had to wait for the world to be ready for it.