The Space Shuttle's Two Software Teams That Were Forbidden to Communicate

Summary

The Space Shuttle’s primary flight software (written in HAL/S by IBM) and its backup flight system (written in a different language by Rockwell’s Autonetics division) were developed by teams that were deliberately kept separate. The backup team was given only the shuttle’s requirements: not the primary code, not its design, and no contact with the primary software developers. The reasoning was that if both systems ever showed the same bug, it would have to originate in the shared requirements, not in a design error passed from one team to the other. Across 135 missions, the two systems never exhibited a common-mode software failure.

The Problem of Common-Mode Failure

In safety-critical systems, redundancy is the standard defense against failure: if one system fails, a backup takes over. But redundancy fails to protect against errors that appear in both the primary and backup systems — “common-mode failures.” If both systems are designed by the same team, following the same assumptions, reading the same documentation, and discussing the same implementation choices, they are likely to share the same bugs.

The Space Shuttle program’s flight software engineers understood this problem by the late 1970s. The primary flight software, approximately 420,000 lines of HAL/S code running on the orbiter’s redundant IBM AP-101 computers, controlled all aspects of ascent, orbital maneuvering, and re-entry. A software error in a critical flight phase could be catastrophic.

The solution was design diversity: a completely independent team, using different programming tools and prohibited from accessing the primary software, would implement the same requirements in a different system.
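The idea of design diversity can be illustrated with a toy sketch. This is not shuttle code: the two functions below are hypothetical stand-ins for two teams that received the same requirement ("return the integer square root of a non-negative integer") and implemented it with different algorithms. The names `isqrt_newton`, `isqrt_bisect`, and `disagreements` are assumptions made up for this example.

```python
def isqrt_newton(n: int) -> int:
    """Hypothetical 'team A': integer square root via Newton's iteration."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    x = n
    while True:
        y = (x + n // x) // 2
        if y >= x:          # the iteration has stopped decreasing
            return x
        x = y


def isqrt_bisect(n: int) -> int:
    """Hypothetical 'team B': same requirement, implemented by binary search."""
    if n < 0:
        raise ValueError("n must be non-negative")
    lo, hi = 0, n + 1       # invariant: lo*lo <= n < hi*hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    return lo


def disagreements(inputs):
    """Return the inputs on which the two independent versions differ."""
    return [n for n in inputs if isqrt_newton(n) != isqrt_bisect(n)]


if __name__ == "__main__":
    print(disagreements(range(10_000)))
```

Because the two versions share no code, a slip in one (for example an off-by-one in the search bounds) is unlikely to be mirrored in the other and would show up as a disagreement. A mistake in the requirement itself would be reproduced faithfully by both, which is exactly the residual risk that design diversity leaves open.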

The Two Systems

Primary Avionics Software System (PASS): Written in HAL/S (High-order Assembly Language/Shuttle), a language specifically designed for the shuttle’s real-time requirements. Developed by IBM’s Federal Systems Division. During critical flight phases, four of the orbiter’s five AP-101 computers ran PASS as a redundant set and voted on their results; a majority vote determined the commanded action. The code base stood at roughly 420,000 lines and went through multiple releases over the program’s lifetime.

Backup Flight System (BFS): Written in a different language by Rockwell’s Autonetics division (now part of Boeing). The BFS ran on the fifth AP-101 computer, identical hardware to the PASS machines, but was loaded separately and would take control only if the crew engaged it after a PASS failure. The team was given the shuttle’s requirements documents and nothing else.

The separation was enforced operationally: BFS team members were not allowed to read PASS source code, attend PASS design reviews, or consult with PASS engineers about implementation choices. The only shared information was the specification of what the software had to do.
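The majority vote among the redundant PASS computers can be sketched in a few lines. This is a simplified illustration only: the real redundancy management was considerably more involved, and the function name `majority_vote` and the command strings are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of computers.

    Returns None when no value has more than half of the votes,
    for example on a two-against-two split or an empty input.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


if __name__ == "__main__":
    # Three computers agree, one dissents: the majority value wins.
    print(majority_vote(["PITCH_UP", "PITCH_UP", "PITCH_UP", "PITCH_DOWN"]))
    # Even split: no majority, so the sketch reports None.
    print(majority_vote(["PITCH_UP", "PITCH_UP", "PITCH_DOWN", "PITCH_DOWN"]))
```

Voting of this kind masks a fault in a single computer, but it offers no protection when every voter runs the same faulty software and produces the same wrong answer. That gap is what an independently written backup is meant to cover.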

The Quality of the Primary Software

The PASS software became famous in software engineering for its extremely low defect rate. Charles Fishman’s 1996 Fast Company article “They Write the Right Stuff” documented the IBM shuttle team’s process: the last three versions of the 420,000-line program had a single error each. One error in 420,000 lines works out to about 0.0024 defects per 1,000 lines of code, orders of magnitude below typical industry defect rates at the time.

The team achieved this through formal inspection processes (Fagan inspections), extensive simulation testing, and a culture where finding bugs was celebrated rather than penalized. The code review process was estimated to cost more than the actual programming. The result was software that ran 135 missions over 30 years with no primary software crash causing a mission abort.

The Apollo software tradition, established before the shuttle program, contributed to this culture: NASA’s human spaceflight software was one of the few domains where the cost of software engineering’s most rigorous practices could be justified.