Seymour Cray and Supercomputing

Summary

Seymour Cray designed the fastest computer in the world six times. He did it with a handful of engineers in a laboratory built near his home in Chippewa Falls, Wisconsin, away from corporate offices, committees, and the accumulated conventional wisdom about what computers should look like. His CDC 6600 (1964) was six times faster than any IBM machine, and it came out of a laboratory of thirty-four people, a headcount that provoked a famous memo from IBM’s chairman. His Cray-1 (1976) made vector processing practical for scientific computing and sold to national laboratories and research centers for roughly $8 million a unit. His approach (one dominant designer, minimal management, extreme technical focus) was the antithesis of everything that built OS/360. It worked until the problems became too large for one mind, and then the world moved on without him.

The Hermit of Chippewa Falls

Seymour Roger Cray was born in Chippewa Falls, Wisconsin on September 28, 1925, the son of a civil engineer. From early childhood, he displayed the characteristics that would define his career: meticulous, patient, absorbed in technical problems to the near-exclusion of social obligations, and capable of sustaining obsessive concentration over months and years. As a teenager he wired a neighbor’s barn for electricity and built a model railroad with its own automated control system. He seems to have found such projects not challenging but natural — the kind of work that other people might call fun, though Cray himself would probably not have used that word.

He served in the Pacific theater during World War II as a radio operator and cryptanalyst, then completed a bachelor’s degree in electrical engineering (1950) and a master’s degree in applied mathematics (1951) at the University of Minnesota. He joined Engineering Research Associates (ERA) in St. Paul, one of the earliest commercial computing companies, staffed heavily by former Navy cryptanalysts who had built code-breaking machines during the war. ERA was developing scientific computers, and Cray found his vocation immediately.

After Remington Rand, which also owned the UNIVAC line, acquired ERA in 1952, Cray worked on the ERA 1103 (1953), a vacuum-tube scientific computer that became one of the most capable machines of the early 1950s and is generally counted as his first computer design. (His fully transistorized breakthrough would come later, with the CDC 1604.) He was technically gifted and organizationally impatient. Large company meetings bored him. Management layers frustrated him. He preferred to work with a small group of engineers who understood what he understood, with minimal overhead.

In 1957, William Norris, one of ERA’s founders, left what had by then become Sperry Rand to found Control Data Corporation (CDC) in Minneapolis. Cray soon followed and became its chief designer. A few years later he moved back to Chippewa Falls, established a small lab, and began designing what would become the most powerful scientific computers of the next two decades.

The CDC 6600: Building the First Supercomputer

The CDC 6600, completed in 1964, is the machine from which the history of supercomputing is usually dated. The term “supercomputer” did not yet exist when Cray built it, but the 6600 embodied everything the word would come to mean: a machine designed explicitly for maximum computational speed on scientific problems, without compromise in the direction of general-purpose flexibility, ease of programming, or cost efficiency.

Cray’s architectural choices in the 6600 were prescient to a degree that seems remarkable in retrospect. The machine used ten peripheral processors — small auxiliary computers — to handle all input and output: tape drives, card readers, printers, operator console interactions. These peripheral processors ran asynchronously and continuously, buffering data to and from the central processor. The central processor never waited for I/O; it was always computing.

Within the central processor, Cray implemented ten functional units that operated independently: a floating-point adder, two floating-point multipliers, a floating-point divider, a fixed-point adder, two incrementers, and units for boolean operations, shifts, and branches. When the instruction stream contained operations that could execute on different functional units simultaneously (and on well-written scientific code, it usually did), multiple instructions could be in execution at once. This was instruction-level parallelism, long before the concept was formalized. A hardware scoreboard tracked which units and registers were busy and held each instruction back only until its unit was free and its operands were ready; compilers and careful programmers could then order instructions to keep as many units busy as possible.
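The effect of independent functional units can be sketched with a toy timing model. This is a simplified illustration, not the 6600’s actual control logic: the `Instr` type, the register names, and the latencies are assumptions chosen for the example, and the model ignores write-after-read hazards and register-port conflicts. It issues instructions in order, at most one per cycle, and stalls only when the required unit is busy or an operand is not yet available.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    unit: str       # functional unit the instruction needs, e.g. "add" or "mul"
    dest: str       # destination register
    srcs: tuple     # source registers
    latency: int    # cycles spent in the unit (illustrative, not 6600 timings)


def serial_cycles(program):
    """Baseline: each instruction starts only after the previous one finishes."""
    return sum(ins.latency for ins in program)


def scoreboard_cycles(program):
    """In-order issue with overlapped execution across functional units."""
    unit_free = {}   # unit name -> cycle at which the unit is free again
    reg_ready = {}   # register  -> cycle at which its value is available
    issue = 0        # earliest cycle at which the next instruction may issue
    finish = 0
    for ins in program:
        # Issue stalls while the unit is busy or the destination has a pending write.
        issue = max(issue, unit_free.get(ins.unit, 0), reg_ready.get(ins.dest, 0))
        # Execution starts once every source operand has been produced.
        start = max([issue] + [reg_ready.get(s, 0) for s in ins.srcs])
        done = start + ins.latency
        unit_free[ins.unit] = done
        reg_ready[ins.dest] = done
        finish = max(finish, done)
        issue += 1   # at most one instruction issues per cycle
    return finish


program = [
    Instr("mul", "X1", ("X2", "X3"), 10),
    Instr("add", "X4", ("X5", "X6"), 4),
    Instr("shift", "X7", ("X0",), 3),
    Instr("add", "X8", ("X1", "X4"), 4),   # depends on the multiply and the first add
]

print(serial_cycles(program), scoreboard_cycles(program))
```

Under these assumed latencies the four instructions take 21 cycles one after another and 14 cycles when overlapped: the add and the shift run in the shadow of the multiply, and only the final add has to wait for its operands.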

The result was a machine that achieved approximately 3 megaFLOPS — three million floating-point operations per second. The most powerful IBM machine available, the IBM 7094, achieved roughly 500 kiloFLOPS. The CDC 6600 was six times faster than anything IBM could offer.

The 6600 announcement provoked a now-famous internal memo from Thomas Watson Jr., IBM’s chairman, in August 1963. Watson noted that the laboratory that had developed the 6600 employed only thirty-four people, “including the janitor,” of whom fourteen were engineers and four were programmers, and asked how IBM, with its vast development organization, had lost its lead by letting someone else offer the world’s most powerful computer.

When the memo reached Cray, his reported reply was that Watson seemed to have answered his own question. Watson’s memo is usually read as a demand for self-examination by IBM’s engineering organization. It is also a measurement of Cray’s achievement.

Cray Research and the Cray-1

In 1972, Cray left Control Data Corporation — frustrated by the growing size and bureaucracy of a company that had grown from a small startup into a corporation with thousands of employees — to found Cray Research in Chippewa Falls. Norris, who had been his patron, allowed him to take the work he was doing on the CDC 8600 (a 6600 successor) with him, in exchange for a licensing arrangement.

The founding of Cray Research embodied a particular theory of creative technical work: that a small team of extremely capable engineers, working without organizational overhead, reporting to a single technical authority, could outperform far larger teams working in conventional organizational structures. Cray was the technical authority. He hired carefully, kept the team small, and worked alongside his engineers rather than managing them from a distance.

The Cray-1, announced in 1975 and first delivered to Los Alamos National Laboratory in 1976, was Cray’s masterpiece and the machine most closely associated with his name. Its key innovation was its vector registers: eight registers of 64 elements each, every element 64 bits wide. A single vector instruction applied one arithmetic operation to all 64 elements of a register, streaming them through a pipelined functional unit at one result per clock cycle after a short startup. That replaced a loop of 64 scalar operations, each with its own instruction fetch, decode, and loop-control overhead, with one instruction that finished in little more than 64 cycles.
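A rough way to picture a vector instruction is as one operation over a chunk of up to 64 elements, with longer arrays processed chunk by chunk (a technique usually called strip mining). The sketch below is a model for illustration only, not Cray assembly: the function names are hypothetical, and the cycle model assumes a pipelined unit that delivers one result per clock after a fixed startup, with the startup value chosen arbitrarily.

```python
VLEN = 64  # elements per vector register, as described in the text


def vector_add(a, b):
    """Model one vector instruction: element-wise add of up to VLEN elements."""
    assert len(a) == len(b) <= VLEN
    return [x + y for x, y in zip(a, b)]


def strip_mined_add(a, b):
    """Add two equal-length arrays in VLEN-sized strips.

    Returns the result and the number of vector instructions issued.
    """
    out, instructions = [], 0
    for i in range(0, len(a), VLEN):
        out.extend(vector_add(a[i:i + VLEN], b[i:i + VLEN]))
        instructions += 1
    return out, instructions


def vector_cycles(n_elements, startup=6):
    """Assumed cost model: startup cycles, then one result per clock."""
    return startup + n_elements


a = list(range(200))
b = [1] * 200
result, issued = strip_mined_add(a, b)
print(issued)  # 4 vector instructions instead of 200 scalar adds
```

The point of the model is the instruction count: 200 element-wise additions need four vector instructions (three full strips of 64 and one of 8), so the per-instruction overhead is paid four times rather than 200 times.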

Scientific computing was dominated by loops over large arrays — numerical solutions to differential equations for fluid dynamics, weather prediction, nuclear weapons simulation, structural analysis. These loops were exactly what vector registers accelerated. A scientific code that ran at some speed on a conventional processor could run four to eight times faster on the Cray-1, simply because the loops could be vectorized.
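How much a whole program gains depends on how much of its running time sits in vectorizable loops. One standard way to reason about this is Amdahl’s law, sketched below. The fractions and the per-loop speedup are illustrative assumptions, not measurements of any Cray-1 code.

```python
def overall_speedup(vector_fraction, vector_speedup):
    """Amdahl's law: whole-program speedup when only part of the run time is accelerated.

    vector_fraction -- share of the original run time spent in vectorizable loops (0..1)
    vector_speedup  -- how much faster those loops run once vectorized
    """
    scalar_part = 1.0 - vector_fraction
    return 1.0 / (scalar_part + vector_fraction / vector_speedup)


# Illustrative values only: loops assumed to run 10x faster once vectorized.
for fraction in (0.5, 0.8, 0.9, 0.95):
    print(f"{fraction:.2f} vectorizable -> {overall_speedup(fraction, 10.0):.2f}x overall")
```

With these assumed numbers, a code that spends 90 percent of its time in vectorizable loops gains a little over 5x overall, while one that spends only half its time there gains less than 2x. The remaining scalar code sets the ceiling, which is why fast scalar execution mattered alongside the vector units.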

Info

Vector processing was not Cray’s invention. It had appeared in earlier machines, including the CDC STAR-100 and the Texas Instruments ASC, but those designs streamed their vectors directly to and from main memory. The Cray-1 worked register to register, and its vector architecture was cleaner, more efficiently pipelined, and better matched to the manufacturing technology of 1975. The 80 MHz clock speed, achieved by packing circuits into the tightest possible physical space with the shortest possible signal paths, gave the Cray-1 a speed advantage over competitors that was difficult to match without adopting similar circuit density.

The Cray-1’s physical form was as distinctive as its performance. Signal propagation delay set the constraint: light travels about 30 centimeters per nanosecond in a vacuum, so even at that speed a signal covers less than 4 meters in one 12.5-nanosecond clock cycle, and signals in real wiring are slower and must leave time for the logic gates themselves. Cray therefore packed the circuitry into a C-shaped cylindrical tower roughly 1.5 meters in diameter and kept every wire as short as possible, reportedly no longer than about four feet (1.2 meters). The base of the cylinder was ringed by a padded bench, which concealed the power supplies and the plumbing for the Freon refrigeration system that cooled the dense circuitry. The Cray-1 became one of the most recognizable industrial objects of the 1970s: simultaneously a scientific instrument and an aesthetic statement.
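The distance budget can be worked through numerically. The sketch below is back-of-the-envelope arithmetic under stated assumptions: the velocity factor (how fast a signal moves in wire relative to light in a vacuum) and the share of the clock cycle left over for wire delay are illustrative parameters, not documented Cray-1 design values.

```python
C_CM_PER_NS = 29.98  # speed of light in a vacuum, in centimeters per nanosecond


def max_wire_cm(clock_ns, velocity_factor=1.0, wire_budget=1.0):
    """Longest wire a signal can traverse within the given share of one clock cycle.

    clock_ns        -- clock period in nanoseconds
    velocity_factor -- signal speed in the wire as a fraction of c (assumed)
    wire_budget     -- fraction of the cycle available for wire delay (assumed)
    """
    return C_CM_PER_NS * velocity_factor * clock_ns * wire_budget


# Upper bound: light in a vacuum with the whole 12.5 ns cycle available.
print(round(max_wire_cm(12.5), 2))

# Hypothetical tighter budget: slower propagation in wire, half the cycle for wiring.
print(round(max_wire_cm(12.5, velocity_factor=0.65, wire_budget=0.5), 1))
```

The first figure, about 375 centimeters, is a hard physical ceiling for a 12.5-nanosecond cycle. Any realistic allowance for slower propagation in wire and for gate delays brings the usable length down to the order of a meter, which is why compact packaging mattered so much.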

The first machine went to Los Alamos in 1976 on a trial basis. The National Center for Atmospheric Research became the first paying customer the following year, at about $8.8 million, and Lawrence Livermore National Laboratory and others followed. By 1980, most major scientific computing centers ran Cray systems.

Tip

The padded bench that wrapped the Cray-1’s base became one of computing history’s most discussed design details. Journalists invariably mentioned it; the machine was often photographed with Cray sitting on it. The bench was entirely practical: it covered the power supplies and cooling plumbing at the base of the machine. Even so, Cray’s ability to create objects that were both beautiful and purposeful became part of his legend. He reportedly said he designed the bench because the power supplies underneath it needed to be accessible, and the bench made the space usable rather than wasted.

The X-MP, Cray-2, and the Limits of Solitary Genius

Cray Research through the late 1970s and early 1980s was a remarkable company: small, profitable, technically dominant, and organized around one man’s judgment about what the next machine should look like. But the company was growing, and the problems Cray was solving were growing with it.

The Cray X-MP (1982) was designed not by Cray but by Steve Chen, one of Cray Research’s engineers. It used two processors running in parallel on shared memory, delivering roughly twice the scalar performance of the Cray-1. The X-MP was a significant commercial success and demonstrated that multi-processor architectures could be built within the Cray approach — but it was not Cray’s design.

Cray’s own Cray-2 (1985) was technically radical in a different direction: it operated at 244 MHz, used four processors, and included 2 gigabytes of memory — enormous for the time. The circuit boards were cooled by liquid Fluorinert in which they were literally submerged. The Cray-2 was the fastest computer in the world at delivery. But its memory architecture introduced latencies that limited the efficiency with which scientific codes could use its peak performance, and software optimized for the Cray-1 did not automatically run faster on the Cray-2. The X-MP outperformed the Cray-2 on many real workloads despite the Cray-2’s higher theoretical peak.

The tension between Cray’s approach — one visionary, small team, revolutionary new design — and the engineering reality of increasingly complex systems was becoming apparent. The next generation of supercomputers required not just faster processors but faster memory systems, better compilers, better algorithms, and teams of people who could work on these problems simultaneously. It was not clear that Cray’s method could scale to problems of this complexity.

Departure and the Gallium Arsenide Gamble

In 1989, Cray left Cray Research — frustrated by management disagreements and the constraints of running a public company — to found Cray Computer Corporation in Colorado Springs. His goal was to build the Cray-3 and Cray-4 using gallium arsenide (GaAs) transistors rather than silicon.

Gallium arsenide has a higher electron mobility than silicon, allowing transistors to switch faster at lower voltages. Theoretically, a GaAs processor could be significantly faster than the best silicon alternative. In practice, GaAs manufacturing was far more difficult and expensive than silicon, yields were poor, and the infrastructure of tools, foundries, and engineering expertise that made silicon chips affordable did not exist for GaAs.

Warning

Cray Computer Corporation spent about six years and hundreds of millions of dollars trying to make GaAs supercomputers commercially viable. The Cray-3 was eventually completed in a single prototype unit in 1993; the Cray-4 was partly designed but never built. No commercial customers purchased the Cray-3 at prices that would have sustained the company. The fundamental problem was the same one that defeated every GaAs computing initiative of the era: GaAs was faster per transistor, but silicon could put far more transistors on a chip at far lower cost, and raw transistor count, combined with Moore’s Law scaling, eventually overwhelmed GaAs’s per-transistor speed advantage. Cray Computer Corporation filed for bankruptcy in March 1995. It was the only one of Cray’s major ventures that failed.

Death and Legacy

Seymour Cray died on October 5, 1996, in Colorado Springs, from injuries sustained in a three-car accident on September 22 — nearly two weeks earlier. He was seventy-one and had recently founded a new company, SRC Computers, to continue research on high-performance computing. He was working when he died.

The world of supercomputing he had dominated for three decades had, by 1996, moved decisively toward a model he had never embraced: large numbers of commodity processors working in parallel. The 1994 Beowulf cluster (sixteen standard Intel 486-class PCs running Linux, connected by Ethernet, for about $50,000) pointed toward the future. By 1997, the fastest computer in the world was Intel’s ASCI Red, a massively parallel machine built from thousands of commodity Pentium Pro processors. Purpose-built vector processors could not match the price-performance of commodity hardware riding the exponential improvement of Moore’s Law.

Cray had understood, better than almost anyone, how to build the fastest possible machine from a given generation of components. What he never reconciled himself to was the possibility that the race he was running — the race for the fastest individual machine — was being supplanted by a different race, for the most effective combination of components, network, and software. He was the greatest practitioner of an art that the industry left behind.

The full story of supercomputing from Cray’s era to the present is told in The Supercomputer Era. The semiconductor context of his career is in The Integrated Circuit Revolution.
