The Rise of Version Control: From SCCS to Git and GitHub

Summary

This article traces the history of software version control from its origins as a solution to the mundane problem of overwritten files at Bell Labs in 1972, through two decades of increasingly capable but fundamentally limited centralized systems, to Linus Torvalds’ creation of Git in two weeks of furious work in April 2005 — triggered by a licensing dispute over a proprietary tool. It culminates in GitHub’s transformation of version control from a technical utility into the social infrastructure of modern software development, and the quiet irony that the same man who once gave away the world’s most important operating system eventually gave away the world’s most important development tool.

The Problem That Predated the Solution

Before version control existed, programmers solved the problem of file history the only way they could: manually.

A developer working on a critical program might maintain a directory full of numbered copies — payroll.c.1, payroll.c.2, payroll.c.backup_before_friday_change. Before modifying something important, they made a copy. When a change broke something, they diffed the copies by hand or simply remembered what they had changed. When two programmers needed to work on the same file, they coordinated by convention: one edited while the other waited.

This system worked, after a fashion, for individual programmers on small projects. It failed completely for teams working on large codebases simultaneously — and by the early 1970s, large software teams were becoming a fact of industrial computing. The problem was not merely inconvenient; it was a source of genuine catastrophic failures. A developer who overwrote a colleague’s changes without realizing it could destroy weeks of work with a single save operation.

The deeper issue was that source code had no canonical history. You could see where the code was now. You could not reliably determine how it had gotten there, who had changed what, or what the code had looked like at any prior point in time.

SCCS and RCS: The First Tools (1972–1982)

The first systematic answer came from Marc Rochkind at Bell Labs in 1972. He had been working on software for a telecommunications billing system and encountered the coordination problem directly: multiple developers modifying the same files, changes being silently lost, no record of why a particular line had been written the way it was.

His solution was SCCS, the Source Code Control System. Rather than keeping a full copy of every version, SCCS stored a file’s history as a sequence of deltas: minimal descriptions of what had changed from one version to the next. To reconstruct any historical version, SCCS applied the relevant deltas. The system tracked who had made each change and when, and it enforced a simple locking discipline: only one developer could check out a file for editing at a time.
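The delta idea can be sketched in a few lines of Python. This is a simplified illustration, not SCCS's actual file format: it assumes plain forward deltas computed with the standard `difflib` module, and the names `make_delta`, `apply_delta`, and `DeltaHistory` are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe how to turn the line list `old` into the line list `new`."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))        # reuse old[i1:i2] unchanged
        elif j2 > j1:
            delta.append(("insert", new[j1:j2]))  # lines that exist only in `new`
        # pure deletions need no entry: the old lines are simply not copied
    return delta


def apply_delta(old, delta):
    """Rebuild the newer version from the older one plus a delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class DeltaHistory:
    """Hypothetical single-file history: first version plus forward deltas."""

    def __init__(self, first_version, author):
        self.base = list(first_version)
        self.deltas = []
        self.authors = [author]   # authors[n - 1] made revision n

    def check_in(self, new_version, author):
        latest = self.get(len(self.deltas) + 1)
        self.deltas.append(make_delta(latest, list(new_version)))
        self.authors.append(author)

    def get(self, revision):
        """Reconstruct revision `revision` (numbered from 1)."""
        if not 1 <= revision <= len(self.deltas) + 1:
            raise ValueError("no such revision")
        lines = list(self.base)
        for delta in self.deltas[:revision - 1]:
            lines = apply_delta(lines, delta)
        return lines


history = DeltaHistory(["total = 0", "print(total)"], author="alice")
history.check_in(["total = 0", "total += 1", "print(total)"], author="bob")
history.check_in(["total = 1", "print(total)"], author="carol")
print(history.get(2), "by", history.authors[1])
```

Only the first version is stored whole; every later revision costs just the lines that changed, plus a record of who changed them.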

SCCS was limited — it tracked individual files, not coherent snapshots of entire projects, and its delta format was proprietary and opaque. But it established the core vocabulary of version control: check out, check in, revision, delta. It shipped as part of Unix System V and became the first widely deployed version control system.

A decade later, Walter Tichy at Purdue University built on SCCS’s foundations with RCS — Revision Control System (1982). RCS was an improvement in almost every respect: its storage format was simpler, its performance was better, and crucially, it was distributed as free software. Where SCCS had been proprietary, RCS could be inspected, modified, and freely deployed.

RCS introduced a concept that would prove durable: branching. A file’s revision history was no longer a single linear sequence but a tree. A developer could create a branch — a parallel sequence of revisions diverging from a point in the main line — and later merge it back. The mechanics were crude and the merge process was painful, but the concept captured something real about how software development actually worked: not as a single stream of improvements but as a tree of simultaneous experiments.

Both SCCS and RCS shared the same fundamental limitation: they tracked files, not projects. A coherent software release — hundreds or thousands of files that needed to be at consistent revisions simultaneously — required external bookkeeping that neither system provided.

CVS: The First Collaborative System (1986)

The gap between file-level tracking and project-level tracking was filled, somewhat awkwardly, by CVS — the Concurrent Versions System. CVS was originally a set of shell scripts written by Dick Grune in the Netherlands in 1986, intended to coordinate parallel work on the Amsterdam Compiler Kit. Brian Berliner reimplemented CVS in C in 1989, and the result became, for over a decade, the dominant version control system in open source software development.

CVS’s key innovation was not technical but social: it abandoned SCCS and RCS’s strict file-locking discipline. Rather than preventing two developers from editing the same file simultaneously, CVS allowed concurrent edits and attempted to merge them automatically afterward. When two developers changed different lines of the same file, CVS merged both changes cleanly. When they changed the same lines, CVS flagged a conflict and required a human to resolve it.

This was, in retrospect, the moment version control became truly collaborative rather than merely archival. The change in model — from “lock, edit, unlock” to “copy, edit, merge” — reflected a deeper insight: in practice, the same lines are rarely changed simultaneously, and requiring one developer to wait while another edited a file introduced far more friction than the occasional merge conflict.
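The "copy, edit, merge" rule can be sketched as a line-by-line three-way comparison against the common ancestor. This is a deliberately simplified illustration, not CVS's algorithm: it assumes both developers only edited lines in place, so all three versions have the same number of lines, and `merge_lines` is a hypothetical name. Real merge tools first align the versions with a diff.

```python
def merge_lines(base, ours, theirs):
    """Three-way merge of equal-length line lists.

    Returns (merged_lines, conflicting_line_numbers).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:
            merged.append(o)          # both sides agree (or neither changed)
        elif o == b:
            merged.append(t)          # only the other developer changed it
        elif t == b:
            merged.append(o)          # only we changed it
        else:
            conflicts.append(number)  # both changed it differently
            merged.append(f"<<<<<<< {o} ======= {t} >>>>>>>")
    return merged, conflicts


base = ["rate = 5", "limit = 10", "mode = 'fast'"]
ours = ["rate = 6", "limit = 10", "mode = 'fast'"]
theirs = ["rate = 5", "limit = 10", "mode = 'safe'"]

clean, clean_conflicts = merge_lines(base, ours, theirs)
print(clean, clean_conflicts)

clash, clash_conflicts = merge_lines(base, ours, ["rate = 7", "limit = 10", "mode = 'fast'"])
print(clash_conflicts)
```

The first merge combines both edits without human help; the second reports line 1 as a conflict because both sides changed it in different ways.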

Centralized vs. Distributed Version Control

CVS and the systems that followed it, such as Subversion (SVN) and Perforce, used a centralized model: a single canonical repository on a server held the authoritative history. Developers checked out working copies, made changes, and committed back to the central server. This model had a critical consequence: nearly every meaningful operation (committing, reading the log, diffing against history) required a network connection to the central server.

Git and its distributed contemporaries (Mercurial, Bazaar) used a fundamentally different model: every developer had a complete copy of the repository, including its entire history, on their local machine. Commits happened locally, instantly, without network access. Sharing changes meant pushing to or pulling from another repository — which could be a central server, a colleague’s machine, or anything else.

This is more than a technical distinction. The centralized model makes the server the arbiter of truth and creates a bottleneck. The distributed model makes every copy of the repository equally authoritative and removes the bottleneck entirely. It also enables workflows that are simply impossible with centralized systems: developers on airplanes can commit, projects can survive the disappearance of the central server, and contributors can work on forks without permission from the original repository’s owner.

CVS had profound limitations that became increasingly visible as projects grew:

Commits were not atomic. If you committed twenty files and the network failed after ten, the repository was left in an inconsistent state — half the commit present, half missing.

Directory operations were not tracked. CVS had no way to rename a file without losing its history. Every rename was recorded as a deletion and an unrelated creation.

Branching was painful. CVS branches were difficult to manage, slow to work with, and merging was unreliable enough that many teams avoided branching altogether.

Tagging was a file-by-file operation. Marking a coherent release required tagging thousands of individual files, a slow and error-prone process.

Despite these flaws, CVS powered the development of major open source projects through the 1990s and early 2000s: the Apache web server, the early Mozilla browser, the FreeBSD operating system. The friction was accepted as an inevitable property of the problem.

Subversion: Doing CVS Right (2000)

By the late 1990s, CVS’s limitations were a persistent source of frustration in the open source community. The response was Subversion (SVN), begun in 2000 by CollabNet and led by Karl Fogel and Jim Blandy.

The goal was not to reinvent version control but to fix CVS’s specific deficiencies. Subversion introduced:

Atomic commits. A commit either succeeded completely or failed completely, leaving the repository consistent.
Versioned directories. Renames and moves were tracked with history preserved.
Improved branching and tagging. Branches and tags were cheap copy operations, making them fast and practical.
Better binary file handling.

Subversion retained the centralized architecture of CVS — a single server held the authoritative repository, and developers needed network access for most operations. This was understood to be a limitation but not yet recognized as the fundamental problem it would prove to be.

Subversion achieved rapid adoption and displaced CVS as the default choice for new projects through the mid-2000s. Apache moved to Subversion. Google built its Google Code project hosting on Subversion (at a scale that required significant custom infrastructure). For a few years, Subversion seemed like the answer.

The Linux kernel, however, used something different.

BitKeeper and the License Dispute That Changed Everything

By the late 1990s, the Linux kernel development process had outgrown any existing version control tool. The kernel received thousands of patches per release from hundreds of contributors worldwide. CVS was too slow, too fragile, and too poorly suited to the distributed, decentralized workflow that kernel development required. For several years, Linus Torvalds managed patches by hand — reviewing email, applying diffs, maintaining his own tree.

In 2002, BitMover offered the Linux kernel project a free license to use BitKeeper, a proprietary distributed version control system developed by Larry McVoy. BitKeeper was genuinely superior to anything else available: it was distributed, it handled large repositories efficiently, and it made the Linux kernel’s complex development workflow tractable. Torvalds adopted it, and for three years, BitKeeper was the tool that coordinated Linux kernel development.

The arrangement was controversial. Richard Stallman and the Free Software Foundation objected on principle to using proprietary software to develop the kernel of the world’s most important free operating system. McVoy permitted the use on the condition that kernel developers not reverse-engineer BitKeeper’s protocols. This condition would prove unstable.

In early 2005, Andrew Tridgell, the author of Samba (the software that allowed Linux systems to communicate with Windows file servers), began writing a client that could interoperate with BitKeeper servers by reverse-engineering their protocol. McVoy considered this a violation of the license terms and, in April 2005, revoked the Linux kernel’s free access to BitKeeper.

Torvalds had two weeks before the next Linux kernel development cycle began and no version control system to use.

Two Weeks in April: The Creation of Git (2005)

Linus Torvalds responded to the BitKeeper crisis not by adopting an existing alternative but by writing a new version control system from scratch.

This was not the obvious response. Subversion existed. Mercurial, another distributed system, was being started that same month in response to the same crisis. But Torvalds had looked at the existing options and found them unsatisfactory. Subversion’s centralized architecture was wrong in principle. The distributed systems he knew of were too slow or designed around assumptions that didn’t match the Linux kernel’s scale.

His requirements were explicit and unusual:

Distributed, not centralized. Every developer had the full repository; no single machine was the authoritative server.
Integrity guarantees. Every version of every file should be cryptographically checksummed, making undetected corruption or tampering practically infeasible.
Performance for Linux-scale development. Applying hundreds of patches simultaneously should be fast — measured in seconds, not minutes.
Simplicity of the underlying model. Torvalds’ contempt for CVS’s model was explicit: in a 2007 talk at Google he said he hated CVS “with a passion” and treated it as a guide to what not to do.

The result, first committed on April 7, 2005, was Git. Within two weeks, Git was managing the Linux kernel’s development. About two months later it handled its first official kernel release, Linux 2.6.12 (June 16, 2005).

The core of Git’s design was radical in its simplicity: a repository was not a sequence of file deltas but a directed acyclic graph of commits, each of which was a complete snapshot of the entire project at a point in time, identified by its cryptographic SHA-1 hash. A commit pointed to its parent commit (or parents, in the case of a merge). The hash of a commit was computed from its contents, its parent’s hash, and metadata — meaning that any modification to any historical commit would change the hash of everything that followed it, making undetected history rewriting impossible.

Branches, in Git, were simply named pointers to commits in this graph — not separate copies of files, not expensive operations, just labels. Creating a branch was instant, regardless of the size of the repository. Switching between branches was fast. Merging was sophisticated enough to handle complex histories.
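Both ideas, hash-chained commits and branches as labels, fit in a short Python sketch. It is an illustration under stated assumptions rather than Git's real commit encoding, which also records author, committer, and timestamps: `commit_id` is a hypothetical helper, and the snapshot names stand in for the hashes of real project trees.

```python
import hashlib


def commit_id(tree, parents, message):
    """Hash a commit's snapshot id, parent ids, and message into one identifier."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents] + ["", message]
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


# A three-commit history; each commit names its parent by hash.
c1 = commit_id("snapshot-1", [], "initial import")
c2 = commit_id("snapshot-2", [c1], "add scheduler")
c3 = commit_id("snapshot-3", [c2], "fix race")

# Rewriting the oldest commit changes its id, and therefore every descendant's id.
forged_c1 = commit_id("snapshot-1-altered", [], "initial import")
forged_c2 = commit_id("snapshot-2", [forged_c1], "add scheduler")
forged_c3 = commit_id("snapshot-3", [forged_c2], "fix race")
print(c3 == forged_c3)

# Branches are just names that point at commit ids.
branches = {"main": c3}
branches["experiment"] = branches["main"]   # creating a branch copies one id
c4 = commit_id("snapshot-4", [c3], "try new allocator")
branches["experiment"] = c4                 # committing moves only that pointer
print(branches)
```

Creating the `experiment` branch copies a single 40-character identifier, which is why the cost does not depend on the size of the repository.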

Git’s Content-Addressable Storage

Git stores not file versions but objects: blobs (file contents), trees (directory structures), commits (snapshots with metadata), and tags. Each object is identified by the SHA-1 hash of its contents — meaning two files with identical contents share a single storage object, and every object is automatically deduplicated. The repository is essentially an append-only database of immutable objects. This design means that Git repositories are both compact and completely tamper-evident: you cannot modify history without changing hashes, and any mismatch between expected and actual hashes indicates corruption or tampering.
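A minimal sketch of content addressing, assuming the SHA-1 scheme described above: Git hashes a blob as the header `blob <size>` followed by a NUL byte and the file contents. The `ObjectStore` class is a hypothetical in-memory stand-in; real Git also compresses objects and writes them to disk.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Compute the SHA-1 object id Git assigns to a blob with these contents."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Hypothetical in-memory object database keyed by content hash."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        oid = blob_id(content)
        self.objects.setdefault(oid, content)  # identical content is stored once
        return oid

    def get(self, oid: str) -> bytes:
        content = self.objects[oid]
        if blob_id(content) != oid:            # stored bytes no longer match their name
            raise ValueError(f"object {oid} is corrupt")
        return content


store = ObjectStore()
first = store.put(b"int main(void) { return 0; }\n")
second = store.put(b"int main(void) { return 0; }\n")  # same contents, another "file"
print(first == second, len(store.objects))
```

Because an object's name is derived from its bytes, deduplication and corruption checks come from the same mechanism: storing the same contents twice yields the same key, and any altered object fails the check in `get`.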

Torvalds handed off Git’s day-to-day development to Junio Hamano in July 2005. Hamano became the project’s primary maintainer and has remained so ever since, one of the less-celebrated but more consequential acts of maintainership in open source history.

Dead End: BitKeeper — The Tool That Built Its Own Replacement

BitKeeper is the most consequential version control system most developers have never used. It pioneered the distributed model that Git later made universal. It proved that large-scale distributed development was practical. And by revoking the Linux kernel’s free license, it ensured that a free, superior replacement would be built within weeks.

McVoy’s decision to revoke the license when confronted with Tridgell’s reverse-engineering attempt was rational from a business perspective: BitMover’s product had real customers who paid for it, and allowing free access contingent on no one examining the protocols was a reasonable condition. The counterfactual — in which the kernel project continued using BitKeeper indefinitely — might have meant a slower Git or a different distributed VCS, but the underlying technical logic was always pointing toward distributed, hash-addressed repositories. Torvalds had the requirements clearly in mind; he needed an occasion to act on them.

BitKeeper remained in commercial use for years after Git’s creation. In 2016, BitMover open-sourced BitKeeper under the Apache License. The move came a decade too late to matter. By then, Git had captured the field so completely that no alternative had any realistic path to displacement.

The lesson is not that proprietary tools can never succeed alongside open source alternatives — they often do. The lesson is the fragility of arrangements that require ongoing goodwill from both parties: the moment the goodwill fractured, the need it was meeting was filled by something permanent.

GitHub: Version Control as Social Infrastructure (2008)

Git solved the technical problem. GitHub solved the social one.

By 2007, Git was clearly superior to the alternatives, but using it required comfort with the command line, understanding of its graph model, and familiarity with a workflow that had no natural home on the web. Open source projects that used Git typically hosted their repositories on project-specific web pages or on services like SourceForge and Google Code that had been designed around Subversion’s centralized model and mapped poorly onto Git’s distributed one.

In October 2007, Tom Preston-Werner and Chris Wanstrath began building GitHub during a series of weekend hackathons. The site launched publicly in April 2008 with a design principle that was simple but transformative: every Git repository was a first-class citizen with a public URL, a web-browsable history, and, most importantly, a fork button. A built-in issue tracker followed soon after.

The fork button changed everything about open source contribution. Previously, contributing to a project required either write access to its repository (granted by the project’s maintainers) or an email-based patch workflow that required maintainers to manually review and apply diffs. GitHub’s model was different: anyone could fork any public repository, push changes to their fork, and send a pull request — a request to the original project’s maintainers to review and merge the changes. The maintainer could see exactly what had changed, comment on specific lines, request revisions, and merge with a button click.

This reduced the activation energy for contribution from “navigate a maintainer’s email preferences and diff format requirements” to “click fork, make changes, click pull request.” The effect on open source participation was immediate and dramatic.

GitHub and Network Effects

GitHub’s dominance is a textbook case of network effects in software infrastructure. A project on GitHub could be found by GitHub’s search. Issues and pull requests attracted developers already on GitHub. The more projects were on GitHub, the more developers created accounts; the more developers were on GitHub, the more projects chose it. By 2012, GitHub had become the default home for new open source projects, and by 2015, it was hosting more code than any other service in history. The same dynamic that had made SourceForge dominant in the early 2000s replicated itself, but faster and at greater scale.

GitHub grew beyond open source to become the standard platform for professional software development. Private repositories, a paid feature, turned GitHub into a business that could fund its infrastructure. Companies built their development workflows around GitHub’s pull request model, issue tracker, and continuous integration hooks. The GitHub Actions automation system, announced in 2018, made GitHub a runtime environment for development pipelines.

In June 2018, Microsoft announced that it would acquire GitHub for $7.5 billion in stock, one of the largest acquisitions in software history; the deal closed that October. The announcement was met with a mixture of concern and resignation in the open source community: Microsoft had spent much of the 1990s and 2000s in explicit opposition to open source software, and the acquisition of the platform that hosted most of the world’s open source code raised obvious questions about future direction.

The concerns proved largely unfounded. Microsoft, transformed by Satya Nadella’s tenure into a cloud-first company with genuine open source commitments, left GitHub’s management largely autonomous, continued expanding its features, and extended the free tier to cover private repositories. The acquisition remained controversial among developers who objected to the centralization of open source infrastructure in a single corporate entity, but the migration to alternatives like GitLab never reached a scale that would have meaningfully eroded GitHub’s dominance.

Legacy: The Invisible Infrastructure

Version control is the least glamorous of the technologies that make modern software development possible, and perhaps the most foundational. It is the system that allows a team of thousands of developers spread across time zones to work on the same codebase simultaneously. It is the audit trail that explains why a piece of code was written the way it was. It is the insurance policy that makes experimentation safe.

Git’s specific choices — the hash-addressed content store, the DAG of commits, the cheap branches — shaped not just how software is developed but how developers think about development. The widespread adoption of Git normalized workflows that were previously exotic: maintaining dozens of simultaneous feature branches, rebasing work onto updated bases, bisecting commit history to find the exact commit that introduced a bug. These are not just technical operations but ways of reasoning about software change.
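Bisection is a binary search over history. The sketch below assumes a simple linear history, ordered oldest to newest, in which the first commit is known good and the last known bad; the real `git bisect` command also handles branching histories. The function name `find_first_bad` and the test predicate are hypothetical.

```python
def find_first_bad(commits, is_bad):
    """Return (first bad commit, number of commits tested).

    Assumes commits[0] is good, commits[-1] is bad, and that once a
    commit is bad every later commit is bad too.
    """
    good, bad = 0, len(commits) - 1   # indexes of a known-good and known-bad commit
    tested = 0
    while bad - good > 1:
        middle = (good + bad) // 2
        tested += 1
        if is_bad(commits[middle]):
            bad = middle              # the bug was introduced at or before `middle`
        else:
            good = middle             # the bug was introduced after `middle`
    return commits[bad], tested


# 1000 commits; pretend the bug arrived in commit number 613.
commits = list(range(1000))
culprit, tested = find_first_bad(commits, lambda commit: commit >= 613)
print(culprit, tested)
```

Each test halves the suspect range, so a thousand commits need at most ten builds to pin down the one that introduced the bug.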

The trajectory from SCCS to Git is also a story about the relationship between tools and their communities. RCS replaced SCCS because it was free. CVS displaced RCS because it enabled collaboration. Subversion fixed CVS’s worst flaws. Git replaced everything because it was both technically superior and free — and then GitHub made Git’s social potential visible and accessible to a generation of developers who might otherwise never have learned the command-line tool.

Torvalds himself has been characteristically unsentimental about Git, calling it “a means to an end” that he built only to keep the kernel from descending into chaos, and admitting he lost interest once it did what he needed. Yet by some measures Git eclipsed even Linux in everyday reach: it underlies the collaborative infrastructure of nearly all modern software.

For the operating system whose development crisis gave rise to Git, see The Unix Story and The Open Source Revolution.