AlphaZero: World Chess Champion After 9 Hours of Self-Play

Summary

AlphaZero, DeepMind’s general-purpose game-playing AI, started with only the rules of chess (no human games, no opening books, no endgame databases) and played approximately 44 million games against itself over 9 hours. By the end of that training run it played stronger chess than Stockfish 8, then one of the world’s strongest conventional chess engines. In a 1,000-game match against Stockfish 8, AlphaZero won 155 games, drew 839, and lost 6. Stockfish searched roughly 70 million positions per second; AlphaZero searched about 80,000 per second. Quality of evaluation, when good enough, beats speed of evaluation.

The Architecture

AlphaZero was introduced in a 2017 preprint by David Silver and colleagues at DeepMind, followed by a peer-reviewed version in Science in 2018. Earlier systems relied on human knowledge: Deep Blue used handcrafted evaluation functions, Stockfish reflects decades of expert tuning, and even the original AlphaGo used human game data for its initial training. AlphaZero instead learned tabula rasa, through reinforcement learning from self-play alone:

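  1. Start with a randomly initialized neural network; the only chess knowledge the system has is the rules.
  2. Have the system play games against itself, choosing each move with a search guided by the network.
  3. Update the network’s weights so that its move preferences match the search results and its position evaluations predict the game outcomes.
  4. Repeat across approximately 44 million self-play games.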
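
The loop above can be sketched on a toy scale. The example below is only an illustration under loud assumptions: it swaps chess for a tiny take-away game (one pile, remove 1 to 3 stones, whoever takes the last stone wins), swaps the neural network for a lookup table, and uses no tree search at all. The function names, learning rate, and exploration rate are hypothetical choices for this sketch, not anything taken from the AlphaZero paper.

```python
import random

def legal_moves(stones):
    """Rules only: take 1, 2, or 3 stones, never more than remain."""
    return [m for m in (1, 2, 3) if m <= stones]

def choose_move(values, stones, rng, epsilon):
    """Mostly greedy: leave the opponent in the position rated worst for them."""
    moves = legal_moves(stones)
    if rng.random() < epsilon:
        return rng.choice(moves)  # occasional exploration
    return min(moves, key=lambda m: values.get(stones - m, 0.0))

def self_play_game(values, start, rng, epsilon):
    """Step 2: one game of the learner against itself."""
    stones, player, visited = start, 0, []
    while True:
        visited.append((stones, player))
        stones -= choose_move(values, stones, rng, epsilon)
        if stones == 0:
            return visited, player  # whoever takes the last stone wins
        player = 1 - player

def train(games=5000, start=10, lr=0.05, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: no strategy, only the rule that facing an empty pile means you lost.
    values = {0: -1.0}
    for _ in range(games):  # Step 4: repeat
        visited, winner = self_play_game(values, start, rng, epsilon)
        for stones, player in visited:  # Step 3: nudge estimates toward the outcome
            target = 1.0 if player == winner else -1.0
            old = values.get(stones, 0.0)
            values[stones] = old + lr * (target - old)
    return values

values = train()
for pile in sorted(values):
    print(pile, round(values[pile], 2))
```

Each table entry estimates the outcome for the player to move, from -1 (loss) to +1 (win). In this toy setup the table should come to rate piles of 4 and 8 as losing for the player to move, which matches the known theory of the game, even though no strategy was ever programmed in.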

The network that emerged evaluated positions using something closer to intuition (pattern recognition developed through self-play) than to brute-force calculation of forced sequences. AlphaZero still searched, but the network steered the search toward the most promising lines, so examining about 80,000 positions per second was enough to outplay an engine examining roughly 70 million.
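
One way to picture how a learned evaluation can focus a small search is the PUCT selection rule used in the AlphaGo Zero family of tree searches: each candidate move is scored by its average value so far plus an exploration bonus that grows with the network’s prior for that move and shrinks as the move gets visited. The sketch below is a simplified form with a constant exploration weight; the value `c_puct=1.5`, the function name, and the example numbers are assumptions for illustration, not settings from the paper.

```python
import math

def puct_select(priors, visit_counts, total_values, c_puct=1.5):
    """Return the index of the move with the highest Q + U score.

    priors:       network's prior probability for each move
    visit_counts: how often the search has tried each move so far
    total_values: summed evaluation results backed up through each move
    """
    parent_visits = sum(visit_counts)
    best_move, best_score = 0, -math.inf
    for move, (p, n, w) in enumerate(zip(priors, visit_counts, total_values)):
        q = w / n if n > 0 else 0.0                          # average value so far
        u = c_puct * p * math.sqrt(parent_visits) / (1 + n)  # prior-weighted bonus
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move

choice = puct_select(
    priors=[0.7, 0.2, 0.1],
    visit_counts=[10, 2, 0],
    total_values=[4.0, 1.5, 0.0],
)
print(choice)
```

With these made-up numbers the second move (index 1) is selected: its average value is high and it has been visited only twice, so it outscores the move the prior liked most. Moves with a low prior and a poor record are rarely revisited, which is how a search can stay small without being blind.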

The Results Against Stockfish

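The headline 1,000-game match comes from the peer-reviewed 2018 version of the paper and was played at a time control of three hours per game plus 15 seconds per move. The one-minute-per-move condition belongs to the earlier 2017 preprint, which reported a separate 100-game match (28 AlphaZero wins, 72 draws, no losses). In the main match neither engine used an opening book, so AlphaZero had to play whatever openings it had discovered through self-play; the 2018 paper also reports additional matches in which Stockfish was given an opening book.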

Result summary:

  • AlphaZero wins: 155
  • Draws: 839
  • Stockfish wins: 6
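
To put the tally in perspective, it can be converted into a match score and an approximate rating gap. The sketch below assumes the standard logistic Elo model (a modelling assumption, not a figure reported in the source) and counts each draw as half a point; the function names are illustrative.

```python
import math

def match_score(wins, draws, losses):
    """Fraction of available points scored, counting a draw as half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games

def elo_gap(score):
    """Rating difference implied by an expected score under the logistic Elo model."""
    return 400.0 * math.log10(score / (1.0 - score))

score = match_score(155, 839, 6)
print(f"score: {score:.4f}")
print(f"implied Elo gap: {elo_gap(score):.0f}")
```

The score works out to 57.45% of the points, which corresponds to a gap of roughly 52 Elo under this model. The margin is clear but not enormous; the lopsided win-to-loss ratio (155 to 6) is what made the result striking, since most games between engines at this level end in draws.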

Grandmasters who analyzed the games described AlphaZero’s play as aesthetic rather than purely tactical: it sacrificed material for long-term positional advantages whose payoff often lay beyond what human analysts could calculate at the board. It played openings (the London System, for example) that had been considered safe and conservative and converted them into decisive attacks.

Three Games, One Algorithm

The 2018 AlphaZero paper applied the same algorithm and network architecture to chess, shogi (Japanese chess), and Go, training a separate instance from scratch for each game. Total training time per game:

  • Chess: ~9 hours
  • Shogi: ~12 hours
  • Go: ~13 days

After 13 days of self-play, AlphaZero’s Go player defeated its predecessor, AlphaGo Zero. This result (covered in DeepMind and AlphaGo) established that a single algorithm, with no game-specific engineering beyond the rules, could master multiple complex games.

The philosophical implication: deep learning systems can develop sophisticated strategic understanding through experience, without anyone explicitly programming the strategic concepts. Whether this represents “understanding” in any meaningful sense is debated, but the practical result, superhuman performance in three different games, is not.


📚 Sources