
C's Undefined Behavior: The Language Feature That Isn't a Feature

Summary

The C standard formally classifies certain operations as “undefined behavior”: it places no requirements on what the compiler or program does when such an operation occurs. Integer division by zero is undefined behavior. Signed integer overflow is undefined behavior. Accessing memory out of bounds is undefined behavior. Once a program executes undefined behavior, a conforming implementation may do anything: crash, continue, return a garbage value, or, more dangerously, optimize the code in ways that eliminate the check that was supposed to prevent the problem. This is not a bug in the standard; it is a deliberate design choice, and it has been the root cause of many security vulnerabilities.

Why Undefined Behavior Exists

C was designed in the early 1970s for systems programming on the PDP-11. Its design philosophy was to allow the programmer to express hardware operations directly without the overhead of safety checks. “Trust the programmer” was the explicit assumption.

Undefined behavior exists in the C standard for three reasons:

  1. Portability across hardware: Integer division by zero behaves differently on different processors. x86 raises a divide-error exception, while AArch64's integer divide instructions simply return zero. Rather than mandating specific behavior that might be impossible or expensive on some hardware, the standard declares it undefined, allowing implementations to do what is natural for the target architecture.

  2. Optimization permission: If a compiler knows that a particular code path contains undefined behavior, it can assume that path is never reached and optimize accordingly. An optimizer can remove a null pointer check that comes after a dereference of the same pointer by reasoning: “if the pointer were null, the earlier dereference would be undefined behavior, and undefined behavior never happens in a correct program, therefore the pointer is not null, therefore the check is unnecessary.” This produces faster code, and security vulnerabilities when the assumption is wrong.

  3. Historical inertia: The operations that became undefined were inherited from K&R C, which predates the standard. By the time the ANSI C standard (1989) was written, existing compilers already handled these cases in different ways; formalizing them as “undefined” was more honest than pretending the language had specified behavior.
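
The portability point can be made concrete with a small sketch. Because the standard leaves integer division by zero undefined (and likewise `INT_MIN / -1`, whose quotient does not fit in an `int`), portable code has to rule out both cases before dividing. The helper name `checked_div` and its return convention are assumptions made for illustration; nothing like it is mandated by the standard.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: divides only when the result is well defined.
 * Returns false and leaves *out untouched instead of invoking UB. */
bool checked_div(int num, int den, int *out)
{
    if (den == 0) {
        return false;   /* division by zero is undefined */
    }
    if (num == INT_MIN && den == -1) {
        return false;   /* quotient would overflow int: also undefined */
    }
    *out = num / den;
    return true;
}
```

A caller would test the return value before reading the result, which keeps the decision about what to do with a bad divisor in the program rather than leaving it to whatever the target hardware happens to do.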

The Security Consequences

Compiler-optimized-away safety checks are a persistent source of real vulnerabilities. Examples:

Null pointer check elimination:

int value = *ptr;     // dereference comes first: UB if ptr is NULL
if (ptr == NULL) {    // compiler may assume ptr != NULL here
    return -1;        // and delete this check as dead code
}
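
A check can only be discarded this way when a dereference of the same pointer is guaranteed to execute before it. The following compilable sketch contrasts the two orderings; `struct device` and both function names are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

struct device { int flags; };   /* hypothetical type for illustration */

/* Risky ordering: dev is dereferenced before it is tested, so an
 * optimizer may assume dev != NULL and drop the check below. */
int get_flags_risky(struct device *dev)
{
    int flags = dev->flags;     /* UB if dev is NULL */
    if (dev == NULL) {
        return -1;              /* may be removed as unreachable */
    }
    return flags;
}

/* Safer ordering: the check comes before every dereference. */
int get_flags_checked(struct device *dev)
{
    if (dev == NULL) {
        return -1;
    }
    return dev->flags;
}
```

Both functions return the same value for a valid pointer. Only `get_flags_checked` has defined behavior when passed `NULL`.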

Integer overflow assumption:

if (x + 1 > x) {   // compiler knows signed overflow is UB
    ...             // so x+1 > x is always true → check eliminated
}
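
An overflow test that survives optimization has to avoid performing the overflowing operation in the first place. One way to do that, sketched here under the assumption that both operands are plain `int`, is to compare against `INT_MAX` and `INT_MIN` before adding. The name `checked_add` is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true, or
 * returns false without adding when the sum would not fit in an int. */
bool checked_add(int a, int b, int *out)
{
    if (b > 0 && a > INT_MAX - b) {
        return false;   /* a + b would exceed INT_MAX */
    }
    if (b < 0 && a < INT_MIN - b) {
        return false;   /* a + b would fall below INT_MIN */
    }
    *out = a + b;       /* safe: the sum is known to be representable */
    return true;
}
```

The subtractions in the guards cannot overflow themselves, because `INT_MAX - b` is only evaluated for positive `b` and `INT_MIN - b` only for negative `b`. GCC and Clang also provide `__builtin_add_overflow`, which performs the same test as a compiler builtin.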

In practice, these patterns appear in kernel code, security-critical libraries, and system software. When a security researcher demonstrates that a compiler optimization has removed a bounds check, the response from language standards committees is technically correct — the code had undefined behavior, the optimizer was permitted to assume it never happened — but practically disastrous.
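
Bounds checks are vulnerable in the same way when they are phrased as pointer arithmetic. A test such as `buf + len < buf` depends on the pointer wrapping around, but forming a pointer beyond one past the end of an object is itself undefined, so a compiler may fold the comparison to false. A minimal sketch of an alternative that uses only unsigned size arithmetic follows; the function name and parameters are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when the range [off, off + len) lies inside
 * a buffer of buf_size bytes. It compares sizes only, so no
 * out-of-bounds pointer is ever formed and nothing can wrap around. */
bool range_in_bounds(size_t buf_size, size_t off, size_t len)
{
    if (off > buf_size) {
        return false;
    }
    return len <= buf_size - off;   /* no underflow: off <= buf_size */
}
```

Writing the check as `len <= buf_size - off` rather than `off + len <= buf_size` matters: the latter can wrap for very large `len` and wrongly accept the range.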

Memory-safe languages, Rust in particular, were designed to rule out undefined behavior through language-level guarantees. The Rust and Memory Safety article describes how the borrow checker enforces constraints that make undefined behavior impossible to express in safe Rust; code inside `unsafe` blocks can still trigger it.

