Test-driven development does not survive the transition to LLM systems: the assertion cannot be strict equality, the correct output is a distribution rather than a single value, the red-green-refactor loop has no green, and the assertion itself is fallible. Evaluation-driven development is the discipline that replaces TDD. It keeps the same shape ("write the assertion before the implementation, ratchet it as the implementation improves, gate every change on the verdict") but uses eval sets instead of unit tests, distribution verdicts instead of boolean pass-fail, calibrated LLM judges instead of strict equality, and a ratcheted baseline instead of a fixed expected output. This article covers the methodology, eval-set hygiene (golden, regression, adversarial, drift), the four eval layers (unit, scenario, shadow, canary), the practice of calibrating an LLM-as-judge, CI integration, eight anti-patterns, and a five-stage maturity ladder.
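To make the "distribution verdict gated against a ratcheted baseline" idea concrete, here is a minimal sketch in Python. Every name in it (`run_eval`, `gate`, `ratchet`, `Verdict`), the `samples_per_case` default, and the `tolerance` value are hypothetical, chosen for illustration rather than taken from any particular framework. The `system` and `judge` arguments stand in for an LLM call and a calibrated LLM judge; here they are plain callables so the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Verdict:
    pass_rate: float  # fraction of sampled outputs the judge accepted
    baseline: float   # ratcheted baseline the change was gated against
    passed: bool      # True when pass_rate is within tolerance of the baseline


def run_eval(
    cases: Sequence[str],
    system: Callable[[str], str],
    judge: Callable[[str, str], bool],
    samples_per_case: int = 3,  # assumed value; sample repeatedly because output is a distribution
) -> float:
    """Return the fraction of sampled outputs that the judge accepts."""
    accepted = 0
    total = 0
    for case in cases:
        for _ in range(samples_per_case):
            accepted += bool(judge(case, system(case)))
            total += 1
    return accepted / total if total else 0.0


def gate(pass_rate: float, baseline: float, tolerance: float = 0.02) -> Verdict:
    """Gate a change: it passes if it does not regress beyond the (assumed) tolerance."""
    return Verdict(pass_rate, baseline, pass_rate >= baseline - tolerance)


def ratchet(baseline: float, verdict: Verdict) -> float:
    """Raise the baseline when a passing change beats it; never lower it."""
    return max(baseline, verdict.pass_rate) if verdict.passed else baseline
```

In a CI job under these assumptions, `run_eval` would run on each change, `gate` would decide whether the change merges, and `ratchet` would update the stored baseline only after a passing run. The tolerance absorbs sampling noise, and a real setup would likely derive it from the observed variance of the eval rather than hard-code it.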