The single largest under-used lever in production LLM inference in 2026 is speculative decoding. A correctly tuned vLLM deployment with EAGLE-3 or Medusa heads delivers 2.5–3.2× throughput on the same hardware for the same model with bit-exact outputs. The arithmetic: with acceptance rate α=0.8, speculation length K=5, and draft/target cost ratio c=0.08, the speedup formula (1 − α^(K+1)) / ((1 − α) × (K × c + 1)) lands around 2.6× and rises to about 3× as α climbs toward 0.85. Most production deployments have not adopted it, not because the technique is exotic but because the operational subtleties are not well understood: draft-model selection, acceptance-rate decay on long contexts, batch interaction effects, and the cases where naive speculation actively loses. This article is the production playbook: what speculative decoding actually does to the autoregressive loop, the EAGLE / Medusa / Lookahead / n-gram family, the vLLM integration surface, the four workload shapes where speculation wins or loses, the long-context failure mode that catches teams off-guard, eight anti-patterns, and a five-stage maturity ladder.
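To make the speedup arithmetic easy to check, here is a minimal Python sketch of the formula quoted above. The function name `expected_speedup` and its parameter names are illustrative, not part of any library, and the model assumes each drafted token is accepted independently with probability α, which real workloads only approximate.

```python
def expected_speedup(alpha: float, k: int, c: float) -> float:
    """Idealized speculative-decoding speedup.

    alpha: per-token acceptance probability (assumed independent per token)
    k:     speculation length, i.e. draft tokens proposed per step
    c:     cost of one draft forward pass relative to one target pass
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if k < 1 or c < 0:
        raise ValueError("k must be >= 1 and c must be >= 0")
    # Expected tokens emitted per verification step (geometric series sum).
    expected_tokens = (1 - alpha ** (k + 1)) / (1 - alpha)
    # Cost of one step in units of a target forward pass: k drafts + 1 verify.
    relative_cost = k * c + 1
    return expected_tokens / relative_cost


if __name__ == "__main__":
    # Sweep the acceptance rate at the K and c values used in the text.
    for alpha in (0.70, 0.80, 0.85, 0.90):
        print(f"alpha={alpha:.2f}  speedup={expected_speedup(alpha, 5, 0.08):.2f}x")
```

At α=0.8, K=5, c=0.08 this evaluates to roughly 2.64×, and the sweep shows how sensitive the result is to acceptance rate, which is why acceptance-rate decay matters so much in practice.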