The RAG pipeline that won 2024 (one question, one dense-vector lookup, one LLM call grounded on the top-k results) is not the system that ships to production in 2026. The replacement is not bigger embeddings or a better re-ranker; it is the retrieval loop. Agentic RAG composes four architectural primitives: self-query decomposition, which turns a multi-part question into a structured plan; plan-execute-replan, with explicit iteration budgets that bound the loop; tool-augmented retrieval, in which a schema-driven router chooses among dense indexes, SQL, graph queries, and web search; and a validation loop with a sufficiency critic (the gate that decides whether to terminate or replan) and a faithfulness critic (a deterministic gate before emission). Together they produce a bounded, observable, auditable retrieval agent. This article is the architecture-first playbook: what each primitive does, how they compose, the four failure modes specific to agentic RAG, eight anti-patterns that account for most production incidents, and the five-stage maturity ladder from classical RAG with an LLM wrapper to full audit-grade deployment.
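To make the composition concrete, here is a minimal sketch of how the four primitives might fit into one bounded loop. Everything in it is an illustrative assumption rather than a reference implementation: the names `decompose`, `route`, `TOOLS`, `faithful`, and `run` are hypothetical, the decomposer and router are keyword stubs standing in for LLM calls and a real schema, and the `web` tool returns nothing in order to simulate an outage that forces a replan.

```python
from __future__ import annotations
from typing import Callable, Optional

def decompose(question: str) -> list[str]:
    # Stand-in for LLM self-query decomposition: split on " and ".
    return [p.strip(" ?") for p in question.split(" and ") if p.strip(" ?")]

def route(sub_question: str, tried: set[str]) -> Optional[str]:
    # Stand-in for a schema-driven router: keyword rules give an ordered
    # preference list; tools already tried for this sub-question are skipped.
    q = sub_question.lower()
    if "how many" in q:
        prefs = ["sql", "dense"]
    elif "latest" in q:
        prefs = ["web", "dense"]
    else:
        prefs = ["dense"]
    return next((t for t in prefs if t not in tried), None)

# Hypothetical tool registry; "web" returns nothing to simulate an outage.
TOOLS: dict[str, Callable[[str], list[str]]] = {
    "dense": lambda q: [f"passage: {q}"],
    "sql": lambda q: [f"row: {q}"],
    "web": lambda q: [],
}

def faithful(claims: list[str], evidence: list[str]) -> bool:
    # Deterministic faithfulness gate: every claim must appear verbatim
    # in the retrieved evidence.
    return all(claim in evidence for claim in claims)

def run(question: str, max_iterations: int = 3):
    subs = decompose(question)                      # primitive 1: the plan
    tried: dict[str, set[str]] = {sq: set() for sq in subs}
    evidence: dict[str, list[str]] = {sq: [] for sq in subs}
    log: list[tuple[int, str, str]] = []            # audit trail
    pending = list(subs)
    iterations = 0
    while pending and iterations < max_iterations:  # primitive 2: the budget
        iterations += 1
        for sq in pending:
            tool = route(sq, tried[sq])             # primitive 3: the router
            if tool is None:
                continue                            # no tools left to try
            tried[sq].add(tool)
            evidence[sq].extend(TOOLS[tool](sq))
            log.append((iterations, tool, sq))
        # Sufficiency critic: replan only sub-questions still lacking evidence.
        pending = [sq for sq in subs if not evidence[sq]]
    if pending:
        return None, log                            # budget exhausted
    draft = [e for sq in subs for e in evidence[sq]]
    flat = [e for items in evidence.values() for e in items]
    if not faithful(draft, flat):                   # primitive 4: final gate
        return None, log
    return " ".join(draft), log

answer, log = run("how many orders shipped and what is the latest policy")
```

In this toy run the second sub-question fails on `web` in the first iteration and succeeds on `dense` in the second, so the log records three tool calls across two iterations. Calling `run` with `max_iterations=1` returns `None` instead of a partial answer, which is the behaviour the iteration budget is meant to guarantee.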