The single biggest reason production RAG systems return confident wrong answers is not the LLM, the prompt, or the chunking: it is the retriever putting the wrong documents into the top-k. Dense-vector-only retrieval gives 70% recall on conceptual queries and 30% on exact-term queries, and a better embedding model does not fix this because the failure mode is structural. The architecture the field has converged on in 2026 runs a sparse retriever (BM25 or SPLADE) and a dense retriever (bi-encoder embeddings) in parallel, fuses their results via reciprocal rank fusion (RRF) or a weighted-α score blend, re-ranks the top-50 candidates with a cross-encoder, and diversifies the result with maximal marginal relevance (MMR), with an ACL/freshness pre-filter and a query-understanding step in front. This article is a deep dive into what each primitive does, why each fails, the latency budget, eight anti-patterns, and the five-stage maturity ladder from single-retriever to calibrated-fusion-with-online-feedback.
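To make the fusion step concrete, here is a minimal sketch of the two fusion options named above, RRF and a weighted-α blend. It is an illustration under stated assumptions, not a reference implementation: the function names (`rrf_fuse`, `weighted_alpha_fuse`, `minmax`), the document ids, and the scores are hypothetical, `k=60` is the commonly used RRF constant rather than a tuned value, and min-max scaling is only one of several ways to put sparse and dense scores on a comparable scale before blending.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=50):
    """Reciprocal rank fusion over ranked lists of doc ids (best first).

    Each list contributes 1 / (k + rank) per document, so only ranks
    matter and raw scores from different retrievers never get compared.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc id for determinism.
    fused = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in fused[:top_n]]


def minmax(scores):
    """Scale a {doc_id: score} dict into [0, 1] (assumed normalization)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_alpha_fuse(sparse_scores, dense_scores, alpha=0.5, top_n=50):
    """Blend normalized scores: alpha * dense + (1 - alpha) * sparse.

    A document missing from one retriever's results scores 0.0 there.
    """
    sparse = minmax(sparse_scores)
    dense = minmax(dense_scores)
    fused = {
        doc_id: alpha * dense.get(doc_id, 0.0)
        + (1.0 - alpha) * sparse.get(doc_id, 0.0)
        for doc_id in set(sparse) | set(dense)
    }
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked[:top_n]]


# Hypothetical retriever outputs for one query.
sparse_hits = ["d3", "d1", "d7"]   # e.g. BM25 order
dense_hits = ["d1", "d4", "d3"]    # e.g. bi-encoder order
rrf_candidates = rrf_fuse([sparse_hits, dense_hits])

sparse_scores = {"d3": 12.0, "d1": 9.0, "d7": 3.0}
dense_scores = {"d1": 0.9, "d4": 0.8, "d3": 0.5}
alpha_candidates = weighted_alpha_fuse(sparse_scores, dense_scores, alpha=0.5)
```

In this toy example both methods put `d1` first, because it ranks highly in both lists. In practice the two can disagree: RRF ignores score magnitudes entirely, while the weighted-α blend is sensitive to how each retriever's scores are normalized. The fused list would then be passed to the cross-encoder re-ranker.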