# AI Incident Response Runbook: RCA for LLM Failures (2026)

May 12, 2026 · 26 min read

Tags: ai incident response, llm reliability, sre for llm, on-call, blameless rca, postmortem, detection signals, eval drift, guardrail trips, containment primitives, model rollback, prompt rollback, kill switch, incident commander, ai observability, ai architecture

## Frequently Asked Questions

- Why does the standard SRE incident response model not transfer cleanly to LLM systems?
- How do you grade LLM incident severity in a way that operates better than a generic three-tier model?
- What detection signals do you actually need to instrument for LLM systems beyond the standard infrastructure metrics?
- What containment primitives should be operable during the first 30-60 minutes of an LLM incident?
- What does the RCA template look like when extended for LLM failure classes, and how does it differ from a standard SRE postmortem template?
- How do you handle vendor-side incidents where the proximate cause is an Anthropic, OpenAI, or Google model update you do not control?
- How do you handle prompt injection and jailbreak incidents that are simultaneously security incidents and reliability incidents?
- How do you structure the on-call rota for LLM systems, given that the skill set differs from traditional SRE on-call?
- How do you communicate LLM incidents to affected users and to the public status page in a way that is both honest and appropriately bounded?
- What does Stage 4 maturity look like for LLM incident response, and what business outcomes does it produce?

**Satyam**, AI and cloud architect. Helps teams build systems that scale to millions.