Blog

Engineering Insights

In-depth analyses of AI systems, cloud architecture, distributed systems, and engineering leadership.

Hybrid Search and Re-ranking in Production RAG: BM25, Dense Vectors, Cross-encoders, and Everything In Between (2026)
ai-architecture · 1 min read

The single biggest reason production RAG systems return confident wrong answers is not the LLM, the prompt, or the chunking. It is the retriever placing the wrong documents in the top-k. Dense-vector-only retrieval gives 70% recall on conceptual queries and 30% on exact-term queries, and a better embedding model does not fix it because the failure mode is structural. The architecture the field has converged on in 2026 runs a sparse retriever (BM25 or SPLADE) and a dense retriever (bi-encoder embeddings) in parallel, fuses their results via RRF or weighted-α, re-ranks the top-50 candidates with a cross-encoder, and applies MMR diversification, with an ACL/freshness pre-filter and query understanding in front. This article is the deep-dive on what each primitive is doing, why each fails, the latency budget, eight anti-patterns, and the five-stage maturity ladder from single-retriever to calibrated-fusion-with-online-feedback.
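
As an illustration of the fusion step named above, here is a minimal sketch of reciprocal rank fusion (RRF) over two ranked lists. The document ids, the function name, and the constant `k=60` (a commonly used default) are assumptions for this example, not details taken from the article.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical sparse results
dense_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical dense results
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Because RRF only looks at ranks, it needs no score normalisation between the sparse and dense retrievers, which is one plausible reason it is often preferred over a weighted score blend as a starting point.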

May 19, 2026
Modules vs Vertical Slices: Macro vs Micro Architecture in the Modular Monolith (2026)
microservices-patterns · 1 min read

The argument "Clean Architecture vs Vertical Slice Architecture" is a category error — the two operate on different axes. A module is a macro-architectural decision about bounded contexts, public contracts, data ownership and communication style. A vertical slice is a micro-architectural decision about feature folder organisation inside a module. The killer property of a real modular monolith is that the two axes are independent: heterogeneous internals (Clean Architecture in one module, vertical slices in another, transaction scripts in a third) live safely behind homogeneous module boundaries enforced by project references, ArchUnit rules, and schema grants. This article is the technical deep-dive: the five enforceable module properties, the four slice properties, the cross-module communication spectrum from in-process method calls to outbox-backed event buses, the per-module internal-style decision matrix, multi-layered boundary enforcement, eight anti-patterns, and the five-stage maturity ladder from layered monolith to deliberate modular-monolith target state.

May 19, 2026
Agentic AI Debugging: When the Loop Doesn't Stop (2026)
ai-architecture · 1 min read

The single most expensive failure mode of an agentic system is not the agent producing the wrong answer. It is the agent producing no answer while burning through tool calls, context, and provider budget in a tight loop the runtime did not detect. The article covers six failure modes (infinite tool-call loop, plan-execute oscillation, sub-agent recursion, context thrash, hallucinated arguments, silent budget burn), six detection signals (step cap, semantic similarity, cost slope, identical call, delegation depth, context utilisation), five containment primitives (hard step-cap, budget kill-switch, tool-call dedupe, plan-diff guard, supervisor halt), a state machine with running/watching/throttled/halted states, a seven-field RCA template, 8 anti-patterns, and a 5-stage maturity ladder. This is how runaway loops become bounded incidents.
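
Two of the containment primitives listed above, the hard step-cap and tool-call dedupe, can be sketched in a few lines. The class name `LoopGuard`, its thresholds, and the verdict strings are hypothetical choices for this example; a real runtime would wire the verdict into its own halt path.

```python
import json


class LoopGuard:
    """Checks each proposed tool call against a step cap and a repeat limit."""

    def __init__(self, max_steps=20, max_repeats=2):
        self.max_steps = max_steps      # assumed cap on total tool calls
        self.max_repeats = max_repeats  # assumed cap on identical calls
        self.steps = 0
        self.seen = {}

    def check(self, tool_name, arguments):
        """Return 'ok' or a halt reason for the proposed tool call."""
        self.steps += 1
        if self.steps > self.max_steps:
            return "halt: step cap exceeded"
        # Canonical key so that argument ordering does not hide a repeat.
        key = (tool_name, json.dumps(arguments, sort_keys=True))
        self.seen[key] = self.seen.get(key, 0) + 1
        if self.seen[key] > self.max_repeats:
            return "halt: identical tool call repeated"
        return "ok"


guard = LoopGuard(max_steps=5, max_repeats=2)
verdicts = [guard.check("search", {"q": "status"}) for _ in range(3)]
print(verdicts)
```

The guard sits outside the model, so a looping agent cannot talk its way past it; the third identical call in the usage lines is the one that gets halted.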

May 18, 2026
Evaluation-Driven Development: Replacing TDD for LLM Systems (2026)
ai-architecture · 1 min read

Test-driven development does not survive the transition to LLM systems — the assertion cannot be strict-equality, the correct output is a distribution, the red-green-refactor loop has no green, and the assertion itself is fallible. Evaluation-driven development is the discipline that replaces TDD: the same shape of "write the assertion before the implementation, ratchet it as the implementation improves, gate every change on the verdict", but with eval sets instead of unit tests, distribution verdicts instead of boolean pass-fail, calibrated LLM judges instead of strict equality, and a ratcheted baseline instead of a fixed expected output. This article is the methodology, the eval-set hygiene (golden, regression, adversarial, drift), the four eval layers (unit, scenario, shadow, canary), the LLM-as-judge calibration practice, the CI integration, 8 anti-patterns, and the 5-stage maturity ladder.
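
The "ratcheted baseline" idea can be made concrete with a small gate function. This is a sketch under stated assumptions: the function name `gate`, the pass-rate metric, and the noise tolerance of 0.01 are all hypothetical, and a real pipeline would likely gate on a richer distribution verdict than a single number.

```python
def gate(pass_rate, baseline, tolerance=0.01):
    """Decide whether a candidate change may ship, and ratchet the baseline.

    pass_rate: fraction of eval cases the judge accepted for the candidate.
    baseline:  best pass rate accepted so far.
    tolerance: assumed allowance for judge and sampling noise.
    Returns (allowed, new_baseline).
    """
    if pass_rate < baseline - tolerance:
        return False, baseline            # regression: block, keep baseline
    return True, max(baseline, pass_rate)  # ship; baseline only moves up


judge_verdicts = [True, True, True, False, True,
                  True, True, True, True, True]  # hypothetical eval run
candidate_rate = sum(judge_verdicts) / len(judge_verdicts)
allowed, new_baseline = gate(candidate_rate, baseline=0.85)
print(allowed, new_baseline)
```

The baseline never moves down, which is what distinguishes a ratchet from a fixed expected output: each accepted improvement becomes the new floor for later changes.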

May 18, 2026
LLMjacking 2026: How Attackers Hijack Your Bedrock and OpenAI Quota — and the Seven-Layer Defence That Stops the $84,000 Weekend
ai-architecture · 1 min read

A finance team walked into the office on a Monday morning in early 2025 and found an $84,000 invoice for the previous 48 hours. The application had not been defaced; no customer data had been exfiltrated; the dashboards were green. The bill was the breach. This is LLMjacking: the unauthorised hijack of cloud-hosted LLM resources for compute monetisation, the AI-security failure mode that does not look like a security incident until the invoice arrives. The seven-layer defence-in-depth stack is the architectural response: workload identity replacing static keys, hard quota at the gateway, model-level RBAC, network isolation, behavioural analytics, automated kill switch, and continuous credential hygiene. The article includes an AWS-native reference architecture with Azure and GCP equivalents, an attack-lifecycle map from initial access to weekend burn, eight anti-patterns retired, a five-stage maturity ladder, and the Monday-morning 24h / 7d / 30d action checklist that materially reduces exposure by Friday.

May 16, 2026
AI Compliance Architecture: One Control Plane for EU AI Act, GDPR, DPDP, HIPAA, and APPI (2026)
ai-architecture · 1 min read

A reference control-plane architecture for AI systems that have to satisfy multiple regulatory regimes at once. Covers inventory, policy, release gates, runtime controls, and the evidence fabric that connects them.

May 15, 2026
Air-Gapped AI Architecture: Offline LLM Systems for Regulated and Classified Environments (2026)
ai-architecture · 1 min read

A reference architecture for offline LLM systems in air-gapped environments. Covers signed update flows, local registries, offline retrieval, observability, security controls, and the real cost profile of air-gapped AI.

May 15, 2026
Multi-Tenant RAG Isolation: The 7 Attack Vectors and the Architecture That Closes Them (2026)
ai-architecture · 1 min read

Multi-tenant RAG has a security model that does not exist in single-tenant RAG and is not covered by generic SaaS multi-tenant discipline. The 2024–2025 incident record now has enough cross-tenant RAG leakage cases to classify the failure modes, and the result is a seven-vector taxonomy: cross-tenant retrieval leakage, embedding-space collisions, metadata-filter bypass, shared-index poisoning, re-ranker leakage, eval-set contamination, response-cache cross-talk. This article is the seven vectors with their mechanism and architectural defence, the per-tenant namespace pattern that closes them at every data surface, the eight anti-patterns that produce the bad outcomes, and the maturity ladder from Stage 0 (single shared everything) to Stage 4 (continuously-validated isolation).
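
One way to read the per-tenant namespace pattern is that the tenant id selects the data structure being searched, rather than acting as a filter applied to a shared one. The toy index below sketches that idea; the class name, the substring matching, and the sample documents are hypothetical and stand in for a real vector store.

```python
class TenantScopedIndex:
    """Toy index where every tenant has its own namespace.

    There is no code path that searches across namespaces, so a missing or
    malformed filter cannot widen a query to another tenant's documents.
    """

    def __init__(self):
        self._namespaces = {}

    def add(self, tenant_id, doc_id, text):
        self._namespaces.setdefault(tenant_id, {})[doc_id] = text

    def search(self, tenant_id, term):
        # Unknown tenants get an empty namespace, never a shared fallback.
        docs = self._namespaces.get(tenant_id, {})
        return sorted(d for d, t in docs.items() if term.lower() in t.lower())


index = TenantScopedIndex()
index.add("tenant_a", "a1", "Quarterly pricing review")
index.add("tenant_b", "b1", "Pricing strategy for 2026")
print(index.search("tenant_a", "pricing"))
```

The same query term matches documents in both namespaces, yet each tenant only ever sees its own hit.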

May 14, 2026
Cost Engineering for LLM Features: From $100k to $1M Monthly Spend (2026)
ai-architecture · 1 min read

The $100k to $1M monthly LLM-spend transition is the architecturally serious crossing in the life of an LLM product. The teams that handle it well treat cost as a first-class architectural property — instrumented, budgeted, gated, attributed, and tuned — and they build the five-layer stack of budget gate, semantic cache, dynamic router, prompt compactor, and inference layer with an attribution feedback loop wrapped around it. This article is the architecture, the order to build it in, the 10k-RPM unit-economics drill-down that produces a 64% reduction through composed savings, the unglamorous levers (prefill/decode separation, KV-cache reuse, speculative decoding, batch endpoints, output-length discipline), the spot/reserved/on-demand procurement mix, 8 anti-patterns that produce the bad spend curve, and the 5-stage maturity ladder.
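
Savings from stacked layers compose multiplicatively rather than additively, because each layer only acts on the spend the previous layers left behind. The sketch below shows that arithmetic. The per-layer figures are hypothetical and chosen purely to illustrate how a combined figure in the region of 64% can arise; they are not the article's breakdown.

```python
def composed_reduction(layer_savings):
    """Total fractional cost reduction when each layer saves a fraction
    of whatever spend reaches it: 1 - product(1 - s_i)."""
    remaining = 1.0
    for saving in layer_savings:
        remaining *= 1.0 - saving
    return 1.0 - remaining


# Hypothetical per-layer savings (NOT taken from the article):
# 40% from caching, 25% from routing, 20% from prompt compaction.
example = composed_reduction([0.40, 0.25, 0.20])
print(f"{example:.0%}")
```

Note that the naive sum of these hypothetical savings would be 85%, well above the composed figure, which is why per-layer numbers should not be added up when forecasting spend.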

May 14, 2026
Build a Multi-Agent AI System with LangGraph + MCP + A2A: Beginner-Friendly End-to-End Tutorial (2026)
ai-architecture · 1 min read

A full beginner-friendly walk-through of building a four-agent AI system on a laptop with no GPU and a free LLM. We use LangGraph for orchestration (state, nodes, edges, conditional edges, checkpointing, human-in-the-loop with interrupt), MCP for tool access (the official filesystem server via stdio), and A2A for cross-process agent calls (agent card at /.well-known/agent-card.json, JSON-RPC message lifecycle). The four agents form a Learning Accelerator — a Curriculum Planner, an Explainer that reads local notes via MCP, a Quiz Generator exposed as an A2A server, and a Progress Coach supervisor that orchestrates the rest with SQLite checkpointing. Provider switch covers Gemini 2.0 Flash (free, default), Groq (free, fast) and OpenAI (cents per run). Langfuse for traces, DeepEval for LLM-as-judge regression tests. Every file is shown in full inline; no companion repo needed.

May 13, 2026
Prompt Injection Defence in Depth (2026): Six Layers from Input Sanitisation to Output Firewall
ai-architecture · 1 min read

Prompt injection in 2026 is no longer a research curiosity; it is the day-one architectural assumption. The six-layer defence-in-depth stack is the engineering response: input sanitisation and normalisation, intent classifier and injection detector, prompt-template hardening with delimiters and role separation, tool-use authorisation policy outside the prompt, output classifier and secondary review LLM, output firewall for egress filtering and action-effect simulation. This article walks each layer with its threat model, engineering surface, and operational discipline; the build-order rationale; the composition with category-aware guardrails, agent circuit breakers, observability, and incident response. 8 anti-patterns retired, 5-stage maturity ladder, and the honest summary of where the field sits in early 2026.
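
The layer described as "tool-use authorisation policy outside the prompt" means the allow/deny decision is made by ordinary code that the model's output cannot rewrite. A minimal sketch, assuming a hypothetical policy table keyed by tool name with role and confirmation requirements:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    allowed_roles: frozenset
    requires_confirmation: bool = False


# Hypothetical policy table; tool names and roles are illustrative only.
POLICIES = {
    "search_docs": ToolPolicy(frozenset({"viewer", "editor"})),
    "send_email": ToolPolicy(frozenset({"editor"}), requires_confirmation=True),
}


def authorise(tool_name, user_role, confirmed=False):
    """Return True only if the policy table permits this call.

    The decision uses the authenticated user's role and an out-of-band
    confirmation flag, never anything the model wrote.
    """
    policy = POLICIES.get(tool_name)
    if policy is None:
        return False  # deny by default: unknown tools are never callable
    if user_role not in policy.allowed_roles:
        return False
    if policy.requires_confirmation and not confirmed:
        return False
    return True


print(authorise("send_email", "editor"))
print(authorise("send_email", "editor", confirmed=True))
```

Even if injected text persuades the model to request `send_email`, the call is refused unless the human behind the session holds the right role and has confirmed the action.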

May 13, 2026
Agritech AI Architecture: Pasture Vision, Livestock Behaviour Models, and Low-Bandwidth Edge (NZ Reference, 2026)
ai-architecture · 1 min read

New Zealand agritech in 2026 forces the AI architecture conversation onto the constraints that mainstream cloud-AI tutorials assume away: solar-powered devices on the cow's collar, intermittent cellular and satellite connectivity, the welfare envelope that takes precedence over production, and the data co-governance arrangement under the Algorithm Charter and Te Tiriti o Waitangi. This article walks the engineering deliverables for an agritech AI architecture in 2026: edge-first inference with welfare envelope on-device, multispectral pasture-vision with fixed-tower-drone-satellite fusion, behaviour-model training with the labelling discipline as the value-creating activity, store-and-forward synchronisation with explicit conflict resolution, federated learning across farms, and Te Tiriti and Algorithm Charter compliance engineered into the architecture. NZ-anchored to Halter, Fonterra, Gallagher, LIC, AgResearch and globally portable. 8 anti-patterns, 5-stage maturity ladder.

May 13, 2026
Game AI Architecture: Procedural Quest Systems and LLM-Driven NPC Dialogue (Budget Models, 2026)
ai-architecture · 1 min read

Game AI in 2026 collides hardest with frame-rate budgets, session-cost economics, and the modding community's ability to break any system built without adversarial assumptions. This article walks the engineering deliverables for an LLM-driven game AI architecture in 2026: tier-routed inference (on-device 1-3B small model, edge 7-13B mid-size, cloud frontier) with budget-aware routing; state-machine-augmented dialogue with LLM-generated surface variation; procedural quest skeletons with LLM in-fill within writer-defined templates; multi-layer content-safety and prompt-injection defence; per-session cost budget as engineering discipline; semantic cache as first-class architectural element. PL-anchored to the Warsaw/Krakow game-dev cluster (CD Projekt Red, Techland, 11 bit, People Can Fly, Bloober Team) and globally portable. 8 anti-patterns, 5-stage maturity ladder.
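
Budget-aware tier routing can be sketched as picking the cheapest tier that is both capable enough for the request and affordable within the remaining session budget. The tier names follow the blurb, but the per-call costs, the capability levels, and the fallback rule are assumptions made for this example.

```python
# (name, hypothetical cost per call in USD, capability level)
TIERS = [
    ("on_device", 0.0, 1),
    ("edge", 0.002, 2),
    ("cloud", 0.02, 3),
]


def route(required_level, remaining_budget):
    """Return the tier name to use for one request.

    Prefers the cheapest tier that meets the required capability level and
    fits the remaining session budget. If none qualifies, degrades to the
    most capable tier that is still affordable (assumed fallback rule).
    """
    budget = max(remaining_budget, 0.0)  # on-device (free) is always available
    affordable = [t for t in TIERS if t[1] <= budget]
    capable = [t for t in affordable if t[2] >= required_level]
    if capable:
        return min(capable, key=lambda t: t[1])[0]
    return max(affordable, key=lambda t: t[2])[0]


print(route(required_level=3, remaining_budget=1.0))
print(route(required_level=3, remaining_budget=0.005))
```

With a healthy budget the demanding request reaches the cloud tier; once the session budget is nearly spent, the same request degrades to the edge tier instead of failing.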

May 13, 2026
Agentic AI for Mining and Resources: Shift Handover, Tool Use, and Fleet Coordination (2026)
ai-architecture · 1 min read

Mining and resources is a distinct architectural setting for agentic AI: bounded autonomy under safety envelopes, partial-disconnection resilience, regulator-ready operational record by construction. This article walks the engineering deliverables for a mining-and-resources agentic AI architecture in 2026: tool catalogue tiered by safety envelope (read-only / advisory / supervised actuation / autonomous actuation); shift-handover loop integrating voice, paper, and structured data into the agent's working memory; fleet-coordination layer reasoning about cycle time, mixed-autonomy interaction, and stop-condition response; safety envelope as an explicit, inspectable, enforced, and exercised artefact; regulatory composition with NSW/Queensland/WA mines safety regimes, the AISI Voluntary Standard, and the Critical Minerals Strategy. AU-anchored, globally portable to Chile/Canada/SA/Indonesia/Brazil mining and to construction/heavy-haulage/ports/rail. 8 anti-patterns, 5-stage maturity ladder, composition with AISI eval pipeline, agent memory, human escalation, incident response, and agent-level circuit breakers.

May 13, 2026
AI Incident Response Runbook: RCA for LLM Failures (2026)
ai-architecture · 1 min read

LLM systems fail in ways the SRE runbook of the last decade does not anticipate. This article walks the engineering deliverables for an LLM-aware incident response architecture in 2026: severity classification adapted to LLM failure surfaces; detection signal stack (eval drift, guardrail trips, cost spikes, latency p99, hallucination rate, user reports); six containment primitives operable from a single console (model pin, prompt rollback, retrieval quarantine, canary halt, traffic shape, kill-switch); RCA template with LLM failure classes (hallucination, prompt injection, model regression, retrieval poisoning, vendor outage, jailbreak, context-window leak, agentic loop) and LLM-specific action item types; blameless culture extended to model contributions; on-call rota with primary, secondary, incident commander, and subject-matter dimensions. 8 anti-patterns, 5-stage maturity ladder, composition with AI observability, prompt versioning, human escalation, and AI-native CI/CD.

May 12, 2026
Agentic AI Meets Ringi: Decision Loop Architecture for Japanese Enterprise Approval Flows (2026)
ai-architecture · 1 min read

The Japanese ringi seido is not a workflow to optimise around — it is the consensus-decision substrate on which the firm's J-SOX internal-control framework rests, and an agentic AI deployed without ringi-aware architecture fails the audit committee's review on first sample. This article walks the engineering deliverables: deterministic decision classifier reading the internal-control manifest; structured nemawashi surface for pre-circulation; agent-generated Japanese ringisho from internal-control-approved templates; electronic hanko circulation integrated with the firm's approval system; chain-of-approval execution gate; decision-narrative audit trail composed with J-SOX retention. 8 anti-patterns, 5-stage maturity ladder, portable to Korean chaebol gyeolje, Taiwanese family-business consensus, and broader multi-approver enterprise decision cultures.

May 12, 2026
Betriebsrat and AI Deployment: Co-Determination-Friendly Rollout Architecture (2026)
ai-architecture · 1 min read

The German Betriebsrat is not a stakeholder you consult; it is a co-decision-maker under BetrVG §87(1) Nr. 6 whose statutory rights determine whether your AI deployment ships. This article walks the engineering deliverables for a Betriebsvereinbarung-friendly architecture — telemetry boundaries that distinguish system observability from employee surveillance at the substrate level, opt-out paths as first-class workflow, autonomy tiers (advisory/assist/act) enforced at code level, change-classification pipeline gating releases against Betriebsvereinbarung-impact tiers, German-language artefact pack. 8 anti-patterns, 5-stage maturity ladder, portable to French CSE, Austrian Betriebsrat, Dutch Ondernemingsraad, and the European Works Council framework.

May 12, 2026
Data Sovereignty Architecture: Respecting Māori Data Principles in Tikanga-Aware ML Systems (2026)
ai-architecture · 1 min read

Māori data sovereignty in 2026 is an architecture problem disguised as a policy problem. This article walks the engineering deliverables that operationalise Te Mana Raraunga and the CARE principles — per-partnership cloud subscriptions, Iwi authority IdP integration, deletion machinery that reaches the embedding store and fine-tune layer, partnership-trained cultural sensitivity classifiers, audit-log self-service for the Iwi data board, and engagement cadence wired into CI release gates. 8 anti-patterns, 5-stage maturity ladder, portable to Australian First Nations, Canadian First Nations OCAP, Sami, and Pacific data sovereignty contexts.

May 12, 2026
AI Nearshoring Architecture: Poland as the EU AI Delivery Hub — Team Topology and Data Residency (2026)
ai-architecture · 1 min read

Poland in 2026 is the structurally interesting answer to the where-do-we-build-AI question — the cost gap, the EU AI Act compliance moat, and the local-LLM stack converged. This is the architecture that makes the answer credible: the team topology that puts the platform and EU-customer surface in Poland with full ownership, the two-data-plane partitioning that makes EU-residency enforceable, the dual GDPR + AI Act compliance posture that pre-empts buyer due-diligence, the IP boundary that survives the US-client security audit, and the 24-hour eval cycle that turns the time-zone gap into a velocity multiplier. 8 anti-patterns, 5-stage maturity ladder, portable to Romania, Portugal, Czech Republic.

May 11, 2026
Privacy-by-Design RAG Architecture for the Australian Privacy Act 2025 Reforms and the Statutory Tort (2026)
ai-architecture · 1 min read

The 2024-2025 Privacy Act reforms changed RAG architecture in Australia from a "we should think about privacy" posture to a "design for the tort claim and the OAIC inquiry" posture. This is the architecture that survives both — the PII boundary drawn before the embedding store, subject-keyed vault, placeholders in embeddings and prompts, storage-layer scoping with empty defaults, per-query provenance into an admissible audit log, deletion as transactional fan-out with a certificate, zero-retention enforced at the model gateway, and an automated-decision register that drives the privacy policy. Statutory tort hook, 8 anti-patterns, 5-stage maturity ladder, portable to UK/EU GDPR, India DPDP, and Singapore PDPA.
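
The "placeholders in embeddings and prompts" idea is that personal values are swapped for stable tokens before text reaches the embedding store or the model, with the mapping held in a separate vault. The sketch below assumes a hypothetical in-memory `Vault` and detects only email addresses with a simple regular expression; a production system would need far broader PII detection.

```python
import re

# Simplified email pattern, an assumption for this sketch only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class Vault:
    """Hypothetical in-memory vault mapping PII values to placeholders."""

    def __init__(self):
        self._by_value = {}
        self._next_id = 1

    def tokenise(self, value):
        # The same value always maps to the same placeholder.
        if value not in self._by_value:
            self._by_value[value] = f"<PII_{self._next_id}>"
            self._next_id += 1
        return self._by_value[value]


def redact(text, vault):
    """Replace every detected email with its vault placeholder."""
    return EMAIL.sub(lambda m: vault.tokenise(m.group(0)), text)


vault = Vault()
redacted = redact(
    "Contact jane@example.com or bob@example.org, then jane@example.com again.",
    vault,
)
print(redacted)
```

Because only placeholders reach the embedding store, deleting a subject's entry from the vault is enough to make the stored text unlinkable to that person, which is one reading of why the boundary is drawn before embedding.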

May 11, 2026

Stay one step ahead

Weekly deep dives on AI systems, cloud architecture, distributed systems, and engineering leadership. Join 5,000+ engineers.