Reference Architecture · Secure RAG
Retrieval is an authorization boundary, not a search mechanism.
Retrieval-augmented generation is now the dominant pattern for giving foundation models access to enterprise data. It is also the dominant attack surface. This reference architecture treats every phase of the retrieval pipeline as an enforcement point: data does not enter the model context simply because it is semantically relevant. It must also be authorized, validated, and free of adversarial payloads.
Core Thesis
The model never sees data unless both the user and the agent are authorized for the specific purpose, context, and data classification, and the retrieved content has been validated as non-adversarial.
Five enforcement phases
Each phase is independent and additive. Skipping a phase does not break the others, but each missing phase is a gap an attacker can navigate to.
Pre-Retrieval Authorization
Decide before searchingBefore any query reaches the retriever, the system evaluates whether the user, agent, purpose, and requested data class are allowed. This is the cheapest place to deny. It prevents the retriever from ever touching documents the user is not entitled to see, which avoids leaking existence-of-document signal and reduces the attack surface for model inversion via repeated probing.
Controls
- Identity propagation (user and agent both)
- Purpose declaration or inference
- Data class allow-list per agent capability
- Risk signals (device, location, session)
Metadata Filtering at Retrieval
Narrow the corpus before semanticsThe retriever only searches across data that survives metadata filters and entitlement constraints. Filter dimensions: classification, owner, tenant, department, business domain, purpose tag, retention category, and per-document ACL. Semantic similarity is applied after the corpus has been narrowed, not before. Semantically relevant data is not necessarily authorized data.
Controls
- Classification: public, internal, confidential, sensitive, regulated
- Tenant and domain isolation
- Per-document and per-row ACL
- Purpose tags constrained to declared intent
Post-Retrieval Validation
Recheck before the model sees itReturned chunks are revalidated before insertion into the model context window. This catches misclassified documents, recent re-classifications, and edge cases where metadata filters were too coarse. It also runs adversarial content checks (see Phase 4) to detect payloads that would hijack the model if injected as context.
Controls
- Per-chunk classification recheck
- PII and sensitive data detection
- Adversarial pattern scan (instruction-like text in retrieved data)
- Provenance and source authenticity check
Indirect Prompt Injection Defense
Treat retrieved content as untrusted inputIndirect prompt injection is the dominant 2026-era threat against RAG systems. Adversarial payloads embedded in retrieved documents, tool outputs, or web content can hijack the model when consumed as context. The user never types the attack; the model encounters it through the supply chain. Defense requires treating every retrieved chunk as untrusted, segregating it from system instructions, and detecting common payload patterns before context insertion.
Controls
- Strict separation of system prompt and retrieved context
- Instruction-pattern detection on retrieved chunks (e.g. 'ignore previous', role hijacking)
- Source allow-listing for external content
- Quarantine and human review for high-sensitivity retrieval paths
Output Review
Validate before releaseThe generated response is checked for sensitive data leakage, policy violations, external sharing restrictions, and indicators of successful injection (e.g. the model attempting to follow instructions from retrieved content). Output review closes the loop: even when earlier phases miss something, the response check provides a last line of defense before the user sees the result.
Controls
- PII and classification redaction
- External sharing policy enforcement
- Injection success indicators (model behavior anomalies)
- Audit log of full chain (query, retrieved IDs, decision, output)
What goes wrong
Five failure modes that determine whether a RAG system is safe to operate in an enterprise.
Authorized by similarity, not policy
Semantic search will surface relevant documents whether or not the user is allowed to see them. A naive RAG implementation defers authorization to the model, which has no way to enforce it. Treat retrieval as an authorization boundary upstream of the model.
Indirect prompt injection
Retrieved documents, tool outputs, and web content can carry adversarial payloads. The threat is supply-chain shaped: the attacker poisons a document that will plausibly be retrieved, then waits. Defense lives in retrieval boundaries and context isolation, not in model guardrails.
Cross-tenant or cross-domain leakage
Without strict tenant and domain filters at retrieval, a shared model can leak data across boundaries. The model has no inherent concept of tenant isolation; the retrieval layer must enforce it.
Sensitive data exfiltration via response
Large-volume searches, broad wildcard prompts, repeated denials, or unusual purpose claims can be reconnaissance for exfiltration. Output review and abuse monitoring catch what retrieval boundaries miss.
Stale entitlements
Documents reclassified after embedding generation can leak through vector search even when current ACLs would deny. Either re-embed on reclassification, or enforce post-retrieval rechecks against live ACLs.
Decision Function
Retrieve = f(user, agent, purpose, data class, source provenance, context risk, output policy)
Every input feeds the decision. The model is downstream of all seven; it is the consumer of the output, not the arbiter of access.
Mapping to governance
Secure RAG is not a standalone discipline. It sits inside the governance framework as one execution pattern.
Risk Tiering (Section 2)
RAG over sensitive or regulated data is rarely below Tier 3. The Data Sensitivity dimension drives most of the score.
Threat Modeling (Section 3)
Categories 9 and 10 (prompt injection and indirect prompt injection) apply directly. Treat them as required threats in any RAG threat model.
Monitoring (Section 5)
Query volume anomalies, repeated denials, broad-scope searches, and unexpected source domains all signal abuse against RAG.
Incident Response (Section 6)
Indirect prompt injection events route to the Adversarial Exploitation incident class. Retrieved content compromise routes to Data Integrity.