← Architecture

Reference Architecture · Secure RAG

Retrieval is an authorization boundary, not a search mechanism.

Retrieval-augmented generation is now the dominant pattern for giving foundation models access to enterprise data. It is also the dominant attack surface. This reference architecture treats every phase of the retrieval pipeline as an enforcement point: data does not enter the model context simply because it is semantically relevant. It must also be authorized, validated, and free of adversarial payloads.

Core Thesis

The model never sees data unless both the user and the agent are authorized for the specific purpose, context, and data classification, and the retrieved content has been validated as non-adversarial.

Five enforcement phases

Each phase is independent and additive. Skipping a phase does not break the others, but each missing phase is a gap an attacker can navigate to.

01

Pre-Retrieval Authorization

Decide before searching

Before any query reaches the retriever, the system evaluates whether the user, agent, purpose, and requested data class are allowed. This is the cheapest place to deny. It prevents the retriever from ever touching documents the user is not entitled to see, which avoids leaking existence-of-document signal and reduces the attack surface for model inversion via repeated probing.

Controls

  • Identity propagation (user and agent both)
  • Purpose declaration or inference
  • Data class allow-list per agent capability
  • Risk signals (device, location, session)
02

Metadata Filtering at Retrieval

Narrow the corpus before semantics

The retriever only searches across data that survives metadata filters and entitlement constraints. Filter dimensions: classification, owner, tenant, department, business domain, purpose tag, retention category, and per-document ACL. Semantic similarity is applied after the corpus has been narrowed, not before. Semantically relevant data is not necessarily authorized data.

Controls

  • Classification: public, internal, confidential, sensitive, regulated
  • Tenant and domain isolation
  • Per-document and per-row ACL
  • Purpose tags constrained to declared intent
03

Post-Retrieval Validation

Recheck before the model sees it

Returned chunks are revalidated before insertion into the model context window. This catches misclassified documents, recent re-classifications, and edge cases where metadata filters were too coarse. It also runs adversarial content checks (see Phase 4) to detect payloads that would hijack the model if injected as context.

Controls

  • Per-chunk classification recheck
  • PII and sensitive data detection
  • Adversarial pattern scan (instruction-like text in retrieved data)
  • Provenance and source authenticity check
04

Indirect Prompt Injection Defense

Treat retrieved content as untrusted input

Indirect prompt injection is the dominant 2026-era threat against RAG systems. Adversarial payloads embedded in retrieved documents, tool outputs, or web content can hijack the model when consumed as context. The user never types the attack; the model encounters it through the supply chain. Defense requires treating every retrieved chunk as untrusted, segregating it from system instructions, and detecting common payload patterns before context insertion.

Controls

  • Strict separation of system prompt and retrieved context
  • Instruction-pattern detection on retrieved chunks (e.g. 'ignore previous', role hijacking)
  • Source allow-listing for external content
  • Quarantine and human review for high-sensitivity retrieval paths
05

Output Review

Validate before release

The generated response is checked for sensitive data leakage, policy violations, external sharing restrictions, and indicators of successful injection (e.g. the model attempting to follow instructions from retrieved content). Output review closes the loop: even when earlier phases miss something, the response check provides a last line of defense before the user sees the result.

Controls

  • PII and classification redaction
  • External sharing policy enforcement
  • Injection success indicators (model behavior anomalies)
  • Audit log of full chain (query, retrieved IDs, decision, output)

What goes wrong

Five failure modes that determine whether a RAG system is safe to operate in an enterprise.

Authorized by similarity, not policy

Semantic search will surface relevant documents whether or not the user is allowed to see them. A naive RAG implementation defers authorization to the model, which has no way to enforce it. Treat retrieval as an authorization boundary upstream of the model.

Indirect prompt injection

Retrieved documents, tool outputs, and web content can carry adversarial payloads. The threat is supply-chain shaped: the attacker poisons a document that will plausibly be retrieved, then waits. Defense lives in retrieval boundaries and context isolation, not in model guardrails.

Cross-tenant or cross-domain leakage

Without strict tenant and domain filters at retrieval, a shared model can leak data across boundaries. The model has no inherent concept of tenant isolation; the retrieval layer must enforce it.

Sensitive data exfiltration via response

Large-volume searches, broad wildcard prompts, repeated denials, or unusual purpose claims can be reconnaissance for exfiltration. Output review and abuse monitoring catch what retrieval boundaries miss.

Stale entitlements

Documents reclassified after embedding generation can leak through vector search even when current ACLs would deny. Either re-embed on reclassification, or enforce post-retrieval rechecks against live ACLs.

Decision Function

Retrieve = f(user, agent, purpose, data class, source provenance, context risk, output policy)

Every input feeds the decision. The model is downstream of all seven; it is the consumer of the output, not the arbiter of access.

Mapping to governance

Secure RAG is not a standalone discipline. It sits inside the governance framework as one execution pattern.

Risk Tiering (Section 2)

RAG over sensitive or regulated data is rarely below Tier 3. The Data Sensitivity dimension drives most of the score.

Threat Modeling (Section 3)

Categories 9 and 10 (prompt injection and indirect prompt injection) apply directly. Treat them as required threats in any RAG threat model.

Monitoring (Section 5)

Query volume anomalies, repeated denials, broad-scope searches, and unexpected source domains all signal abuse against RAG.

Incident Response (Section 6)

Indirect prompt injection events route to the Adversarial Exploitation incident class. Retrieved content compromise routes to Data Integrity.