The problem
Corporate RAG has a quiet failure mode: the retriever fetches a chunk the user was never cleared to read, and the model dutifully summarises the secret. Most systems "fix" this by scrubbing the answer afterwards, which is too late because the fact has already reached the model. Strata enforces clearance inside retrieval.
Approach & tradeoffs
A user's clearance is a ceiling on the ordering public < internal < confidential < restricted:
a user can read anything at or below their own level. That ceiling is applied as a filter
inside the Qdrant vector search and in the WHERE clause of the Neo4j graph query, before any
context reaches the generator, so a fact above the user's clearance can't be retrieved in the
first place, let alone leaked.
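A minimal sketch of how such a ceiling could become a filter on both stores. The payload key `clearance`, the `:Chunk` label, the `MENTIONS` relationship and every function name here are assumptions for illustration, not Strata's actual schema. The dict follows Qdrant's JSON filter format (`must` / `match.any`) and the Cypher uses a plain parameterised `WHERE ... IN`.

```python
# Clearance levels in ascending order, as listed in the text.
LEVELS = ["public", "internal", "confidential", "restricted"]


def allowed_levels(user_clearance: str) -> list[str]:
    """Return every level at or below the user's ceiling."""
    if user_clearance not in LEVELS:
        raise ValueError(f"unknown clearance: {user_clearance!r}")
    return LEVELS[: LEVELS.index(user_clearance) + 1]


def qdrant_filter(user_clearance: str) -> dict:
    """Build a filter body in Qdrant's JSON filter format.

    The payload key "clearance" is an assumed field name.
    """
    return {
        "must": [
            {"key": "clearance", "match": {"any": allowed_levels(user_clearance)}}
        ]
    }


# Hypothetical graph schema: seeds are :Chunk nodes, neighbours are reached
# through MENTIONS, and every node carries a `clearance` property.
CYPHER_EXPAND = """
MATCH (seed:Chunk)-[:MENTIONS]-(n)
WHERE seed.id IN $seed_ids
  AND seed.clearance IN $allowed
  AND n.clearance IN $allowed
RETURN n
"""


def cypher_params(user_clearance: str, seed_ids: list[str]) -> dict:
    """Parameters for CYPHER_EXPAND; the ACL travels as a query parameter."""
    return {"seed_ids": seed_ids, "allowed": allowed_levels(user_clearance)}
```

Because the allowed set is computed once and passed to both queries, a record above the ceiling is excluded by the store itself and there is nothing left to scrub afterwards.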
On top of that boundary sits the retrieval engineering:
- Hybrid retrieval: dense (BGE-M3) + BM25 sparse, RRF-fused in Qdrant, followed by graph expansion in Neo4j around the vector seeds, fusion of the vector and graph candidates, and BGE cross-encoder reranking.
- An agentic loop in LangGraph (rewrite → retrieve → generate → critic), bounded by both an iteration cap and a hard wall-clock budget, with a graceful skip-rewrite mitigation on overrun.
- Fully local / OSS — Qwen3, BGE-M3, BGE reranker, Qdrant, Neo4j. No cloud model API.
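The doubly bounded loop described above can be sketched in plain Python, without LangGraph. This is one plausible reading of the skip-rewrite mitigation, not the project's confirmed behaviour: once the deadline has passed, the next pass skips the rewrite call, falls back to the original question, and becomes the final pass. The function name, the callable signatures and the default limits are all hypothetical. The sketch also only checks the clock between steps and does not interrupt a call that is already running.

```python
import time
from typing import Callable


def run_bounded(
    question: str,
    rewrite: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
    critic: Callable[[str, str], bool],
    max_iters: int = 3,        # iteration cap (assumed default)
    budget_s: float = 30.0,    # wall-clock budget in seconds (assumed default)
    clock: Callable[[], float] = time.monotonic,
) -> tuple[str, int]:
    """Loop rewrite -> retrieve -> generate -> critic under two bounds.

    Returns the last answer and the number of passes used.
    """
    deadline = clock() + budget_s
    query, answer = question, ""
    for i in range(1, max_iters + 1):
        overrun = clock() >= deadline
        if overrun:
            # Skip-rewrite mitigation: no extra rewrite call, use the question as-is.
            query = question
        else:
            query = rewrite(query)
        answer = generate(question, retrieve(query))
        # Stop when the critic accepts, or unconditionally after an overrun pass.
        if overrun or critic(question, answer):
            return answer, i
    return answer, max_iters
```

Injecting `clock` keeps the budget logic testable with a fake clock instead of real sleeps.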
Results
- ACL-safety: 100% across denial cases — the secret is filtered in Qdrant and Neo4j, so it never reaches the model even when the agent spends its full retry budget.
- A measured retrieval failure, root-caused and fixed: dense-only search couldn't pin exact CVE IDs among 150 near-identical records (recall 0.79); a dense + BM25 hybrid took recall to 1.00.
- A strict LLM judge added alongside the lenient substring metric dropped a saturated 100% to 71%: the stricter score is the one that actually discriminates, and a lesson in metric design.
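To illustrate why the dense + BM25 hybrid can recover an exact identifier, here is a plain-Python rendition of reciprocal rank fusion (RRF). It is a sketch of the general formula, not Qdrant's internal implementation. The constant `k = 60` is a conventional default and an assumption here, and the record IDs are made up.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical example: the dense ranker places the target third among
# near-identical records, while BM25 matches its exact ID and ranks it first.
dense = ["rec-a", "rec-b", "rec-target"]
sparse = ["rec-target"]
fused = rrf_fuse([dense, sparse])
```

In this toy case `rec-target` scores 1/63 + 1/61, which beats the 1/61 of the dense ranker's top hit, so the exact match rises to the front of the fused list.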
What I'd flag
The golden sets are small (10–20 items, single-run, on a local 8 GB GPU), so single-pass-vs-agent deltas are within noise — which is exactly why every eval report stores the raw answers. That's how I caught a refusal-matcher bug that was the metric's fault, not the model's.
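A rough way to see why such deltas sit within noise is the binomial standard error of an accuracy estimate. This is a back-of-the-envelope sketch that assumes the items are independent; the accuracy of 0.7 is an illustrative value, not a reported result, and the function name is hypothetical.

```python
import math


def accuracy_se(p: float, n: int) -> float:
    """Standard error of an accuracy p measured on n independent items."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


# Illustrative accuracy of 0.7 at both ends of the 10-20 item range.
se_small = accuracy_se(0.7, 10)  # about 0.14
se_large = accuracy_se(0.7, 20)  # about 0.10
```

On a 20-item set one item is worth 5 points, so under these assumptions a one- or two-item difference between single-pass and agent runs is no larger than a single standard error.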