Clearance-aware GraphRAG — Lucas González Fiz

The problem

Corporate RAG has a quiet failure mode: the retriever fetches a chunk the user was never cleared to read, and the model dutifully summarises the secret. Most systems "fix" this by scrubbing the answer afterwards, which is too late: the fact has already reached the model. Strata enforces clearance inside retrieval instead.

Approach & tradeoffs

A user's clearance is a ceiling on an ordered scale: public < internal < confidential < restricted. That ceiling is applied as an ACL filter inside the Qdrant vector search and in the WHERE clause of the Neo4j graph query, before any context reaches the generator, so a fact above the user's clearance can't be retrieved in the first place, let alone leaked.
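
A minimal sketch of how that ceiling could become a retrieval-side filter. The payload key "clearance", the node label Chunk, the property names, and the one-hop expansion query are all assumptions for illustration, not the project's actual schema. The Qdrant filter is shown in its JSON form rather than through a client library.

```python
from typing import Any

# Clearance levels in ascending order, as listed above.
LEVELS = ["public", "internal", "confidential", "restricted"]


def allowed_levels(user_clearance: str) -> list[str]:
    """Return every level at or below the user's ceiling."""
    if user_clearance not in LEVELS:
        # Fail closed: an unknown clearance gets no access at all.
        raise ValueError(f"unknown clearance: {user_clearance!r}")
    return LEVELS[: LEVELS.index(user_clearance) + 1]


def qdrant_filter(user_clearance: str) -> dict[str, Any]:
    """Qdrant filter as JSON; the payload key "clearance" is an assumption."""
    return {
        "must": [
            {"key": "clearance", "match": {"any": allowed_levels(user_clearance)}}
        ]
    }


def cypher_expansion(
    user_clearance: str, seed_ids: list[str]
) -> tuple[str, dict[str, Any]]:
    """Hypothetical one-hop expansion; both ends must be within the ceiling."""
    query = (
        "MATCH (seed:Chunk)-[r]-(n:Chunk) "
        "WHERE seed.id IN $seed_ids "
        "AND seed.clearance IN $allowed "
        "AND n.clearance IN $allowed "
        "RETURN n.id AS id, type(r) AS relation"
    )
    params = {"seed_ids": seed_ids, "allowed": allowed_levels(user_clearance)}
    return query, params


flt = qdrant_filter("internal")
query, params = cypher_expansion("confidential", ["chunk-1"])
```

Passing the allowed levels as a query parameter, rather than formatting them into the Cypher string, keeps the clearance check out of reach of anything the user typed.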

On top of that boundary sits the retrieval engineering:

Hybrid retrieval: dense (BGE-M3) + BM25 sparse, fused with reciprocal rank fusion (RRF) in Qdrant, then graph expansion in Neo4j around the vector seeds, a second fusion step, and BGE cross-encoder reranking.
An agentic loop in LangGraph (rewrite → retrieve → generate → critic), bounded by both an iteration cap and a hard wall-clock budget; on overrun it degrades gracefully by skipping the rewrite step.
Fully local / OSS: Qwen3, BGE-M3, BGE reranker, Qdrant, Neo4j. No cloud model API.
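
For readers unfamiliar with the fusion step in the hybrid retrieval item, here is a small standalone sketch of reciprocal rank fusion. In the system described the fusion runs inside Qdrant; this local version only illustrates the scoring rule. The constant k = 60 is the commonly used default and is an assumption here, as are the record IDs.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an ID's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the dense and the BM25 retriever.
dense = ["rec-001", "rec-010", "rec-100"]
sparse = ["rec-100", "rec-001", "rec-999"]

fused = rrf_fuse([dense, sparse])
print(fused)  # ['rec-001', 'rec-100', 'rec-010', 'rec-999']
```

Because only ranks enter the score, the dense and sparse lists can be combined without normalising their raw similarity values against each other.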

Results

ACL-safety: 100% across denial cases — the secret is filtered in Qdrant and Neo4j, so it never reaches the model even when the agent spends its full retry budget.
A measured retrieval failure, root-caused and fixed: dense-only search couldn't pin exact CVE IDs among 150 near-identical records (recall 0.79); a dense + BM25 hybrid took recall to 1.00.
Adding a strict LLM judge alongside the lenient substring metric took a saturated 100% down to 71%. The stricter figure is the discriminating number, and a lesson in metric design.
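
To illustrate why a substring metric saturates, here is a toy comparison. The strict judge below is a rule-based stand-in for an LLM judge, not the project's judge, and the answers and IDs are invented for the example; the resulting numbers are not the project's results.

```python
import re
from typing import Callable

Metric = Callable[[str, str], bool]  # (answer, expected) -> pass/fail


def lenient_substring(answer: str, expected: str) -> bool:
    """Pass if the expected string appears anywhere in the answer."""
    return expected.lower() in answer.lower()


def toy_strict_judge(answer: str, expected: str) -> bool:
    """Stand-in for an LLM judge: pass only if the expected ID is the
    only ID-like token in the answer, so hedged answers fail."""
    ids = {m.lower() for m in re.findall(r"ID-\d+", answer, flags=re.IGNORECASE)}
    return ids == {expected.lower()}


def pass_rate(pairs: list[tuple[str, str]], metric: Metric) -> float:
    return sum(metric(answer, expected) for answer, expected in pairs) / len(pairs)


# Hypothetical (answer, expected) pairs.
pairs = [
    ("The affected record is ID-101.", "ID-101"),
    ("It could be ID-102 or ID-120.", "ID-102"),
    ("ID-103, although ID-130 also matches.", "ID-103"),
    ("ID-104", "ID-104"),
]

lenient = pass_rate(pairs, lenient_substring)
strict = pass_rate(pairs, toy_strict_judge)
print(lenient, strict)  # 1.0 0.5
```

The two hedged answers contain the right ID, so the substring check cannot tell them apart from the committed ones; only the stricter check separates them.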

What I'd flag

The golden sets are small (10–20 items, single-run, on a local 8 GB GPU), so single-pass-vs-agent deltas are within noise — which is exactly why every eval report stores the raw answers. That's how I caught a refusal-matcher bug that was the metric's fault, not the model's.
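
A rough way to see how wide "within noise" is at this sample size: a Wilson score interval for a pass rate on 20 items. The counts below (14 and 16 passes out of 20) are hypothetical and chosen only to illustrate the width of the interval; they are not taken from the eval reports.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical scores for a single-pass run and an agentic run on 20 items.
single_pass = wilson_interval(14, 20)
agent = wilson_interval(16, 20)

# The intervals overlap, so a two-item difference is not a reliable delta.
overlap = single_pass[1] > agent[0]
print(single_pass, agent, overlap)
```

With 20 items each interval spans well over 30 percentage points, which is why a difference of one or two items between configurations says little on its own.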