RAG with Citations (grounded answers that cite sources)

Make an AI agent cite its sources — retrieve passages, generate an answer, and attach the source documents it used, all in one SQL database. Auditable RAG that works with Claude Code, OpenClaw, LangChain, or a voice agent.

All recipes· agents· 13 minutesintermediateen
Instance: localhost:8080

Opens your running SynapCores (RAG with Citations (grounded answers that cite sources) will be staged for a preview — nothing runs until you click Run). No instance yet? Install free in ~30s.

Share

Objective

An answer you can't trace is an answer you can't trust. For support bots, research agents, and anything regulated, the agent must say where an answer came from. Here you'll extend RAG so every answer ships with citations: retrieve the top passages, generate the answer from them, and return the exact sources used — numbered so the answer can reference them. All in one database, no citation framework. The same pipeline works from any framework or a voice agent — see Use it from your agent at the end.

Step 1: Create the cited knowledge base

Each chunk carries a stable doc_ref (the citation handle) alongside its text and embedding.

CREATE TABLE IF NOT EXISTS recipe_rag_cited (
  chunk_id   INTEGER PRIMARY KEY,
  doc_ref    TEXT,                                   -- citation handle, e.g. 'SecurityPolicy §4.2'
  content    TEXT,
  embedding  VECTOR(384)
);

Step 2: Load documents with citation handles

Policy passages with the references a compliance-minded agent must cite.

INSERT INTO recipe_rag_cited (chunk_id, doc_ref, content) VALUES
 (1,'SecurityPolicy §2.1','All customer data is encrypted at rest using AES-256.'),
 (2,'SecurityPolicy §4.2','Access to production databases requires multi-factor authentication and is logged.'),
 (3,'Privacy §3.0','Personal data is retained for 24 months after account closure, then permanently deleted.'),
 (4,'SLA §1.4','The service guarantees 99.9% monthly uptime, with credits for any breach.'),
 (5,'Privacy §5.1','Customers may request a full export or deletion of their data within 30 days.'),
 (6,'SecurityPolicy §6.0','Third-party vendors are reviewed annually and must sign a data-processing agreement.');

Step 3: Embed the knowledge base

The embedding model runs in-database — your citation-aware retrieval index in one line.

UPDATE recipe_rag_cited SET embedding = EMBED(content);

Step 4: Retrieve the top passages with their citation handles

Retrieval returns both the text and the doc_ref, so the citation travels with the evidence. We project the similarity as relevance and order by that alias.

SELECT chunk_id, doc_ref, content,
       COSINE_SIMILARITY(embedding, EMBED('How is customer data protected and who can access it?')) AS relevance
FROM recipe_rag_cited
ORDER BY relevance DESC
LIMIT 3;

Step 5: Materialize the top sources, then generate a cited answer

First freeze the top-3 retrieved passages — with their citation handles — into a small table. Then concatenate them into one numbered context string and feed that to GENERATE(), telling it to cite each source name inline.

CREATE TABLE IF NOT EXISTS recipe_rag_cited_top (
  chunk_id  INTEGER PRIMARY KEY,
  doc_ref   TEXT,
  content   TEXT,
  relevance DOUBLE
);
INSERT INTO recipe_rag_cited_top (chunk_id, doc_ref, content, relevance)
SELECT chunk_id, doc_ref, content,
       COSINE_SIMILARITY(embedding, EMBED('How is customer data protected and who can access it?')) AS relevance
FROM recipe_rag_cited
ORDER BY relevance DESC
LIMIT 3;
CREATE TABLE IF NOT EXISTS recipe_rag_cited_ctx (id INTEGER PRIMARY KEY, sources TEXT);
INSERT INTO recipe_rag_cited_ctx (id, sources)
SELECT 1, GROUP_CONCAT('(' || doc_ref || ') ' || content, ' ') FROM recipe_rag_cited_top;
SELECT GENERATE(
  'Answer the question using ONLY these sources, and cite the source name in parentheses inline, e.g. (SecurityPolicy §4.2). Sources: ' ||
  sources ||
  ' Question: How is customer data protected and who can access it? Answer with inline citations:') AS cited_answer
FROM recipe_rag_cited_ctx;

Step 6: Return the citation list alongside the answer

Produce the source list your UI shows under the answer — a complete, auditable trail, ordered by relevance.

SELECT doc_ref AS citation, content, relevance
FROM recipe_rag_cited_top
ORDER BY relevance DESC;

Cleanup (Optional)

DROP TABLE IF EXISTS recipe_rag_cited;
DROP TABLE IF EXISTS recipe_rag_cited_top;
DROP TABLE IF EXISTS recipe_rag_cited_ctx;

Expected Outcomes

  • Step 4 retrieves the encryption and access-control passages, each carrying its doc_ref.
  • Step 5 produces an answer with inline [1]/[2] citations pointing at the right SecurityPolicy sections.
  • Step 6 returns the matching numbered citation list (handle + text) — the audit trail your UI renders under the answer.

You now have RAG that doesn't just answer — it shows its work, with citations traceable back to your source documents.

Use it from your agent (framework-agnostic — this is the whole point)

Cited RAG is just a doc table + a numbered retrieve + generate, so any agent uses it with no framework lock-in:

  • REST / SDKPOST /v1/query/execute (any language), or @synapcores/sdk client.executeQuery(...). Your agent runs the Step-5 query for the cited answer and the Step-6 query for the source list, then renders both.
  • MCP (native, on by default) — point any MCP client (Claude Code, Cursor, a custom loop, a voice runtime) at ws://<your-instance>/mcp?token=<jwt> (JWT from one POST /v1/auth/loginaccess_token). The query tool returns the answer and its numbered citations together — auditable RAG as a tool call.
  • Any framework — OpenClaw, LangChain / LlamaIndex citation chains, a custom loop, or a voice agent (which reads the answer and offers "I can send you the sources") all use the same cited store. The database is the brain; the framework is swappable.

Key Concepts Learned

  • Carry a doc_ref on every chunk so the citation travels with the evidence through retrieval.
  • ROW_NUMBER() over the ranked passages gives stable [n] handles for inline citation.
  • Returning the answer and the numbered source list together makes RAG auditable, not just fluent.
  • Because it's plain data ops (SQL / REST / MCP), cited RAG works for any agent — the agent-agnostic backend pattern this cluster builds on.

Tags

ai-agentragcitationsretrieval-augmented-generationvectorembeddingsmcp

Run this on your own machine

Install SynapCores Community Edition free, paste the SQL or Cypher above into the bundled web UI, and watch it run.

Download Free CE