Objective

A research agent has to do more than fetch a passage: it must read a corpus, connect the facts into a knowledge graph, answer questions that span several documents with citations, and synthesize a briefing. Pure vector RAG can't follow relationships across documents; a graph alone can't read prose. Here you'll build a research agent that does both on one database, composing this cluster's RAG, citation, and knowledge-graph blocks. Any framework or voice agent can drive the same brain; see "Use it from your agent" at the end.

Step 1: Ingest the document corpus (for RAG, with citations)

Each chunk carries a citation handle (doc_ref) and an embedding, so every retrieved passage can be traced back to its source.

CREATE TABLE IF NOT EXISTS recipe_research_docs (
  chunk_id  INTEGER PRIMARY KEY,
  doc_ref   TEXT,
  content   TEXT,
  embedding VECTOR(384)
);
INSERT INTO recipe_research_docs (chunk_id, doc_ref, content) VALUES
 (1,'PaperA §1','GraphRAG combines vector retrieval with graph traversal to answer multi-hop questions.'),
 (2,'PaperA §3','In GraphRAG, embeddings select an entry node and edges supply the connected context.'),
 (3,'PaperB §2','Citation-grounded answers reduce hallucination by tying each claim to a source.'),
 (4,'PaperB §4','Self-checking passes verify a draft answer against its retrieved sources.'),
 (5,'PaperC §1','Knowledge graphs encode entities and relations that flat text cannot express.'),
 (6,'PaperC §5','Multi-hop reasoning over a graph answers questions no single passage contains.');
UPDATE recipe_research_docs SET embedding = EMBED(content);

Step 2: Retrieve the most relevant passages for a query

The research agent pulls the top sources by meaning for the question it's investigating.

SELECT doc_ref, content,
       COSINE_SIMILARITY(embedding, EMBED('how do graphs help answer multi-hop questions?')) AS relevance
FROM recipe_research_docs
ORDER BY relevance DESC
LIMIT 3;
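
To make the ranking concrete, here is a minimal Python sketch of what a cosine-similarity top-k query computes. The three-dimensional vectors are made-up stand-ins for the 384-dimensional embeddings, and cosine_similarity and top_k are illustrative helper names, not database functions.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, rows, k=3):
    """Score every row against the query and keep the k most similar."""
    scored = [(doc_ref, cosine_similarity(vec, query_vec)) for doc_ref, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors, assumed for illustration only; real embeddings come from EMBED().
rows = [
    ("PaperA §1", [0.9, 0.1, 0.0]),
    ("PaperB §2", [0.0, 0.2, 0.9]),
    ("PaperC §5", [0.8, 0.3, 0.1]),
    ("PaperB §4", [0.1, 0.1, 0.8]),
]
query_vec = [1.0, 0.2, 0.0]

ranked = top_k(query_vec, rows, k=3)
for doc_ref, score in ranked:
    print(f"{doc_ref}: {score:.3f}")
```

With these toy values the two rows pointing in nearly the same direction as the query rank first, which is the same ordering logic the ORDER BY relevance DESC LIMIT 3 clause applies.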

Step 3: Answer with inline citations

Generate an answer that cites the sources it used, so the research output is auditable. First freeze the top-3 retrieved passages with their citation handles, then concatenate them into one context string in which each passage is prefixed by its handle, and finally generate the cited answer from that string.

CREATE TABLE IF NOT EXISTS recipe_research_top (
  chunk_id  INTEGER PRIMARY KEY,
  doc_ref   TEXT,
  content   TEXT,
  relevance DOUBLE
);
INSERT INTO recipe_research_top (chunk_id, doc_ref, content, relevance)
SELECT chunk_id, doc_ref, content,
       COSINE_SIMILARITY(embedding, EMBED('how do graphs help answer multi-hop questions?')) AS relevance
FROM recipe_research_docs
ORDER BY relevance DESC
LIMIT 3;

CREATE TABLE IF NOT EXISTS recipe_research_ctx (id INTEGER PRIMARY KEY, sources TEXT);
INSERT INTO recipe_research_ctx (id, sources)
SELECT 1, GROUP_CONCAT('(' || doc_ref || ') ' || content, ' ') FROM recipe_research_top;

SELECT GENERATE(
  'Answer using ONLY these sources and cite the source name in parentheses inline, e.g. (PaperA §1). Sources: ' ||
  sources ||
  ' Question: How do knowledge graphs help answer multi-hop questions? Answer with citations:') AS cited_answer
FROM recipe_research_ctx;
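
One way an agent might audit the generated answer is to check that every parenthesized citation names a passage that was actually retrieved. The Python sketch below is an assumption-laden illustration: build_context mirrors the GROUP_CONCAT expression above, find_unknown_citations is a hypothetical helper, the regex assumes handles start with "Paper" as they do in this corpus, and sample_answer is an invented string, not real model output.

```python
import re

def build_context(passages):
    """Mirror the GROUP_CONCAT step: '(doc_ref) content' joined by spaces."""
    return " ".join(f"({doc_ref}) {content}" for doc_ref, content in passages)

def find_unknown_citations(answer, passages):
    """Return cited handles that do not match any retrieved doc_ref."""
    known = {doc_ref for doc_ref, _ in passages}
    # Assumption: every citation handle in this corpus begins with "Paper".
    cited = re.findall(r"\((Paper[^)]*)\)", answer)
    return [handle for handle in cited if handle not in known]

# Stand-in for the three frozen rows; which rows rank highest is assumed here.
passages = [
    ("PaperA §1", "GraphRAG combines vector retrieval with graph traversal to answer multi-hop questions."),
    ("PaperC §5", "Multi-hop reasoning over a graph answers questions no single passage contains."),
    ("PaperC §1", "Knowledge graphs encode entities and relations that flat text cannot express."),
]

# Invented answer text for illustration; the last citation is deliberately not in the context.
sample_answer = (
    "Graphs encode relations that flat text cannot express (PaperC §1), "
    "so they answer questions no single passage contains (PaperC §5). "
    "Self-checking also helps (PaperB §4)."
)

context = build_context(passages)
unknown = find_unknown_citations(sample_answer, passages)
print(unknown)
```

A non-empty result flags a claim whose citation falls outside the supplied sources, which the agent could then drop or regenerate.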

Step 4: Build the concept knowledge graph (Cypher)

Connect the concepts the corpus discusses so the agent can reason across documents.

MERGE (rag:Concept {name: 'RAG'})
MERGE (graphrag:Concept {name: 'GraphRAG'})
MERGE (kg:Concept {name: 'KnowledgeGraph'})
MERGE (multihop:Concept {name: 'MultiHopReasoning'})
MERGE (cite:Concept {name: 'CitationGrounding'})
MERGE (graphrag)-[:EXTENDS]->(rag)
MERGE (graphrag)-[:USES]->(kg)
MERGE (kg)-[:ENABLES]->(multihop)
MERGE (cite)-[:IMPROVES]->(rag);

Step 5: Multi-hop query — what does GraphRAG ultimately enable?

Walk USES → ENABLES to connect GraphRAG to multi-hop reasoning through the knowledge-graph concept.

MATCH (g:Concept {name: 'GraphRAG'})-[:USES]->(:Concept)-[:ENABLES]->(cap:Concept)
RETURN g.name AS technique, cap.name AS enables;
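
The two-hop pattern can be mirrored in memory to see what the traversal does. This is a minimal Python sketch, not how the database executes Cypher: the edge list is copied from the MERGE statements in Step 4, and neighbors and two_hop are illustrative helper names.

```python
# Edge list copied from Step 4: (source, relation, target).
EDGES = [
    ("GraphRAG", "EXTENDS", "RAG"),
    ("GraphRAG", "USES", "KnowledgeGraph"),
    ("KnowledgeGraph", "ENABLES", "MultiHopReasoning"),
    ("CitationGrounding", "IMPROVES", "RAG"),
]

def neighbors(node, relation):
    """Targets reachable from node over one edge of the given relation."""
    return [dst for src, rel, dst in EDGES if src == node and rel == relation]

def two_hop(start, first_rel, second_rel):
    """Follow first_rel from start, then second_rel from each intermediate node."""
    paths = []
    for mid in neighbors(start, first_rel):
        for end in neighbors(mid, second_rel):
            paths.append((start, mid, end))
    return paths

paths = two_hop("GraphRAG", "USES", "ENABLES")
print(paths)  # [('GraphRAG', 'KnowledgeGraph', 'MultiHopReasoning')]
```

The intermediate node is what a single-passage lookup cannot supply: neither hop alone links GraphRAG to multi-hop reasoning.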

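Step 6: Find every concept that improves or extends the base technique

This query returns a set of relationships that the prose never states in one place: every concept with an IMPROVES or EXTENDS edge pointing at RAG.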

MATCH (c:Concept)-[:IMPROVES|EXTENDS]->(base:Concept {name: 'RAG'})
RETURN c.name AS related_concept;
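
The IMPROVES|EXTENDS pattern matches an edge of either type. As a rough in-memory analogue in Python (the edge list is copied from Step 4, and incoming is a hypothetical helper name), the query amounts to filtering edges by target and by a set of allowed relations:

```python
# Edge list copied from Step 4: (source, relation, target).
EDGES = [
    ("GraphRAG", "EXTENDS", "RAG"),
    ("GraphRAG", "USES", "KnowledgeGraph"),
    ("KnowledgeGraph", "ENABLES", "MultiHopReasoning"),
    ("CitationGrounding", "IMPROVES", "RAG"),
]

def incoming(target, relations):
    """Sources that point at target over any of the allowed relation types."""
    return [(src, rel) for src, rel, dst in EDGES if dst == target and rel in relations]

related = incoming("RAG", {"IMPROVES", "EXTENDS"})
for name, rel in related:
    print(f"{name} --{rel}--> RAG")
```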

Step 7: Synthesize a research briefing (GENERATE)

Combine the whole corpus into a short, structured briefing the agent hands back. Flatten the notes into one context row first, then generate from it.

CREATE TABLE IF NOT EXISTS recipe_research_corpus (id INTEGER PRIMARY KEY, notes TEXT);
INSERT INTO recipe_research_corpus (id, notes)
SELECT 1, GROUP_CONCAT(doc_ref || ': ' || content, ' ') FROM recipe_research_docs;

SELECT GENERATE(
  'Write a 3-bullet research briefing on graph-augmented retrieval, based only on these notes: ' || notes) AS briefing
FROM recipe_research_corpus;

Cleanup (Optional)

DROP TABLE IF EXISTS recipe_research_docs;
DROP TABLE IF EXISTS recipe_research_top;
DROP TABLE IF EXISTS recipe_research_ctx;
DROP TABLE IF EXISTS recipe_research_corpus;

MATCH (n:Concept) DETACH DELETE n;

Expected Outcomes

Step 2 retrieves the graph/multi-hop passages by meaning.
Step 3 returns an answer with inline source-name citations, such as (PaperA §1), pointing at the right papers.
Step 5 connects GraphRAG → KnowledgeGraph → MultiHopReasoning across two hops.
Step 6 finds both CitationGrounding (improves) and GraphRAG (extends) as related to RAG — a cross-document relationship.
Step 7 produces a 3-bullet briefing synthesized from the whole corpus.

You've built a research agent that reads, connects, answers with citations, and synthesizes — RAG and a knowledge graph in one database.

Use it from your agent (framework-agnostic — this is the whole point)

The research brain is just a cited doc index + a concept graph, so any agent shell drives it with no framework lock-in:

REST / SDK — POST /v1/query/execute (any language), or @synapcores/sdk client.executeQuery(...). Your agent ingests sources, retrieves + cites (Steps 2–3), walks the concept graph (Steps 5–6), and synthesizes the briefing (Step 7).
MCP (native, on by default) — point any MCP client (Claude Code, Cursor, a custom loop, a voice runtime) at ws://<your-instance>/mcp?token=<jwt> (JWT from one POST /v1/auth/login → access_token). The query tool retrieves/cites/synthesizes; the execute tool runs Cypher for the concept graph — the research loop as tool calls.
Any framework — OpenClaw, LangChain / LlamaIndex research pipelines, a custom loop, or a voice research assistant that reads the briefing aloud all call the same brain. The database is the brain; the framework is swappable.
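
As a rough illustration of the REST route, the Python sketch below builds (but does not send) a request to the /v1/query/execute endpoint named above, using only the standard library. Everything beyond the path is an assumption: the JSON field name "query", the Bearer authorization scheme, and the placeholder host are hypothetical, so check the API reference for the real request shape.

```python
import json
import urllib.request

def build_query_request(base_url, token, statement):
    """Build a POST request for /v1/query/execute without sending it."""
    # Assumption: the statement travels in a JSON field called "query".
    body = json.dumps({"query": statement}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/query/execute",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: the access_token from /v1/auth/login is sent as a bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

req = build_query_request(
    "https://db.example.invalid",  # placeholder host, not a real instance
    "<jwt>",
    "SELECT doc_ref, content FROM recipe_research_docs LIMIT 3;",
)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it. Because the agent side is only an HTTP call carrying a statement, the same few lines fit inside any framework's tool wrapper.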

Key Concepts Learned

A research agent composes cited RAG (read + answer + trace) with a knowledge graph (connect + reason).
Vector retrieval answers "what does a passage say"; the graph answers "how do the facts relate."
One GENERATE() call over the flattened corpus produces a briefing synthesized from every ingested chunk, not only the retrieved top 3.
Because it's plain data ops (SQL + Cypher + GENERATE / REST / MCP), the research agent works from any framework — the agent-agnostic backend pattern this cluster builds on.

Build a Research / Knowledge Agent (RAG + KG)