GraphRAG: hybrid vector + graph multi-hop QA
Objective
Vector-only RAG retrieves passages by similarity but loses the relationships between them. Ask
"which products from suppliers we audited last year had quality complaints?" and pure vector
search gets close, but it cannot enforce the audit-year filter or the supplier link. GraphRAG mixes
semantic recall (`SIMILAR_TO`) with structural patterns (`-[:SUPPLIES]->`) in a single Cypher query,
and uses `llm_score` to grade answer relevance inline. The wow moment: one MATCH retrieves
semantically similar complaints, walks the supplier edge, filters by audit year, and ranks by
LLM-judged severity, all in one round trip.
Step 1: Set up the catalog graph
MERGE (cs1:Supplier {id: "SUP-201", name: "Pacifica Components", audited_year: 2025, region: "APAC"})
MERGE (cs2:Supplier {id: "SUP-202", name: "Andes Forge", audited_year: 2024, region: "LATAM"})
MERGE (cs3:Supplier {id: "SUP-203", name: "Nordica Steel", audited_year: 2025, region: "EMEA"})
MERGE (cs4:Supplier {id: "SUP-204", name: "Rivera Plastics", audited_year: 2023, region: "LATAM"})
MERGE (p1:Product {sku: "PRD-7700", name: "Quad-port USB-C hub", embedding: [0.11, 0.32, -0.04, 0.27, 0.18]})
MERGE (p2:Product {sku: "PRD-7702", name: "USB-C charging dock", embedding: [0.13, 0.30, -0.02, 0.25, 0.20]})
MERGE (p3:Product {sku: "PRD-7800", name: "Bluetooth mechanical kbd", embedding: [-0.22, 0.05, 0.41, -0.11, 0.07]})
MERGE (p4:Product {sku: "PRD-7910", name: "4K display panel", embedding: [0.31, -0.14, 0.22, 0.09, -0.30]})
MERGE (p5:Product {sku: "PRD-8001", name: "Wireless ergo mouse", embedding: [-0.19, 0.07, 0.38, -0.09, 0.04]})
MERGE (cs1)-[:SUPPLIES]->(p1)
MERGE (cs1)-[:SUPPLIES]->(p2)
MERGE (cs2)-[:SUPPLIES]->(p3)
MERGE (cs3)-[:SUPPLIES]->(p4)
MERGE (cs4)-[:SUPPLIES]->(p5)
MERGE (c1:Complaint {id: "CX-9001", text: "Hub stops charging after 30 minutes of heavy load",
opened: "2026-02-14", embedding: [0.12, 0.31, -0.03, 0.26, 0.19]})
MERGE (c2:Complaint {id: "CX-9014", text: "Dock disconnects displays randomly under warm conditions",
opened: "2026-03-02", embedding: [0.14, 0.29, -0.01, 0.24, 0.21]})
MERGE (c3:Complaint {id: "CX-9100", text: "Keyboard keys stick after a week",
opened: "2026-03-08", embedding: [-0.21, 0.06, 0.40, -0.10, 0.06]})
MERGE (c1)-[:ABOUT]->(p1)
MERGE (c2)-[:ABOUT]->(p2)
MERGE (c3)-[:ABOUT]->(p3);
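The similarity metric behind `SIMILAR_TO` isn't specified here; assuming it compares the `embedding` properties by cosine similarity, a quick Python check on the vectors above shows why the dock complaint (CX-9014) would clear a 0.85 threshold against the seed complaint while the keyboard complaint (CX-9100) would not:

```python
import math

# Complaint embeddings copied from the MERGE statements above.
embeddings = {
    "CX-9001": [0.12, 0.31, -0.03, 0.26, 0.19],   # hub stops charging (seed)
    "CX-9014": [0.14, 0.29, -0.01, 0.24, 0.21],   # dock disconnects displays
    "CX-9100": [-0.21, 0.06, 0.40, -0.10, 0.06],  # keyboard keys stick
}

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

seed = embeddings["CX-9001"]
for cid, vec in embeddings.items():
    if cid != "CX-9001":
        print(cid, round(cosine(seed, vec), 3))
```

CX-9014 scores close to 1.0 and CX-9100 comes out negative, so only the dock complaint survives a `> 0.85` hop. If the engine uses a different metric (dot product, Euclidean), the threshold semantics change, but the intuition is the same.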
Step 2: Hybrid retrieval, walk semantic neighbors then enforce structure
// "Find quality issues semantically similar to a thermal-failure complaint,
// scoped to suppliers we audited in 2025, ranked by LLM-judged severity."
MATCH (seed:Complaint {id: "CX-9001"})-[:SIMILAR_TO > 0.85]->(c:Complaint)
MATCH (c)-[:ABOUT]->(p:Product)<-[:SUPPLIES]-(s:Supplier)
WHERE s.audited_year = 2025
WITH s, p, c,
     llm_score("rate severity of this product complaint from 0 (cosmetic) to 1 (safety recall)", c) AS severity
WHERE severity > 0.5
RETURN s.name AS supplier,
       p.name AS product,
       c.text AS complaint,
       severity
ORDER BY severity DESC;
What's happening
- The `[:SIMILAR_TO > 0.85]` hop runs an HNSW lookup against the `embedding` property and yields semantic neighbors of the seed complaint. This is not a separate vector-DB call; it is an inline Cypher edge.
- The next hop walks the structural `-[:ABOUT]->` and `<-[:SUPPLIES]-` edges to enforce the supplier audit-year filter. Vector-only RAG cannot do this without a second-stage filter.
- `llm_score(prompt, c)` runs the LLM against each surviving complaint and returns a scalar; the WHERE clause then drops anything below 0.5. Severity ranking happens inside the engine, not in app code.
- A traditional stack would need: vector-DB lookup → fetch IDs → SQL JOIN to suppliers → app-side LLM call → re-rank. That's four round trips across three systems; here it is one query.
- Swap the seed complaint or the audit-year filter to instantly get a different question's answer.
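For contrast, the traditional multi-system pipeline described above looks roughly like this in application code. Everything here is hypothetical glue: `vector_search`, `fetch_supplier_rows`, and `judge_severity` are stubs standing in for a vector-DB client, a SQL connection, and an LLM API call, with their return values hard-coded from the Step 1 data.

```python
# Stubbed stand-ins for the three external systems a traditional
# RAG stack would stitch together in application code.

def vector_search(seed_id, top_k=5):
    # Round trip 1: vector DB returns neighbor complaint IDs.
    return ["CX-9014", "CX-9100"]

def fetch_supplier_rows(complaint_ids):
    # Round trips 2-3: join complaints -> products -> suppliers in SQL.
    rows = {  # complaint -> (supplier, audited_year), from Step 1's graph
        "CX-9014": ("Pacifica Components", 2025),
        "CX-9100": ("Andes Forge", 2024),
    }
    return [(cid, *rows[cid]) for cid in complaint_ids]

def judge_severity(complaint_id):
    # Round trip 4: app-side LLM call; scores are invented for the sketch.
    return {"CX-9014": 0.7}.get(complaint_id, 0.2)

# The orchestration that the single Cypher query replaces:
neighbors = vector_search("CX-9001")
audited = [(cid, sup) for cid, sup, year in fetch_supplier_rows(neighbors)
           if year == 2025]
ranked = sorted(((cid, sup, judge_severity(cid)) for cid, sup in audited),
                key=lambda row: row[2], reverse=True)
results = [row for row in ranked if row[2] > 0.5]
print(results)
```

Four functions, four network hops, three failure domains; the Step 2 query collapses all of it into one engine-side plan.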
Try this next
MATCH (seed:Complaint {id: "CX-9001"})-[:SIMILAR_TO > 0.7]->(c:Complaint)-[:ABOUT]->(p:Product)
RETURN seed.text AS asked_about, c.text AS related, p.name AS product;
MATCH (s:Supplier)-[:SUPPLIES]->(p:Product)<-[:ABOUT]-(c:Complaint)
WITH s, count(c) AS complaint_count,
llm_score("rate this supplier's quality risk from 0 to 1", s) AS supplier_risk
RETURN s.name, complaint_count, supplier_risk
ORDER BY supplier_risk DESC;
MATCH (s:Supplier {region: "LATAM"})-[:SUPPLIES]->(p:Product)
OPTIONAL MATCH (p)<-[:ABOUT]-(c:Complaint)
RETURN s.name, p.name, count(c) AS complaints;
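The last query's expected shape can be sanity-checked with a small in-memory walk over the Step 1 data (a sketch, not the engine). The LATAM suppliers are SUP-202 and SUP-204, and only SUP-202's keyboard has a complaint; the `OPTIONAL MATCH` is what keeps SUP-204's zero-complaint product in the result, mirrored here by counting to 0 instead of dropping the row.

```python
# Nodes and edges copied from the Step 1 MERGE statements.
supplies = {"SUP-201": ["PRD-7700", "PRD-7702"], "SUP-202": ["PRD-7800"],
            "SUP-203": ["PRD-7910"], "SUP-204": ["PRD-8001"]}
about = {"CX-9001": "PRD-7700", "CX-9014": "PRD-7702", "CX-9100": "PRD-7800"}
region = {"SUP-201": "APAC", "SUP-202": "LATAM",
          "SUP-203": "EMEA", "SUP-204": "LATAM"}

# Mirror of: MATCH LATAM supplier -> product, OPTIONAL MATCH complaint, count.
counts = {}
for sup, skus in supplies.items():
    if region[sup] != "LATAM":
        continue
    for sku in skus:
        counts[(sup, sku)] = sum(1 for p in about.values() if p == sku)
print(counts)
```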