GraphRAG: hybrid vector + graph multi-hop QA
Objective
Vector-only RAG retrieves passages by similarity but loses the relationships between them. Ask
"which products from suppliers we audited last year had quality complaints?" and pure vector
search gets close, but it cannot enforce the audit-year filter or the supplier link. GraphRAG mixes
semantic recall (`SIMILAR_TO`) with structural patterns (`-[:SUPPLIES]->`) in a single Cypher query,
and uses `llm_score` to grade answer relevance inline. The wow moment: one MATCH retrieves
semantically similar complaints, walks the supplier edge, filters by audit year, and ranks by
LLM-judged severity, all in one round trip.
Step 1: Set up the catalog graph
MERGE (cs1:Supplier {id: "SUP-201", name: "Pacifica Components", audited_year: 2025, region: "APAC"})
MERGE (cs2:Supplier {id: "SUP-202", name: "Andes Forge", audited_year: 2024, region: "LATAM"})
MERGE (cs3:Supplier {id: "SUP-203", name: "Nordica Steel", audited_year: 2025, region: "EMEA"})
MERGE (cs4:Supplier {id: "SUP-204", name: "Rivera Plastics", audited_year: 2023, region: "LATAM"})
MERGE (p1:Product {sku: "PRD-7700", name: "Quad-port USB-C hub", embedding: [0.11, 0.32, -0.04, 0.27, 0.18]})
MERGE (p2:Product {sku: "PRD-7702", name: "USB-C charging dock", embedding: [0.13, 0.30, -0.02, 0.25, 0.20]})
MERGE (p3:Product {sku: "PRD-7800", name: "Bluetooth mechanical kbd", embedding: [-0.22, 0.05, 0.41, -0.11, 0.07]})
MERGE (p4:Product {sku: "PRD-7910", name: "4K display panel", embedding: [0.31, -0.14, 0.22, 0.09, -0.30]})
MERGE (p5:Product {sku: "PRD-8001", name: "Wireless ergo mouse", embedding: [-0.19, 0.07, 0.38, -0.09, 0.04]})
MERGE (cs1)-[:SUPPLIES]->(p1)
MERGE (cs1)-[:SUPPLIES]->(p2)
MERGE (cs2)-[:SUPPLIES]->(p3)
MERGE (cs3)-[:SUPPLIES]->(p4)
MERGE (cs4)-[:SUPPLIES]->(p5)
MERGE (c1:Complaint {id: "CX-9001", text: "Hub stops charging after 30 minutes of heavy load",
opened: "2026-02-14", embedding: [0.12, 0.31, -0.03, 0.26, 0.19]})
MERGE (c2:Complaint {id: "CX-9014", text: "Dock disconnects displays randomly under warm conditions",
opened: "2026-03-02", embedding: [0.14, 0.29, -0.01, 0.24, 0.21]})
MERGE (c3:Complaint {id: "CX-9100", text: "Keyboard keys stick after a week",
opened: "2026-03-08", embedding: [-0.21, 0.06, 0.40, -0.10, 0.06]})
MERGE (c1)-[:ABOUT]->(p1)
MERGE (c2)-[:ABOUT]->(p2)
MERGE (c3)-[:ABOUT]->(p3);
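The similarity metric behind `SIMILAR_TO` isn't specified here; assuming it compares the `embedding` properties by cosine similarity, a quick Python check on the vectors above shows why the dock complaint (CX-9014) would clear a 0.85 threshold against the seed complaint while the keyboard complaint (CX-9100) would not:

```python
import math

# Complaint embeddings copied from the MERGE statements above.
embeddings = {
    "CX-9001": [0.12, 0.31, -0.03, 0.26, 0.19],   # hub stops charging (seed)
    "CX-9014": [0.14, 0.29, -0.01, 0.24, 0.21],   # dock disconnects displays
    "CX-9100": [-0.21, 0.06, 0.40, -0.10, 0.06],  # keyboard keys stick
}

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

seed = embeddings["CX-9001"]
for cid, vec in embeddings.items():
    if cid != "CX-9001":
        print(cid, round(cosine(seed, vec), 3))
```

CX-9014 scores close to 1.0 and CX-9100 comes out negative, so only the dock complaint survives a `> 0.85` hop. If the engine uses a different metric (dot product, Euclidean), the threshold semantics change, but the intuition is the same.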
Step 2: Hybrid retrieval, walk semantic neighbors then enforce structure
// "Find quality issues semantically similar to a thermal-failure complaint,
// scoped to suppliers we audited in 2025, ranked by LLM-judged severity."
MATCH (seed:Complaint {id: "CX-9001"})-[:SIMILAR_TO > 0.85]->(c:Complaint)
MATCH (c)-[:ABOUT]->(p:Product)<-[:SUPPLIES]-(s:Supplier)
WHERE s.audited_year = 2025
WITH s, p, c,
     llm_score("rate severity of this product complaint from 0 (cosmetic) to 1 (safety recall)", c) AS severity
WHERE severity > 0.5
RETURN s.name AS supplier,
       p.name AS product,
       c.text AS complaint,
       severity
ORDER BY severity DESC;
What's happening
- The `[:SIMILAR_TO > 0.85]` hop runs an HNSW lookup against the `embedding` property and yields semantic neighbors of the seed complaint. This is not a separate vector-DB call; it is an inline Cypher edge.
- The next hop walks the structural `-[:ABOUT]->` and `<-[:SUPPLIES]-` edges to enforce the supplier audit-year filter. Vector-only RAG cannot do this without a second-stage filter.
- `llm_score(prompt, c)` runs the LLM against each surviving complaint and returns a scalar; the WHERE clause then drops anything below 0.5. Severity ranking happens inside the engine, not in app code.
- A traditional stack would need: vector-DB lookup → fetch IDs → SQL JOIN to suppliers → app-side LLM call → re-rank. That's four round trips across three systems; here it is one query.
- Swap the seed complaint or the audit-year filter to instantly get a different question's answer.
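For contrast, the traditional multi-system pipeline described above looks roughly like this in application code. Everything here is hypothetical glue: `vector_search`, `fetch_supplier_rows`, and `judge_severity` are stubs standing in for a vector-DB client, a SQL connection, and an LLM API call, with their return values hard-coded from the Step 1 data.

```python
# Stubbed stand-ins for the three external systems a traditional
# RAG stack would stitch together in application code.

def vector_search(seed_id, top_k=5):
    # Round trip 1: vector DB returns neighbor complaint IDs.
    return ["CX-9014", "CX-9100"]

def fetch_supplier_rows(complaint_ids):
    # Round trips 2-3: join complaints -> products -> suppliers in SQL.
    rows = {  # complaint -> (supplier, audited_year), from Step 1's graph
        "CX-9014": ("Pacifica Components", 2025),
        "CX-9100": ("Andes Forge", 2024),
    }
    return [(cid, *rows[cid]) for cid in complaint_ids]

def judge_severity(complaint_id):
    # Round trip 4: app-side LLM call; scores are invented for the sketch.
    return {"CX-9014": 0.7}.get(complaint_id, 0.2)

# The orchestration that the single Cypher query replaces:
neighbors = vector_search("CX-9001")
audited = [(cid, sup) for cid, sup, year in fetch_supplier_rows(neighbors)
           if year == 2025]
ranked = sorted(((cid, sup, judge_severity(cid)) for cid, sup in audited),
                key=lambda row: row[2], reverse=True)
results = [row for row in ranked if row[2] > 0.5]
print(results)
```

Four functions, four network hops, three failure domains; the Step 2 query collapses all of it into one engine-side plan.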
Try this next
MATCH (seed:Complaint {id: "CX-9001"})-[:SIMILAR_TO > 0.7]->(c:Complaint)-[:ABOUT]->(p:Product)
RETURN seed.text AS asked_about, c.text AS related, p.name AS product;
MATCH (s:Supplier)-[:SUPPLIES]->(p:Product)<-[:ABOUT]-(c:Complaint)
WITH s, count(c) AS complaint_count,
llm_score("rate this supplier's quality risk from 0 to 1", s) AS supplier_risk
RETURN s.name, complaint_count, supplier_risk
ORDER BY supplier_risk DESC;
MATCH (s:Supplier {region: "LATAM"})-[:SUPPLIES]->(p:Product)
OPTIONAL MATCH (p)<-[:ABOUT]-(c:Complaint)
RETURN s.name, p.name, count(c) AS complaints;
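The last query's expected shape can be sanity-checked with a small in-memory walk over the Step 1 data (a sketch, not the engine). The LATAM suppliers are SUP-202 and SUP-204, and only SUP-202's keyboard has a complaint; the `OPTIONAL MATCH` is what keeps SUP-204's zero-complaint product in the result, mirrored here by counting to 0 instead of dropping the row.

```python
# Nodes and edges copied from the Step 1 MERGE statements.
supplies = {"SUP-201": ["PRD-7700", "PRD-7702"], "SUP-202": ["PRD-7800"],
            "SUP-203": ["PRD-7910"], "SUP-204": ["PRD-8001"]}
about = {"CX-9001": "PRD-7700", "CX-9014": "PRD-7702", "CX-9100": "PRD-7800"}
region = {"SUP-201": "APAC", "SUP-202": "LATAM",
          "SUP-203": "EMEA", "SUP-204": "LATAM"}

# Mirror of: MATCH LATAM supplier -> product, OPTIONAL MATCH complaint, count.
counts = {}
for sup, skus in supplies.items():
    if region[sup] != "LATAM":
        continue
    for sku in skus:
        counts[(sup, sku)] = sum(1 for p in about.values() if p == sku)
print(counts)
```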