Semantic product recommendations with SIMILAR_TO
Objective
Co-purchase recommendations are great for "people who bought X also bought Y" but fail on
cold-start products: a brand-new SKU has no purchase history yet. Embedding-based similarity
solves cold-start by recommending products whose descriptions mean the same thing, even if no one
has ever bought them together. The wow moment: a single Cypher pattern,
(:Product {name: "..."})-[:SIMILAR_TO > 0.85]->(other), returns the right answers without a separate vector-DB call.
Step 1: Set up a small catalog with embeddings
// Embedding values are 5-dim demo vectors; in production these come from EMBED('description').
MERGE (h1:Product {sku: "USB-301", name: "USB-C 4-port hub",
description: "Compact aluminum USB-C hub with 4 data ports for laptops",
price: 39.99, embedding: [0.81, 0.04, 0.12, -0.05, 0.21]})
MERGE (h2:Product {sku: "USB-307", name: "USB-C charging hub",
description: "USB-C hub with 3 data ports plus 65W passthrough charging",
price: 54.99, embedding: [0.79, 0.06, 0.10, -0.04, 0.23]})
MERGE (h3:Product {sku: "USB-410", name: "Thunderbolt 4 dock",
description: "Thunderbolt 4 docking station with dual 4K display output",
price: 249.0, embedding: [0.77, 0.02, 0.15, -0.07, 0.25]})
MERGE (k1:Product {sku: "KBD-100", name: "Mechanical keyboard 75%",
description: "Hot-swap mechanical keyboard, 75% layout, RGB backlight",
price: 129.0, embedding: [-0.34, 0.58, 0.21, 0.04, 0.12]})
MERGE (k2:Product {sku: "KBD-105", name: "Wireless mechanical keyboard",
description: "Bluetooth mechanical keyboard with hot-swap switches",
price: 159.0, embedding: [-0.32, 0.60, 0.19, 0.06, 0.11]})
MERGE (m1:Product {sku: "MIC-220", name: "USB condenser microphone",
description: "Cardioid USB condenser mic for podcasting and streaming",
price: 99.0, embedding: [0.07, -0.41, 0.66, 0.10, -0.18]})
MERGE (m2:Product {sku: "MIC-225", name: "XLR studio microphone",
description: "Large-diaphragm XLR studio condenser microphone",
price: 219.0, embedding: [0.09, -0.43, 0.64, 0.08, -0.16]})
MERGE (b1:Product {sku: "BAG-501", name: "Tech messenger bag",
description: "Padded laptop messenger bag with cable organizer",
price: 79.0, embedding: [-0.10, 0.18, -0.55, 0.42, 0.30]});
Step 2: Recommend semantically similar products
// "A shopper viewed the Mechanical keyboard 75%. What else would they like?"
MATCH (seed:Product {sku: "KBD-100"})-[:SIMILAR_TO > 0.85]->(rec:Product)
RETURN rec.sku AS recommend_sku,
rec.name AS recommend_name,
rec.price AS price;
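To make the query's semantics concrete, here is a brute-force Python sketch of the same recommendation over the Step 1 demo vectors. It assumes SIMILAR_TO scores pairs by cosine similarity (the text does not name the metric) and scans every product instead of using an HNSW index, so it gives exact answers where an approximate index could occasionally miss a neighbor. `CATALOG`, `cosine`, and `recommend` are hypothetical names for illustration, not engine functions.

```python
import math

# Demo embeddings copied from Step 1 (sku -> 5-dim vector).
CATALOG = {
    "USB-301": [0.81, 0.04, 0.12, -0.05, 0.21],
    "USB-307": [0.79, 0.06, 0.10, -0.04, 0.23],
    "USB-410": [0.77, 0.02, 0.15, -0.07, 0.25],
    "KBD-100": [-0.34, 0.58, 0.21, 0.04, 0.12],
    "KBD-105": [-0.32, 0.60, 0.19, 0.06, 0.11],
    "MIC-220": [0.07, -0.41, 0.66, 0.10, -0.18],
    "MIC-225": [0.09, -0.43, 0.64, 0.08, -0.16],
    "BAG-501": [-0.10, 0.18, -0.55, 0.42, 0.30],
}


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recommend(seed_sku, threshold):
    """Exact, brute-force stand-in for
    MATCH (seed {sku: seed_sku})-[:SIMILAR_TO > threshold]->(rec).

    Assumes cosine similarity. Scans every product (no HNSW index),
    skips the seed itself, and returns SKUs best match first.
    """
    seed_vec = CATALOG[seed_sku]
    scored = [
        (cosine(seed_vec, vec), sku)
        for sku, vec in CATALOG.items()
        if sku != seed_sku
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [sku for score, sku in scored if score > threshold]


print(recommend("KBD-100", 0.85))  # ['KBD-105'] with these demo vectors
```

The only keyboard neighbor above 0.85 is the wireless model; every other product scores well below the threshold against KBD-100.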
What's happening
- [:SIMILAR_TO > 0.85] is a synthetic Cypher edge backed by the HNSW vector index over the embedding property. Cypher engines without it need an out-of-band vector lookup followed by a second SQL query or JOIN: two systems, two round trips.
- The threshold operators (>, >=, <, <=, =) let you tune precision and recall directly in the query: > 0.85 is a strict "near neighbor", while > 0.7 casts a wider net.
- The seed node never appears in its own neighbor set. The engine applies its cycle-handling rule (SkipRevisits), so the keyboard does not recommend itself.
- Cold-start friendly: the new SKU KBD-105 has zero purchase history, but its embedding is close to that of KBD-100, so it is recommended on day one.
- This is the same primitive that powers GraphRAG, semantic dedup, and look-alike audiences. Master one query, then reuse the pattern across personalisation, search, and matching.
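The threshold operators and the seed-exclusion rule can be mimicked in a few lines of Python. This is a sketch under stated assumptions: it uses cosine similarity and a brute-force scan, `OPS` and `neighbors` are hypothetical names, and the `operator` module comparisons stand in for whatever comparison the engine actually applies. Note that `=` on computed floating-point scores will rarely match exactly.

```python
import math
import operator

# Demo embeddings from Step 1 (sku -> 5-dim vector).
CATALOG = {
    "USB-301": [0.81, 0.04, 0.12, -0.05, 0.21],
    "USB-307": [0.79, 0.06, 0.10, -0.04, 0.23],
    "USB-410": [0.77, 0.02, 0.15, -0.07, 0.25],
    "KBD-100": [-0.34, 0.58, 0.21, 0.04, 0.12],
    "KBD-105": [-0.32, 0.60, 0.19, 0.06, 0.11],
    "MIC-220": [0.07, -0.41, 0.66, 0.10, -0.18],
    "MIC-225": [0.09, -0.43, 0.64, 0.08, -0.16],
    "BAG-501": [-0.10, 0.18, -0.55, 0.42, 0.30],
}

# Hypothetical mapping from the operator in [:SIMILAR_TO <op> t]
# to a Python comparison applied as `score <op> t`.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


def cosine(a, b):
    """Cosine similarity (assumed metric; the source does not name one)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def neighbors(seed_sku, op, threshold):
    """SKUs whose cosine score against the seed satisfies `score <op> threshold`.

    The `sku != seed_sku` guard mirrors the no-revisit rule: the seed is
    never compared with itself, so it cannot appear in its own result.
    """
    compare = OPS[op]
    seed_vec = CATALOG[seed_sku]
    return {
        sku
        for sku, vec in CATALOG.items()
        if sku != seed_sku and compare(cosine(seed_vec, vec), threshold)
    }


print(sorted(neighbors("USB-301", ">", 0.85)))  # the other hub and the dock
print(sorted(neighbors("USB-301", "<", 0.0)))   # items with negative similarity
```

Removing the guard would put USB-301 itself (a score of about 1.0) into every `>` result whose threshold is below 1.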
Try this next
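// Mix structural and semantic: semantically close AND priced under 1.5x the seed.
// With the demo data, the other microphone, MIC-225 ($219), exceeds 1.5 * $99 = $148.50,
// so the price filter removes it; raise the multiplier to see it.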
MATCH (seed:Product {sku: "MIC-220"})-[:SIMILAR_TO > 0.8]->(rec:Product)
WHERE rec.price < seed.price * 1.5
RETURN rec.sku, rec.name, rec.price;
// Loose threshold returns more, near-duplicates first.
MATCH (seed:Product {sku: "USB-301"})-[:SIMILAR_TO > 0.7]->(rec:Product)
RETURN rec.sku, rec.name;
// "Find products NOT similar to anything else in the catalog": outliers worth featuring.
MATCH (p:Product)
OPTIONAL MATCH (p)-[:SIMILAR_TO > 0.8]->(near:Product)
WITH p, count(near) AS nearby
WHERE nearby = 0
RETURN p.sku, p.name AS unique_item;
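The two follow-up filters reduce to simple post-processing of neighbor sets. The Python sketch below is a brute-force stand-in that assumes cosine similarity; it applies the price cap and the zero-neighbor test to the Step 1 catalog. `near`, `near_within_price`, and `outliers` are hypothetical helpers, not engine functions.

```python
import math

# Demo catalog from Step 1: sku -> (price, 5-dim embedding).
CATALOG = {
    "USB-301": (39.99, [0.81, 0.04, 0.12, -0.05, 0.21]),
    "USB-307": (54.99, [0.79, 0.06, 0.10, -0.04, 0.23]),
    "USB-410": (249.0, [0.77, 0.02, 0.15, -0.07, 0.25]),
    "KBD-100": (129.0, [-0.34, 0.58, 0.21, 0.04, 0.12]),
    "KBD-105": (159.0, [-0.32, 0.60, 0.19, 0.06, 0.11]),
    "MIC-220": (99.0, [0.07, -0.41, 0.66, 0.10, -0.18]),
    "MIC-225": (219.0, [0.09, -0.43, 0.64, 0.08, -0.16]),
    "BAG-501": (79.0, [-0.10, 0.18, -0.55, 0.42, 0.30]),
}


def cosine(a, b):
    """Cosine similarity (assumed metric for SIMILAR_TO)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def near(seed_sku, threshold):
    """Other SKUs whose cosine score against the seed is strictly above threshold."""
    seed_vec = CATALOG[seed_sku][1]
    return [
        sku
        for sku, (_, vec) in CATALOG.items()
        if sku != seed_sku and cosine(seed_vec, vec) > threshold
    ]


def near_within_price(seed_sku, threshold, max_ratio):
    """Semantic neighbors priced below max_ratio times the seed's price
    (mirrors WHERE rec.price < seed.price * 1.5)."""
    cap = CATALOG[seed_sku][0] * max_ratio
    return [sku for sku in near(seed_sku, threshold) if CATALOG[sku][0] < cap]


def outliers(threshold):
    """Products with no neighbor above threshold (mirrors OPTIONAL MATCH ... nearby = 0)."""
    return [sku for sku in CATALOG if not near(sku, threshold)]


print(near_within_price("MIC-220", 0.8, 1.5))  # MIC-225 (219.0) exceeds the 148.5 cap
print(near_within_price("MIC-220", 0.8, 2.5))  # a 247.5 cap lets MIC-225 through
print(outliers(0.8))
```

Under the cosine assumption, BAG-501 is the only product with no neighbor above 0.8, since every other item has a close sibling in its own category.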