Find similar patients by symptom embedding
Objective
Clinicians constantly ask "what worked for patients like this one?" ICD-10 codes match too
coarsely (every diabetic patient looks the same), while the free-text presentation carries the
actual signal. With a symptom_embedding on each patient node, SIMILAR_TO returns a cohort
of clinically similar patients, and Cypher walks from them to their treatments and outcomes. The wow
moment: a single query returns "patients with similar presentations and what they responded to."
Step 1: Set up patients, treatments, outcomes
MERGE (p1:Patient {mrn: "MRN-201", age: 67, sex: "F",
  presentation: "fatigue, polyuria, weight loss over 3 months, fasting glucose 240",
  symptom_embedding: [0.61, 0.18, -0.04, 0.42, 0.21]})
MERGE (p2:Patient {mrn: "MRN-202", age: 58, sex: "M",
  presentation: "polyuria, polydipsia, weight loss, A1c 11.4, BMI 32",
  symptom_embedding: [0.62, 0.17, -0.03, 0.43, 0.22]})
MERGE (p3:Patient {mrn: "MRN-203", age: 71, sex: "F",
  presentation: "intermittent claudication, pulses absent in left foot, smoker",
  symptom_embedding: [-0.31, 0.55, 0.10, -0.04, 0.16]})
MERGE (p4:Patient {mrn: "MRN-204", age: 49, sex: "M",
  presentation: "fatigue, polyuria, fasting glucose 220, family history of diabetes",
  symptom_embedding: [0.60, 0.19, -0.05, 0.41, 0.20]})
MERGE (p5:Patient {mrn: "MRN-205", age: 62, sex: "F",
  presentation: "headache, blurred vision, BP 198/110, no neuro deficit",
  symptom_embedding: [0.04, -0.41, 0.62, 0.10, 0.18]})
MERGE (p6:Patient {mrn: "MRN-206", age: 68, sex: "M",
  presentation: "fatigue, weight loss, polyuria, A1c 12.1, microalbuminuria",
  symptom_embedding: [0.59, 0.20, -0.06, 0.42, 0.19]})
MERGE (rx1:Treatment {name: "Metformin 1000mg BID + lifestyle"})
MERGE (rx2:Treatment {name: "Insulin glargine + metformin"})
MERGE (rx3:Treatment {name: "Cilostazol + smoking cessation referral"})
MERGE (rx4:Treatment {name: "Empagliflozin + ACE inhibitor"})
MERGE (rx5:Treatment {name: "Lisinopril + amlodipine"})
MERGE (out1:Outcome {label: "A1c < 7 at 6 months", positive: true})
MERGE (out2:Outcome {label: "Glycemic control inadequate", positive: false})
MERGE (out3:Outcome {label: "Symptom resolution", positive: true})
MERGE (out4:Outcome {label: "Hospital readmission", positive: false})
MERGE (out5:Outcome {label: "BP < 140/90 at 3 months", positive: true})
MERGE (p1)-[:RECEIVED]->(rx1)-[:RESULTED_IN]->(out1)
MERGE (p2)-[:RECEIVED]->(rx2)-[:RESULTED_IN]->(out1)
MERGE (p4)-[:RECEIVED]->(rx1)-[:RESULTED_IN]->(out2)
MERGE (p6)-[:RECEIVED]->(rx4)-[:RESULTED_IN]->(out1)
MERGE (p3)-[:RECEIVED]->(rx3)-[:RESULTED_IN]->(out3)
MERGE (p5)-[:RECEIVED]->(rx5)-[:RESULTED_IN]->(out5);
Step 2: Find similar patients to a new admission
// New admission MRN-201: surface clinically similar past patients and what worked.
MATCH (target:Patient {mrn: "MRN-201"})-[:SIMILAR_TO > 0.85]->(similar:Patient)
MATCH (similar)-[:RECEIVED]->(rx:Treatment)-[:RESULTED_IN]->(o:Outcome)
RETURN similar.mrn AS mrn,
       similar.presentation AS presentation,
       rx.name AS treatment,
       o.label AS outcome,
       o.positive AS was_positive
ORDER BY was_positive DESC;
What's happening
- The symptom_embedding is built from presentation text: embedded once on admission and stored on the node. Re-embedding when notes change is one update.
- SIMILAR_TO > 0.85 returns cohort members with semantically similar presentations across age and sex differences; the embedding generalises better than ICD-10 buckets.
- Walking to :RECEIVED and :RESULTED_IN chains the cohort to outcome data without an extra query. The clinician sees "patients like this one received metformin and reached A1c < 7" or "patients like this one were re-hospitalized" all in one row.
- Outcome polarity (positive: true/false) lets the UI flag treatments to favor or avoid.
- Same primitive supports clinical-trial cohort discovery, post-market surveillance, and case conferences: every "patients like this" workflow is one Cypher query away.
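
To sanity-check which patients a 0.85 threshold would admit, the comparison can be sketched outside the database. This is a minimal illustration only: the source does not say which metric SIMILAR_TO uses, so cosine similarity is an assumption here, as is excluding the target from its own cohort. The names EMBEDDINGS, cosine_similarity, and similar_cohort are hypothetical helpers, and the vectors are copied from Step 1.

```python
import math

# Embeddings copied from the Step 1 sample data, keyed by MRN.
EMBEDDINGS = {
    "MRN-201": [0.61, 0.18, -0.04, 0.42, 0.21],
    "MRN-202": [0.62, 0.17, -0.03, 0.43, 0.22],
    "MRN-203": [-0.31, 0.55, 0.10, -0.04, 0.16],
    "MRN-204": [0.60, 0.19, -0.05, 0.41, 0.20],
    "MRN-205": [0.04, -0.41, 0.62, 0.10, 0.18],
    "MRN-206": [0.59, 0.20, -0.06, 0.42, 0.19],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similar_cohort(target_mrn, threshold, embeddings=EMBEDDINGS):
    """MRNs whose similarity to the target exceeds the threshold.

    Assumption: the target itself is excluded from its own cohort.
    """
    target = embeddings[target_mrn]
    return sorted(
        mrn
        for mrn, vector in embeddings.items()
        if mrn != target_mrn and cosine_similarity(target, vector) > threshold
    )


if __name__ == "__main__":
    print(similar_cohort("MRN-201", 0.85))
```

Under these assumptions the three other hyperglycemia presentations (MRN-202, MRN-204, MRN-206) clear the threshold by a wide margin, while the claudication and hypertension cases score near or below zero, so the exact cutoff matters little on this sample data.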
Try this next
// Treatments ranked by how many cohort members reached a positive outcome on them.
MATCH (target:Patient {mrn: "MRN-201"})-[:SIMILAR_TO > 0.85]->(s:Patient)
MATCH (s)-[:RECEIVED]->(rx:Treatment)-[:RESULTED_IN]->(o:Outcome {positive: true})
// DISTINCT keeps a patient with several positive outcomes from being counted twice.
WITH rx, count(DISTINCT s) AS positive_uses
RETURN rx.name AS treatment, positive_uses
ORDER BY positive_uses DESC;
// Cohort discovery for a registry: every patient near a seed presentation.
MATCH (seed:Patient {mrn: "MRN-202"})-[:SIMILAR_TO > 0.8]->(member:Patient)
RETURN member.mrn, member.age, member.presentation;
// Outliers: patients with no clinically similar peers in the cohort.
MATCH (p:Patient)
OPTIONAL MATCH (p)-[:SIMILAR_TO > 0.8]->(near:Patient)
WITH p, count(near) AS peers
WHERE peers = 0
RETURN p.mrn, p.presentation AS unique_presentation;
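
The outlier query can be checked by hand against the sample data with a short pairwise sketch. As before, this assumes SIMILAR_TO behaves like a symmetric cosine-similarity comparison, which the source does not confirm; peer_counts and outliers are hypothetical helper names, not part of any API.

```python
import math
from itertools import combinations

# Embeddings copied from the Step 1 sample data, keyed by MRN.
EMBEDDINGS = {
    "MRN-201": [0.61, 0.18, -0.04, 0.42, 0.21],
    "MRN-202": [0.62, 0.17, -0.03, 0.43, 0.22],
    "MRN-203": [-0.31, 0.55, 0.10, -0.04, 0.16],
    "MRN-204": [0.60, 0.19, -0.05, 0.41, 0.20],
    "MRN-205": [0.04, -0.41, 0.62, 0.10, 0.18],
    "MRN-206": [0.59, 0.20, -0.06, 0.42, 0.19],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def peer_counts(threshold, embeddings=EMBEDDINGS):
    """Number of other patients above the threshold, per MRN."""
    counts = {mrn: 0 for mrn in embeddings}
    # Cosine similarity is symmetric, so each unordered pair is scored once
    # and credited to both patients.
    for left, right in combinations(embeddings, 2):
        if cosine_similarity(embeddings[left], embeddings[right]) > threshold:
            counts[left] += 1
            counts[right] += 1
    return counts


def outliers(threshold, embeddings=EMBEDDINGS):
    """MRNs with no peer above the threshold."""
    return sorted(mrn for mrn, n in peer_counts(threshold, embeddings).items() if n == 0)


if __name__ == "__main__":
    print(outliers(0.8))
```

On the six sample patients this flags MRN-203 and MRN-205, the two presentations with no near neighbours, which is what the OPTIONAL MATCH query should surface if the operator works as assumed.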