Drug repurposing via mechanism similarity
Objective
Repurposing an approved drug for a new indication takes 3-5 years, versus 12 for a novel molecule. The
search problem is "find drugs whose mechanism resembles a known winner's and that act on the same
target family." Modeling drugs, targets, and diseases as a graph, with a mechanism embedding stored in
the embedding property of each drug node, lets SIMILAR_TO find mechanism-similar candidates in one hop.
The wow moment: a single Cypher query returns "drugs not currently indicated for X whose mechanism
resembles a drug that works for X, filtered by a similarity threshold."
Step 1: Build the small drug-target-disease graph
// Drugs, each with a mechanism description and a 5-dimensional embedding
MERGE (d1:Drug {name: "Metformin", indication: "Type 2 Diabetes",
mechanism: "Activates AMPK, reduces hepatic glucose output",
embedding: [0.61, 0.18, -0.04, 0.42, 0.21]})
MERGE (d2:Drug {name: "Phenformin", indication: "Withdrawn",
mechanism: "Activates AMPK pathway, increases insulin sensitivity",
embedding: [0.62, 0.17, -0.03, 0.43, 0.22]})
MERGE (d3:Drug {name: "Berberine", indication: "Diarrhea",
mechanism: "Plant alkaloid, activates AMPK, lowers blood glucose",
embedding: [0.60, 0.19, -0.05, 0.41, 0.20]})
MERGE (d4:Drug {name: "Sitagliptin", indication: "Type 2 Diabetes",
mechanism: "DPP-4 inhibitor, prolongs incretin action",
embedding: [-0.21, 0.55, 0.10, 0.05, -0.30]})
MERGE (d5:Drug {name: "Linagliptin", indication: "Type 2 Diabetes",
mechanism: "DPP-4 inhibitor, similar to sitagliptin",
embedding: [-0.20, 0.56, 0.11, 0.04, -0.31]})
MERGE (d6:Drug {name: "Atorvastatin", indication: "Hyperlipidemia",
mechanism: "HMG-CoA reductase inhibitor, lowers LDL",
embedding: [-0.55, -0.08, 0.41, -0.12, 0.07]})
MERGE (d7:Drug {name: "Empagliflozin", indication: "Type 2 Diabetes",
mechanism: "SGLT2 inhibitor, increases urinary glucose excretion",
embedding: [0.04, -0.41, 0.62, 0.10, 0.18]})
// Targets
MERGE (t1:Target {name: "AMPK"})
MERGE (t2:Target {name: "DPP-4"})
MERGE (t3:Target {name: "HMG-CoA reductase"})
MERGE (t4:Target {name: "SGLT2"})
// Diseases
MERGE (dz1:Disease {name: "Type 2 Diabetes"})
MERGE (dz2:Disease {name: "Polycystic Ovary Syndrome"})
MERGE (dz3:Disease {name: "Cancer (multiple solid tumors)"})
// Relationships: drug-target, target-disease, drug-disease
MERGE (d1)-[:ACTS_ON]->(t1)
MERGE (d2)-[:ACTS_ON]->(t1)
MERGE (d3)-[:ACTS_ON]->(t1)
MERGE (d4)-[:ACTS_ON]->(t2)
MERGE (d5)-[:ACTS_ON]->(t2)
MERGE (d6)-[:ACTS_ON]->(t3)
MERGE (d7)-[:ACTS_ON]->(t4)
MERGE (t1)-[:IMPLICATED_IN]->(dz1)
MERGE (t1)-[:IMPLICATED_IN]->(dz2)
MERGE (t1)-[:IMPLICATED_IN]->(dz3)
MERGE (t2)-[:IMPLICATED_IN]->(dz1)
MERGE (t4)-[:IMPLICATED_IN]->(dz1)
MERGE (d1)-[:TREATS]->(dz1)
MERGE (d4)-[:TREATS]->(dz1)
MERGE (d5)-[:TREATS]->(dz1)
MERGE (d7)-[:TREATS]->(dz1);
Step 2: Find candidates for repurposing
// Strategy: take a drug that already TREATS a disease via a target.
// Find OTHER drugs whose mechanism is similar AND that act on the same target,
// but are NOT yet indicated for that disease — repurposing candidates.
MATCH (winner:Drug {name: "Metformin"})-[:ACTS_ON]->(t:Target)
MATCH (winner)-[:SIMILAR_TO > 0.9]->(candidate:Drug)
MATCH (candidate)-[:ACTS_ON]->(t)
WHERE NOT (candidate)-[:TREATS]->(:Disease {name: "Type 2 Diabetes"})
RETURN candidate.name AS repurpose_candidate,
candidate.indication AS current_indication,
t.name AS shared_target,
candidate.mechanism AS mechanism;
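To see why the 0.9 threshold isolates the AMPK activators, the similarity step can be reproduced outside the database. The sketch below is plain Python and assumes SIMILAR_TO scores pairs by cosine similarity; the source does not state the metric, so treat that as an assumption. The helper names cosine and similar_to are illustrative and not part of any engine API.

```python
import math

# Embeddings copied verbatim from the Step 1 graph.
EMBEDDINGS = {
    "Metformin":     [0.61, 0.18, -0.04, 0.42, 0.21],
    "Phenformin":    [0.62, 0.17, -0.03, 0.43, 0.22],
    "Berberine":     [0.60, 0.19, -0.05, 0.41, 0.20],
    "Sitagliptin":   [-0.21, 0.55, 0.10, 0.05, -0.30],
    "Linagliptin":   [-0.20, 0.56, 0.11, 0.04, -0.31],
    "Atorvastatin":  [-0.55, -0.08, 0.41, -0.12, 0.07],
    "Empagliflozin": [0.04, -0.41, 0.62, 0.10, 0.18],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similar_to(name, threshold):
    """Return (drug, score) pairs scoring above threshold, best first.

    Assumption: this mirrors what SIMILAR_TO > threshold does in the engine.
    """
    anchor = EMBEDDINGS[name]
    scored = [
        (other, cosine(anchor, vec))
        for other, vec in EMBEDDINGS.items()
        if other != name
    ]
    hits = [(other, score) for other, score in scored if score > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


for drug, score in similar_to("Metformin", 0.9):
    print(f"{drug}: {score:.4f}")
```

Under the cosine assumption, only Phenformin and Berberine clear the threshold for Metformin, both scoring above 0.99; the other four drugs score near zero or negative, so the exact cutoff matters little on this dataset.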
Step 3: Reach all diseases the shared target is implicated in
// Broaden: which diseases could this candidate plausibly help with, given its target?
MATCH (winner:Drug {name: "Metformin"})-[:SIMILAR_TO > 0.9]->(candidate:Drug)
MATCH (candidate)-[:ACTS_ON]->(t:Target)-[:IMPLICATED_IN]->(disease:Disease)
WHERE NOT (candidate)-[:TREATS]->(disease)
RETURN candidate.name AS candidate,
t.name AS target,
disease.name AS plausible_indication;
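The traversal half of the Step 3 query can be checked by hand with a few dictionaries. This is a minimal sketch in plain Python, not how the engine executes the query. It takes the candidate list as given (the assumption being that the similarity step returned Phenformin and Berberine), and the function name plausible_indications is illustrative.

```python
# Edges copied from the Step 1 graph. In this small graph every drug has
# exactly one ACTS_ON edge, so a plain dict is enough.
ACTS_ON = {
    "Metformin": "AMPK",
    "Phenformin": "AMPK",
    "Berberine": "AMPK",
    "Sitagliptin": "DPP-4",
    "Linagliptin": "DPP-4",
    "Atorvastatin": "HMG-CoA reductase",
    "Empagliflozin": "SGLT2",
}
IMPLICATED_IN = {
    "AMPK": [
        "Type 2 Diabetes",
        "Polycystic Ovary Syndrome",
        "Cancer (multiple solid tumors)",
    ],
    "DPP-4": ["Type 2 Diabetes"],
    "SGLT2": ["Type 2 Diabetes"],
}
TREATS = {
    "Metformin": {"Type 2 Diabetes"},
    "Sitagliptin": {"Type 2 Diabetes"},
    "Linagliptin": {"Type 2 Diabetes"},
    "Empagliflozin": {"Type 2 Diabetes"},
}


def plausible_indications(candidates):
    """Follow drug -> target -> disease and drop diseases the drug already treats."""
    rows = []
    for drug in candidates:
        target = ACTS_ON[drug]
        for disease in IMPLICATED_IN.get(target, []):
            # Mirrors: WHERE NOT (candidate)-[:TREATS]->(disease)
            if disease not in TREATS.get(drug, set()):
                rows.append((drug, target, disease))
    return rows


# Assumption: these are the drugs the similarity step returned for Metformin.
for row in plausible_indications(["Phenformin", "Berberine"]):
    print(row)
```

Neither candidate has a TREATS edge, so each one is paired with all three AMPK-linked diseases, giving six rows. Running the same function on Metformin drops Type 2 Diabetes, which shows the filter doing its job.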
What's happening
- SIMILAR_TO > 0.9 over mechanism embeddings clusters Metformin, Phenformin, and Berberine: two biguanides and a structurally unrelated plant alkaloid that all activate AMPK. Vector similarity sees the semantic equivalence that keyword search over the prose misses ("plant alkaloid" vs "biguanide").
- The structural pattern (:Drug)-[:ACTS_ON]->(:Target)-[:IMPLICATED_IN]->(:Disease) ensures candidates have a plausible biological pathway, not just a similar description.
- NOT (candidate)-[:TREATS]->(disease) is the key filter: only return drugs that aren't already indicated. This is the actual repurposing question.
- Real pipelines use the same primitives: target overlap (graph) + chemoinformatic similarity (vector) + literature evidence (extracted with /v2/graph/extract). All three live in one engine here.
- This is the kind of query Eli Lilly and Roche pay seven figures for from Hetionet/PrimeKG-style knowledge graphs; at this scale you can prototype it on a laptop.
Try this next
// Drugs targeting the same disease but via different targets (combination-therapy candidates).
MATCH (d1:Drug)-[:TREATS]->(dz:Disease {name: "Type 2 Diabetes"})<-[:TREATS]-(d2:Drug)
MATCH (d1)-[:ACTS_ON]->(t1:Target),
(d2)-[:ACTS_ON]->(t2:Target)
WHERE t1 <> t2 AND id(d1) < id(d2)
RETURN d1.name AS drug_a, t1.name AS target_a,
d2.name AS drug_b, t2.name AS target_b;
// Reverse search: given a target, what mechanism-similar drugs exist beyond the obvious?
MATCH (t:Target {name: "AMPK"})<-[:ACTS_ON]-(d:Drug)
MATCH (d)-[:SIMILAR_TO > 0.85]->(near:Drug)
WHERE NOT (near)-[:ACTS_ON]->(t)
RETURN d.name AS known, near.name AS off_target_lookalike, near.indication;
// Coverage: how many drugs reach each disease through a target implicated in it?
MATCH (d:Drug)-[:ACTS_ON]->(t:Target)-[:IMPLICATED_IN]->(dz:Disease)
RETURN dz.name AS disease, count(DISTINCT d) AS candidate_drugs
ORDER BY candidate_drugs DESC;