Talent matching: resume-to-job via skill graph + embeddings
Objective
ATS keyword matching gives recruiters too many false positives ("React" hits anyone who put it in a buzzword soup) and too many misses ("LLMOps" rejects the candidate who wrote "production GenAI infra"). A graph of jobs → required skills and candidates → demonstrated skills, plus embedding-based similarity on candidate summaries, fixes both. The wow moment: one query returns candidates with the structural skill hits AND semantic similarity to the job AND an LLM-judged "would they thrive in this role" score, ranked together.
Step 1: Set up jobs, skills, and candidates
MERGE (j1:Job {id: "JOB-101", title: "Senior ML Engineer",
summary: "Train and deploy production LLM systems with vector DBs and observability",
embedding: [0.41, 0.13, -0.22, 0.55, 0.07]})
MERGE (j2:Job {id: "JOB-102", title: "Data Platform Engineer",
summary: "Build streaming ETL pipelines on Kafka and Spark, manage Snowflake warehouse",
embedding: [-0.10, 0.62, 0.31, -0.18, 0.20]})
MERGE (s1:Skill {name: "Python"})
MERGE (s2:Skill {name: "PyTorch"})
MERGE (s3:Skill {name: "Vector Databases"})
MERGE (s4:Skill {name: "LLM Inference"})
MERGE (s5:Skill {name: "Kafka"})
MERGE (s6:Skill {name: "Spark"})
MERGE (s7:Skill {name: "Snowflake"})
MERGE (s8:Skill {name: "Observability"})
MERGE (j1)-[:REQUIRES {weight: 1.0}]->(s1)
MERGE (j1)-[:REQUIRES {weight: 1.0}]->(s2)
MERGE (j1)-[:REQUIRES {weight: 1.0}]->(s4)
MERGE (j1)-[:REQUIRES {weight: 0.7}]->(s3)
MERGE (j1)-[:REQUIRES {weight: 0.5}]->(s8)
MERGE (j2)-[:REQUIRES {weight: 1.0}]->(s5)
MERGE (j2)-[:REQUIRES {weight: 1.0}]->(s6)
MERGE (j2)-[:REQUIRES {weight: 1.0}]->(s7)
MERGE (j2)-[:REQUIRES {weight: 0.5}]->(s1)
MERGE (c1:Candidate {id: "CAND-9001", name: "Sarah Chen",
summary: "Built and deployed production GPT-style assistants on Kubernetes, owns vector retrieval stack",
embedding: [0.40, 0.14, -0.21, 0.56, 0.06]})
MERGE (c2:Candidate {id: "CAND-9002", name: "Raj Patel",
summary: "10 years ETL, recently migrated petabyte warehouse from Hadoop to Snowflake",
embedding: [-0.09, 0.63, 0.30, -0.19, 0.21]})
MERGE (c3:Candidate {id: "CAND-9003", name: "Mia Rossi",
summary: "Computer vision research, papers on diffusion models and robotic manipulation",
embedding: [0.18, -0.30, 0.41, 0.07, -0.22]})
MERGE (c4:Candidate {id: "CAND-9004", name: "Leo Park",
summary: "Built the realtime fraud platform at a fintech, Kafka and Spark heavy",
embedding: [-0.08, 0.61, 0.32, -0.17, 0.19]})
MERGE (c1)-[:HAS_SKILL]->(s1)
MERGE (c1)-[:HAS_SKILL]->(s2)
MERGE (c1)-[:HAS_SKILL]->(s3)
MERGE (c1)-[:HAS_SKILL]->(s4)
MERGE (c1)-[:HAS_SKILL]->(s8)
MERGE (c2)-[:HAS_SKILL]->(s1)
MERGE (c2)-[:HAS_SKILL]->(s5)
MERGE (c2)-[:HAS_SKILL]->(s6)
MERGE (c2)-[:HAS_SKILL]->(s7)
MERGE (c3)-[:HAS_SKILL]->(s1)
MERGE (c3)-[:HAS_SKILL]->(s2)
MERGE (c4)-[:HAS_SKILL]->(s1)
MERGE (c4)-[:HAS_SKILL]->(s5)
MERGE (c4)-[:HAS_SKILL]->(s6);
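Before running the hybrid query, it helps to see what the SIMILAR_TO > 0.85 threshold means on the toy embeddings above. Assuming the engine uses cosine similarity (an assumption; substitute the actual metric if yours differs), a few lines of Python show why Sarah Chen lands near JOB-101 while Mia Rossi does not:

```python
import math

def cosine(a, b):
    # Standard cosine similarity: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Embeddings copied from the Step 1 setup.
job_101 = [0.41, 0.13, -0.22, 0.55, 0.07]   # Senior ML Engineer
sarah   = [0.40, 0.14, -0.21, 0.56, 0.06]   # CAND-9001
mia     = [0.18, -0.30, 0.41, 0.07, -0.22]  # CAND-9003

print(cosine(job_101, sarah))  # well above the 0.85 cutoff
print(cosine(job_101, mia))    # far below it
```

Sarah's vector clears the cutoff by a wide margin; Mia's is nowhere close, so she never even reaches the LLM-scoring stage of the hybrid query.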
Step 2: Hybrid match — skills covered + semantic + LLM fit
// Find candidates for "Senior ML Engineer" (JOB-101).
MATCH (j:Job {id: "JOB-101"})-[req:REQUIRES]->(s:Skill)<-[:HAS_SKILL]-(c:Candidate)
WITH j, c,
sum(req.weight) AS skill_score,
count(s) AS skill_hits
MATCH (j)-[:SIMILAR_TO > 0.85]->(c)
WITH j, c, skill_score, skill_hits,
llm_score("Rate 0..1 how well this candidate would thrive in the job. " +
"Reward production LLM/ML systems experience and cross-functional impact.",
c) AS fit_score
RETURN c.name AS candidate,
skill_hits AS skills_covered,
skill_score AS skill_weight,
fit_score,
skill_score + fit_score AS combined_rank
ORDER BY combined_rank DESC;
What's happening
- The first MATCH counts how many of the job's required skills the candidate actually has and weights each one by req.weight: structural recall, exactly like an ATS but cleaner.
- The [:SIMILAR_TO > 0.85] hop keeps only candidates whose summary embedding sits close to the job description's. This catches paraphrased experience ("production GPT assistants" ≈ "production LLM systems") that keyword search would miss.
- llm_score then judges fit on whatever free-form criteria the prompt encodes: culture, impact, growth. The model reads the candidate's full summary and produces a calibrated 0..1 score.
- The final ranking blends the signals: semantic similarity acts as a filter, and skill_score + fit_score orders the survivors. A candidate strong on skills but weak semantically never reaches the list, so recruiters see a small, explainable shortlist.
- Without graph + vector + LLM in one engine you'd need: ATS for skills, vector DB for semantic match, LLM in app code for fit. Three systems, three round trips per query.
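To make the ranking arithmetic concrete, here is the Step 2 blend redone in plain Python over the Step 1 data. The REQUIRES weights are the real ones from the setup; the fit values are made-up placeholders standing in for llm_score output, not real model scores. Mia is included only to show the arithmetic; in the engine the SIMILAR_TO filter would have dropped her already.

```python
# REQUIRES weights for JOB-101, copied from Step 1.
required = {"Python": 1.0, "PyTorch": 1.0, "LLM Inference": 1.0,
            "Vector Databases": 0.7, "Observability": 0.5}

candidates = {
    # name: (skills held, placeholder fit score -- NOT real llm_score output)
    "Sarah Chen": ({"Python", "PyTorch", "Vector Databases",
                    "LLM Inference", "Observability"}, 0.92),
    "Mia Rossi":  ({"Python", "PyTorch"}, 0.40),
}

def rank(required, candidates):
    rows = []
    for name, (skills, fit) in candidates.items():
        # sum(req.weight) over the skills the candidate actually has.
        skill_score = sum(w for s, w in required.items() if s in skills)
        rows.append((name, skill_score, fit, skill_score + fit))
    # ORDER BY combined_rank DESC, same as the Cypher query.
    return sorted(rows, key=lambda r: r[3], reverse=True)

for name, skill_score, fit, combined in rank(required, candidates):
    print(name, skill_score, fit, round(combined, 2))
```

Sarah covers all five required skills for a skill_score of 4.2; Mia covers two for 2.0. Even a generous fit score can't lift a skills-weak candidate past a candidate strong on both signals, which is exactly the behavior the combined rank is designed to produce.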
Try this next
// Baseline: pure ATS-style skill counting, no weights or semantics.
MATCH (j:Job {id: "JOB-101"})-[:REQUIRES]->(s:Skill)
WITH j, collect(s.name) AS required_skills
MATCH (c:Candidate)-[:HAS_SKILL]->(have:Skill)
WHERE have.name IN required_skills
RETURN c.name, count(have) AS skill_hits
ORDER BY skill_hits DESC;
// Pure semantic matches for the Data Platform role, no skill graph.
MATCH (j:Job {id: "JOB-102"})-[:SIMILAR_TO > 0.85]->(c:Candidate)
RETURN c.name AS semantic_match, c.summary;
// Candidates who cover every required skill for any job.
MATCH (j:Job)-[:REQUIRES]->(s:Skill)
WITH j, collect(s.name) AS req
MATCH (c:Candidate)
OPTIONAL MATCH (c)-[:HAS_SKILL]->(s2:Skill) WHERE s2.name IN req
WITH j, c, count(s2) AS hits, size(req) AS needed
WHERE hits = needed
RETURN j.title, c.name AS fully_qualified;