Talent matching: resume-to-job via skill graph + embeddings
Objective
ATS keyword matching gives recruiters too many false positives ("React" hits anyone who put it in a buzzword soup) and too many misses ("LLMOps" rejects the candidate who wrote "production GenAI infra"). A graph of jobs → required skills and candidates → demonstrated skills, plus embedding-based similarity on candidate summaries, fixes both. The wow moment: one query returns candidates with the structural skill hits AND semantic similarity to the job AND an LLM-judged "would they thrive in this role" score, ranked together.
Step 1: Set up jobs, skills, and candidates
MERGE (j1:Job {id: "JOB-101", title: "Senior ML Engineer",
summary: "Train and deploy production LLM systems with vector DBs and observability",
embedding: [0.41, 0.13, -0.22, 0.55, 0.07]})
MERGE (j2:Job {id: "JOB-102", title: "Data Platform Engineer",
summary: "Build streaming ETL pipelines on Kafka and Spark, manage Snowflake warehouse",
embedding: [-0.10, 0.62, 0.31, -0.18, 0.20]})
MERGE (s1:Skill {name: "Python"})
MERGE (s2:Skill {name: "PyTorch"})
MERGE (s3:Skill {name: "Vector Databases"})
MERGE (s4:Skill {name: "LLM Inference"})
MERGE (s5:Skill {name: "Kafka"})
MERGE (s6:Skill {name: "Spark"})
MERGE (s7:Skill {name: "Snowflake"})
MERGE (s8:Skill {name: "Observability"})
MERGE (j1)-[:REQUIRES {weight: 1.0}]->(s1)
MERGE (j1)-[:REQUIRES {weight: 1.0}]->(s2)
MERGE (j1)-[:REQUIRES {weight: 1.0}]->(s4)
MERGE (j1)-[:REQUIRES {weight: 0.7}]->(s3)
MERGE (j1)-[:REQUIRES {weight: 0.5}]->(s8)
MERGE (j2)-[:REQUIRES {weight: 1.0}]->(s5)
MERGE (j2)-[:REQUIRES {weight: 1.0}]->(s6)
MERGE (j2)-[:REQUIRES {weight: 1.0}]->(s7)
MERGE (j2)-[:REQUIRES {weight: 0.5}]->(s1)
MERGE (c1:Candidate {id: "CAND-9001", name: "Sarah Chen",
summary: "Built and deployed production GPT-style assistants on Kubernetes, owns vector retrieval stack",
embedding: [0.40, 0.14, -0.21, 0.56, 0.06]})
MERGE (c2:Candidate {id: "CAND-9002", name: "Raj Patel",
summary: "10 years ETL, recently migrated petabyte warehouse from Hadoop to Snowflake",
embedding: [-0.09, 0.63, 0.30, -0.19, 0.21]})
MERGE (c3:Candidate {id: "CAND-9003", name: "Mia Rossi",
summary: "Computer vision research, papers on diffusion models and robotic manipulation",
embedding: [0.18, -0.30, 0.41, 0.07, -0.22]})
MERGE (c4:Candidate {id: "CAND-9004", name: "Leo Park",
summary: "Built the realtime fraud platform at a fintech, Kafka and Spark heavy",
embedding: [-0.08, 0.61, 0.32, -0.17, 0.19]})
MERGE (c1)-[:HAS_SKILL]->(s1)
MERGE (c1)-[:HAS_SKILL]->(s2)
MERGE (c1)-[:HAS_SKILL]->(s3)
MERGE (c1)-[:HAS_SKILL]->(s4)
MERGE (c1)-[:HAS_SKILL]->(s8)
MERGE (c2)-[:HAS_SKILL]->(s1)
MERGE (c2)-[:HAS_SKILL]->(s5)
MERGE (c2)-[:HAS_SKILL]->(s6)
MERGE (c2)-[:HAS_SKILL]->(s7)
MERGE (c3)-[:HAS_SKILL]->(s1)
MERGE (c3)-[:HAS_SKILL]->(s2)
MERGE (c4)-[:HAS_SKILL]->(s1)
MERGE (c4)-[:HAS_SKILL]->(s5)
MERGE (c4)-[:HAS_SKILL]->(s6);
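Before running the hybrid query, it helps to see what the SIMILAR_TO > 0.85 threshold means on the toy embeddings above. Assuming the engine uses cosine similarity (an assumption; substitute the actual metric if yours differs), a few lines of Python show why Sarah Chen lands near JOB-101 while Mia Rossi does not:

```python
import math

def cosine(a, b):
    # Standard cosine similarity: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Embeddings copied from the Step 1 setup.
job_101 = [0.41, 0.13, -0.22, 0.55, 0.07]   # Senior ML Engineer
sarah   = [0.40, 0.14, -0.21, 0.56, 0.06]   # CAND-9001
mia     = [0.18, -0.30, 0.41, 0.07, -0.22]  # CAND-9003

print(cosine(job_101, sarah))  # well above the 0.85 cutoff
print(cosine(job_101, mia))    # far below it
```

Sarah's vector clears the cutoff by a wide margin; Mia's is nowhere close, so she never even reaches the LLM-scoring stage of the hybrid query.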
Step 2: Hybrid match — skills covered + semantic + LLM fit
// Find candidates for "Senior ML Engineer" (JOB-101).
MATCH (j:Job {id: "JOB-101"})-[req:REQUIRES]->(s:Skill)<-[:HAS_SKILL]-(c:Candidate)
WITH j, c,
sum(req.weight) AS skill_score,
count(s) AS skill_hits
MATCH (j)-[:SIMILAR_TO > 0.85]->(c)
WITH j, c, skill_score, skill_hits,
llm_score("Rate 0..1 how well this candidate would thrive in the job. " +
"Reward production LLM/ML systems experience and cross-functional impact.",
c) AS fit_score
RETURN c.name AS candidate,
skill_hits AS skills_covered,
skill_score AS skill_weight,
fit_score,
skill_score + fit_score AS combined_rank
ORDER BY combined_rank DESC;
What's happening
- The first MATCH counts how many of the job's required skills the candidate actually has and weights each one by req.weight: structural recall, exactly like an ATS but cleaner.
- The [:SIMILAR_TO > 0.85] hop keeps only candidates whose summary embedding sits close to the job description's. This catches paraphrased experience ("production GPT assistants" ≈ "production LLM systems") that keyword search would miss.
- llm_score then judges fit on whatever free-form criteria the prompt encodes: culture, impact, growth. The model reads the candidate's full summary and produces a calibrated 0..1 score.
- The final ranking blends the signals: semantic similarity acts as a filter, and skill_score + fit_score orders the survivors. A candidate strong on skills but weak semantically never reaches the list, so recruiters see a small, explainable shortlist.
- Without graph + vector + LLM in one engine you'd need: ATS for skills, vector DB for semantic match, LLM in app code for fit. Three systems, three round trips per query.
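To make the ranking arithmetic concrete, here is the Step 2 blend redone in plain Python over the Step 1 data. The REQUIRES weights are the real ones from the setup; the fit values are made-up placeholders standing in for llm_score output, not real model scores. Mia is included only to show the arithmetic; in the engine the SIMILAR_TO filter would have dropped her already.

```python
# REQUIRES weights for JOB-101, copied from Step 1.
required = {"Python": 1.0, "PyTorch": 1.0, "LLM Inference": 1.0,
            "Vector Databases": 0.7, "Observability": 0.5}

candidates = {
    # name: (skills held, placeholder fit score -- NOT real llm_score output)
    "Sarah Chen": ({"Python", "PyTorch", "Vector Databases",
                    "LLM Inference", "Observability"}, 0.92),
    "Mia Rossi":  ({"Python", "PyTorch"}, 0.40),
}

def rank(required, candidates):
    rows = []
    for name, (skills, fit) in candidates.items():
        # sum(req.weight) over the skills the candidate actually has.
        skill_score = sum(w for s, w in required.items() if s in skills)
        rows.append((name, skill_score, fit, skill_score + fit))
    # ORDER BY combined_rank DESC, same as the Cypher query.
    return sorted(rows, key=lambda r: r[3], reverse=True)

for name, skill_score, fit, combined in rank(required, candidates):
    print(name, skill_score, fit, round(combined, 2))
```

Sarah covers all five required skills for a skill_score of 4.2; Mia covers two for 2.0. Even a generous fit score can't lift a skills-weak candidate past a candidate strong on both signals, which is exactly the behavior the combined rank is designed to produce.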
Try this next
// Baseline: pure ATS-style skill counting, no weights or semantics.
MATCH (j:Job {id: "JOB-101"})-[:REQUIRES]->(s:Skill)
WITH j, collect(s.name) AS required_skills
MATCH (c:Candidate)-[:HAS_SKILL]->(have:Skill)
WHERE have.name IN required_skills
RETURN c.name, count(have) AS skill_hits
ORDER BY skill_hits DESC;
// Pure semantic matches for the Data Platform role, no skill graph.
MATCH (j:Job {id: "JOB-102"})-[:SIMILAR_TO > 0.85]->(c:Candidate)
RETURN c.name AS semantic_match, c.summary;
// Candidates who cover every required skill for any job.
MATCH (j:Job)-[:REQUIRES]->(s:Skill)
WITH j, collect(s.name) AS req
MATCH (c:Candidate)
OPTIONAL MATCH (c)-[:HAS_SKILL]->(s2:Skill) WHERE s2.name IN req
WITH j, c, count(s2) AS hits, size(req) AS needed
WHERE hits = needed
RETURN j.title, c.name AS fully_qualified;