Objective

On a live call there's no time for a clarifying menu: the caller says "I think my card got declined" and the voice agent must route to the billing skill immediately, not read out a phone tree. Hard-coded grammars break on natural speech. Here you'll route by meaning: embed each skill's description, match the transcribed utterance to the nearest skill in one query, and gate low-confidence matches to a quick "could you say that another way?" instead of a wrong action. Your STT/TTS and telephony live outside the database; the routing brain is the database. See "Use it from your agent" below for wiring it into any voice stack.

Step 1: Create the skill registry

Each row is a skill the voice agent can hand off to, described in natural language and embedded.

CREATE TABLE IF NOT EXISTS recipe_voice_skills (
  skill_id    INTEGER PRIMARY KEY,
  skill_name  TEXT,                                   -- the skill/flow the agent triggers
  description TEXT,                                   -- what the caller wants, in plain language
  embedding   VECTOR(384)
);

Step 2: Register the voice agent's skills

A realistic skill set for a phone support line.

INSERT INTO recipe_voice_skills (skill_id, skill_name, description) VALUES
 (1,'billing_help','Questions about charges, declined cards, invoices, or payments.'),
 (2,'tech_support','The product or service is broken, erroring, slow, or not connecting.'),
 (3,'account_access','The caller cannot log in, is locked out, or needs a password reset.'),
 (4,'new_order','The caller wants to place a new order or sign up for a plan.'),
 (5,'speak_to_human','The caller is frustrated or explicitly asks for a live person.'),
 (6,'order_status','The caller wants to know where their order or delivery is.');

Step 3: Embed the skill descriptions

The embedding model runs in-database; the stored vectors are what the router searches each turn.

UPDATE recipe_voice_skills SET embedding = EMBED(description);

Step 4: Route a transcribed utterance to a skill

Match what the caller said to the nearest skill — robust to natural, messy speech.

SELECT skill_name, description,
       COSINE_SIMILARITY(embedding, EMBED('uh yeah my payment didn''t go through and I got charged anyway')) AS match_score
FROM recipe_voice_skills
ORDER BY match_score DESC
LIMIT 3;
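
To make the ranking in Step 4 concrete, here is a minimal Python sketch of what a cosine-similarity nearest-neighbor lookup computes. It is an illustration only, not the database's implementation: the helper names (cosine_similarity, rank_skills) and the three-dimensional toy vectors are assumptions, standing in for the 384-dimensional vectors that EMBED produces in this recipe.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def rank_skills(query_vec, skills, limit=3):
    """Return (skill_name, score) pairs, best match first (like ORDER BY ... LIMIT)."""
    scored = [(name, cosine_similarity(vec, query_vec)) for name, vec in skills.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]

# Hypothetical 3-d vectors standing in for EMBED(description).
toy_skills = {
    "billing_help": [0.9, 0.1, 0.0],
    "tech_support": [0.1, 0.9, 0.1],
    "speak_to_human": [0.0, 0.2, 0.9],
}
toy_utterance = [0.8, 0.2, 0.1]  # stand-in for the embedded transcript

ranking = rank_skills(toy_utterance, toy_skills)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy values the utterance vector points mostly in the same direction as billing_help, so that skill ranks first. The score depends only on direction, not vector length, which is why differently worded utterances with similar meaning can still land on the same skill.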

Step 5: Pick one skill with a confidence gate

Trigger the top skill only when it clears a threshold; otherwise reprompt rather than misroute mid-call.

SELECT skill_name, match_score,
       CASE WHEN match_score >= 0.32 THEN 'TRIGGER' ELSE 'REPROMPT' END AS decision
FROM (
  SELECT skill_name,
         COSINE_SIMILARITY(embedding, EMBED('I just want to talk to a real person, this is ridiculous')) AS match_score
  FROM recipe_voice_skills
  ORDER BY match_score DESC
  LIMIT 1
);
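
The gate in Step 5 is a single comparison, and the same decision can be mirrored in the voice runtime. The sketch below is a hedged illustration: the function names (gate, handle_turn) and the shape of the returned dictionary are hypothetical; only the 0.32 threshold and the TRIGGER / REPROMPT labels come from the query above.

```python
def gate(match_score, threshold=0.32):
    """Mirror the CASE expression: TRIGGER at or above the threshold, else REPROMPT."""
    return "TRIGGER" if match_score >= threshold else "REPROMPT"

def handle_turn(skill_name, match_score, threshold=0.32):
    """Return the action the voice runtime should take for this turn.

    The dictionary shape is an assumption made for illustration.
    """
    if gate(match_score, threshold) == "TRIGGER":
        return {"action": "trigger", "skill": skill_name}
    return {"action": "reprompt", "say": "Could you say that another way?"}

print(handle_turn("speak_to_human", 0.58))
print(handle_turn("new_order", 0.12))
```

A suitable threshold depends on the embedding model and on how the skill descriptions are worded, so 0.32 is best treated as a starting point to check against a sample of real transcripts rather than a fixed constant.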

Step 6: Detect an explicit escalation

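A frustrated caller asking for a human should always have the live-person skill available as a route. The WHERE clause keeps speak_to_human as a candidate regardless of its score and admits any other skill scoring at least 0.30, so the query always returns a row. We project the similarity as match_score so we can order by it and take the best candidate.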

SELECT skill_name,
       COSINE_SIMILARITY(embedding, EMBED('get me a manager now')) AS match_score
FROM recipe_voice_skills
WHERE skill_name = 'speak_to_human'
   OR COSINE_SIMILARITY(embedding, EMBED('get me a manager now')) >= 0.30
ORDER BY match_score DESC
LIMIT 1;
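
The selection logic of the Step 6 query can be read as: keep speak_to_human unconditionally, keep any other skill at or above 0.30, then take the highest score. The Python sketch below restates that logic under stated assumptions: the function name pick_escalation_candidate and the example scores are hypothetical, and the scores dictionary stands in for the per-skill COSINE_SIMILARITY values.

```python
def pick_escalation_candidate(scores, always_include="speak_to_human", floor=0.30):
    """Pick the top-scoring skill among the always-included one and those >= floor.

    Returns (skill_name, score), or None if nothing qualifies
    (only possible when always_include is missing from scores).
    """
    candidates = {
        name: score
        for name, score in scores.items()
        if name == always_include or score >= floor
    }
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return best, candidates[best]

# Hypothetical scores: nothing clears the floor, yet a row still comes back.
weak = {"billing_help": 0.10, "speak_to_human": 0.22, "order_status": 0.05}
print(pick_escalation_candidate(weak))

# Hypothetical scores: another skill outscores speak_to_human and wins.
mixed = {"billing_help": 0.45, "speak_to_human": 0.35, "order_status": 0.05}
print(pick_escalation_candidate(mixed))
```

The first case shows why the query never comes back empty. The second shows that the ordering still decides the winner, so a runtime that must escalate unconditionally on certain phrases would need an additional rule outside this query.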

Cleanup (Optional)

DROP TABLE IF EXISTS recipe_voice_skills;

Expected Outcomes

Step 4 routes "my payment didn't go through" to billing_help by meaning, despite filler words and with no keyword rules.
Step 5 triggers speak_to_human for the frustrated caller, with a TRIGGER decision above the gate.
Step 6 reliably surfaces speak_to_human for "get me a manager now" — explicit escalations never fall through.

You now have real-time, meaning-based skill routing for a voice agent — one fast vector lookup per turn, with a built-in reprompt guardrail.

Use it from your agent (framework-agnostic — the DB is the brain, the voice stack is swappable)

Routing is just a skill table + one match query, fast enough to run inside the turn loop of any voice runtime:

REST / SDK — POST /v1/query/execute (any language), or @synapcores/sdk client.executeQuery(...). In your STT callback, run the Step-5 query on the transcript and either trigger that skill's flow or reprompt. Drop it into Vapi, LiveKit Agents, Pipecat, Twilio, or Retell — they capture audio, the database decides the route.
MCP (native, on by default) — point your voice runtime's MCP client at ws://<your-instance>/mcp?token=<jwt> (JWT from one POST /v1/auth/login → access_token). The query tool returns the routed skill; your runtime maps it to the dialog flow — real-time routing as a tool call.
Any framework — the same registry routes a phone line, a smart-speaker skill, or a text chatbot; only the transport differs. The database is the brain; the framework (and the voice stack) is swappable.
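
As a rough sketch of the STT-callback wiring described above, the Python below builds the Step 5 query from a transcript and maps the result to a trigger or reprompt. Several parts are assumptions and not confirmed by this recipe: the function names, the execute_query callable (a hypothetical wrapper you would write around POST /v1/query/execute or the SDK's client.executeQuery), and the result shape (a list of dictionaries keyed by column name). Quotes in the transcript are doubled the same way the Step 4 query writes didn''t.

```python
def sql_literal(text):
    """Quote text as a SQL string literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"

def build_routing_query(transcript, threshold=0.32):
    """Build the Step 5 query for one transcribed utterance."""
    utterance = sql_literal(transcript)
    return (
        "SELECT skill_name, match_score,\n"
        f"       CASE WHEN match_score >= {threshold} THEN 'TRIGGER' ELSE 'REPROMPT' END AS decision\n"
        "FROM (\n"
        "  SELECT skill_name,\n"
        f"         COSINE_SIMILARITY(embedding, EMBED({utterance})) AS match_score\n"
        "  FROM recipe_voice_skills\n"
        "  ORDER BY match_score DESC\n"
        "  LIMIT 1\n"
        ");"
    )

def on_transcript(transcript, execute_query):
    """STT callback: return ("trigger", skill_name) or ("reprompt", None).

    execute_query is a hypothetical callable that sends SQL to the database
    and returns rows as a list of dicts; its real shape may differ.
    """
    rows = execute_query(build_routing_query(transcript))
    if not rows or rows[0]["decision"] != "TRIGGER":
        return ("reprompt", None)
    return ("trigger", rows[0]["skill_name"])

# Stand-in for the database so the sketch runs without a live instance.
def fake_execute_query(sql):
    return [{"skill_name": "billing_help", "match_score": 0.61, "decision": "TRIGGER"}]

print(on_transcript("my payment didn't go through", fake_execute_query))
```

Passing the query executor in as a callable keeps the routing logic independent of the transport, which matches the point that the voice stack is swappable. If the query API you use supports bound parameters, those would be preferable to building the literal by hand.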

Key Concepts Learned

Embedding skill descriptions turns "which skill?" into a nearest-neighbor lookup — robust to spontaneous speech.
A confidence gate gives a free "say that another way" reprompt instead of a wrong mid-call action.
One vector query per turn is fast enough for live audio — no grammar files, no phone tree.
Because it's plain data ops (SQL / REST / MCP), voice routing works with any STT/TTS stack — the database-as-the-brain pattern the voice cluster builds on.

Real-Time Intent → Skill Routing for a Voice Agent