Objective
On a live call there's no time for a clarifying menu: the caller says "I think my card got declined" and the voice agent must route to the billing skill immediately, not read out a phone tree. Hard-coded grammars break on natural speech. Here you'll route by meaning: embed each skill's description, match the transcribed utterance to the nearest skill in one query, and gate low-confidence matches to a quick "could you say that another way?" instead of a wrong action. Your STT/TTS and telephony live outside the database; the database is the routing brain. See "Use it from your agent" below for wiring it into any voice stack.
Step 1: Create the skill registry
Each row is a skill the voice agent can hand off to, described in natural language and embedded.
CREATE TABLE IF NOT EXISTS recipe_voice_skills (
    skill_id    INTEGER PRIMARY KEY,
    skill_name  TEXT,          -- the skill/flow the agent triggers
    description TEXT,          -- what the caller wants, in plain language
    embedding   VECTOR(384)
);
Step 2: Register the voice agent's skills
A realistic skill set for a phone support line.
INSERT INTO recipe_voice_skills (skill_id, skill_name, description) VALUES
    (1, 'billing_help',   'Questions about charges, declined cards, invoices, or payments.'),
    (2, 'tech_support',   'The product or service is broken, erroring, slow, or not connecting.'),
    (3, 'account_access', 'The caller cannot log in, is locked out, or needs a password reset.'),
    (4, 'new_order',      'The caller wants to place a new order or sign up for a plan.'),
    (5, 'speak_to_human', 'The caller is frustrated or explicitly asks for a live person.'),
    (6, 'order_status',   'The caller wants to know where their order or delivery is.');
Step 3: Embed the skill descriptions
The embedding model runs in-database; this is the index the router searches each turn.
UPDATE recipe_voice_skills SET embedding = EMBED(description);
Step 4: Route a transcribed utterance to a skill
Match what the caller said to the nearest skill — robust to natural, messy speech.
SELECT skill_name, description,
       COSINE_SIMILARITY(embedding, EMBED('uh yeah my payment didn''t go through and I got charged anyway')) AS match_score
FROM recipe_voice_skills
ORDER BY match_score DESC
LIMIT 3;
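To make the ranking less of a black box, here is a minimal Python sketch of what the query above computes: a cosine similarity per row, sorted in descending order, keeping the top k. The 3-dimensional vectors are invented for illustration (the real embeddings are 384-dimensional and produced by EMBED), and the function names cosine_similarity and top_matches are hypothetical helpers, not part of any database API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)

def top_matches(query_vec, skills, k=3):
    """Return the k (skill_name, score) pairs with the highest similarity."""
    scored = [(name, cosine_similarity(vec, query_vec)) for name, vec in skills.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # ORDER BY match_score DESC
    return scored[:k]                                    # LIMIT k

# Toy vectors, invented for illustration only (not real embeddings).
TOY_SKILLS = {
    "billing_help": [0.9, 0.1, 0.0],
    "tech_support": [0.1, 0.9, 0.1],
    "speak_to_human": [0.0, 0.2, 0.9],
}

ranked = top_matches([0.8, 0.2, 0.1], TOY_SKILLS, k=2)
```

With these toy values the query vector points mostly along the first axis, so billing_help ranks first and tech_support second.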
Step 5: Pick one skill with a confidence gate
Trigger the top skill only when it clears a threshold; otherwise reprompt rather than misroute mid-call.
SELECT skill_name, match_score,
       CASE WHEN match_score >= 0.32 THEN 'TRIGGER' ELSE 'REPROMPT' END AS decision
FROM (
    SELECT skill_name,
           COSINE_SIMILARITY(embedding, EMBED('I just want to talk to a real person, this is ridiculous')) AS match_score
    FROM recipe_voice_skills
    ORDER BY match_score DESC
    LIMIT 1
);
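On the application side, the gate reduces to a single comparison. The sketch below mirrors the CASE expression in Python and adds handling for an empty result. The row shape (a list of (skill_name, match_score) tuples, best match first) and the names decide and next_action are assumptions made for illustration; the scores in any usage are made up.

```python
GATE = 0.32  # same threshold as the Step 5 CASE expression

def decide(match_score, gate=GATE):
    """Return 'TRIGGER' at or above the gate, otherwise 'REPROMPT'."""
    return "TRIGGER" if match_score >= gate else "REPROMPT"

def next_action(rows, gate=GATE):
    """Map query rows to (decision, skill_name).

    rows: list of (skill_name, match_score) tuples, best match first
    (an assumed shape, not a documented response format).
    """
    if not rows:
        return ("REPROMPT", None)  # nothing matched: ask the caller to rephrase
    skill_name, match_score = rows[0]
    decision = decide(match_score, gate)
    return (decision, skill_name if decision == "TRIGGER" else None)
```

Keeping the threshold in one named constant makes it easier to tune against real call transcripts, since a useful cutoff depends on the embedding model and on how the skill descriptions are worded.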
Step 6: Detect an explicit escalation
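A frustrated caller asking for a human should route to the live-person skill every time. The WHERE clause keeps speak_to_human as a candidate even when its score falls below 0.30, so the query never comes back empty; the similarity is projected as match_score so the candidates can be ordered by it. LIMIT 1 still returns the highest-scoring candidate, so check that your escalation phrasings actually rank speak_to_human first.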
SELECT skill_name,
       COSINE_SIMILARITY(embedding, EMBED('get me a manager now')) AS match_score
FROM recipe_voice_skills
WHERE skill_name = 'speak_to_human'
   OR COSINE_SIMILARITY(embedding, EMBED('get me a manager now')) >= 0.30
ORDER BY match_score DESC
LIMIT 1;
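One way to read the filter above is as a candidate set with a guaranteed member. The following Python sketch mirrors the WHERE, ORDER BY, and LIMIT 1 logic on already-scored rows. The helper name escalation_candidate and the scores used with it are hypothetical.

```python
def escalation_candidate(scored, floor=0.30, always="speak_to_human"):
    """Pick the top candidate from (skill_name, match_score) pairs.

    A row is a candidate if it is the always-included skill or if its
    score is at or above the floor (the WHERE clause). The highest
    scoring candidate wins (ORDER BY match_score DESC LIMIT 1).
    """
    candidates = [(name, score) for name, score in scored
                  if name == always or score >= floor]
    if not candidates:
        return None  # only possible if the always-included skill is not registered
    return max(candidates, key=lambda pair: pair[1])
```

Two cases are worth noting. When every score is below the floor, the always-included skill is the only candidate and is returned. When another skill clears the floor with a higher score, that skill is returned instead, so the guarantee is a non-empty result, not a fixed winner.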
Cleanup (Optional)
DROP TABLE IF EXISTS recipe_voice_skills;
Expected Outcomes
- Step 4 routes "my payment didn't go through" to billing_help — by meaning, despite filler words and no exact keywords.
- Step 5 triggers speak_to_human for the frustrated caller, with a TRIGGER decision above the gate.
- Step 6 surfaces speak_to_human for "get me a manager now", and because that skill is always a candidate, an escalation request never returns an empty result.
You now have real-time, meaning-based skill routing for a voice agent — one fast vector lookup per turn, with a built-in reprompt guardrail.
Use it from your agent (framework-agnostic — the DB is the brain, the voice stack is swappable)
Routing is just a skill table + one match query, fast enough to run inside the turn loop of any voice runtime:
- REST / SDK: POST /v1/query/execute (any language), or @synapcores/sdk client.executeQuery(...). In your STT callback, run the Step-5 query on the transcript and either trigger that skill's flow or reprompt. Drop it into Vapi, LiveKit Agents, Pipecat, Twilio, or Retell: they capture audio, the database decides the route.
- MCP (native, on by default): point your voice runtime's MCP client at ws://<your-instance>/mcp?token=<jwt> (JWT from one POST /v1/auth/login → access_token). The query tool returns the routed skill; your runtime maps it to the dialog flow, so real-time routing becomes a tool call.
- Any framework: the same registry routes a phone line, a smart-speaker skill, or a text chatbot; only the transport differs. The database is the brain; the framework (and the voice stack) is swappable.
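As a rough sketch of the STT-callback wiring described above, the Python below builds the Step-5 query from a transcript and leaves the transport to a caller-supplied function. The request and response formats of the query endpoint are not specified here, so execute is a hypothetical callable that sends a SQL string to the database and returns a list of row dicts; the names sql_literal, build_route_query, and on_transcript are likewise illustrative. Quoting by doubling single quotes follows the style of the Step 4 example; if your client supports bound parameters, those are likely the safer choice.

```python
def sql_literal(text):
    """Quote text as a SQL string literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"

def build_route_query(transcript, gate=0.32):
    """Build the Step-5 routing query for one transcribed utterance."""
    utterance = sql_literal(transcript)
    return (
        "SELECT skill_name, match_score, "
        f"CASE WHEN match_score >= {gate} THEN 'TRIGGER' ELSE 'REPROMPT' END AS decision "
        "FROM (SELECT skill_name, "
        f"COSINE_SIMILARITY(embedding, EMBED({utterance})) AS match_score "
        "FROM recipe_voice_skills ORDER BY match_score DESC LIMIT 1)"
    )

def on_transcript(transcript, execute):
    """Return the skill to trigger, or None when the agent should reprompt.

    execute: hypothetical callable (sql: str) -> list of dicts keyed by
    column name; the real response shape may differ.
    """
    rows = execute(build_route_query(transcript))
    if rows and rows[0]["decision"] == "TRIGGER":
        return rows[0]["skill_name"]
    return None
```

Injecting execute keeps the routing logic independent of the transport, so the same callback could sit behind a REST client, the SDK, or an MCP tool call.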
Key Concepts Learned
- Embedding skill descriptions turns "which skill?" into a nearest-neighbor lookup — robust to spontaneous speech.
- A confidence gate gives a free "say that another way" reprompt instead of a wrong mid-call action.
- One vector query per turn is fast enough for live audio — no grammar files, no phone tree.
- Because it's plain data ops (SQL / REST / MCP), voice routing works with any STT/TTS stack — the database-as-the-brain pattern the voice cluster builds on.