Objective
A voice agent that improvises answers about your hours, your policies, or your products will confidently say the wrong thing, out loud, to a customer. RAG fixes that for voice the same way it does for chat: retrieve the relevant passage from your knowledge base, generate a short, speakable answer grounded in it, and defer when the answer isn't there. The twist for voice is brevity: a spoken reply should run a sentence or two at most, and this recipe constrains it to one. Your STT/TTS and telephony live outside the database; the database holds the knowledge base and runs the retrieval and generation. See "Use it from your agent" below for wiring it into any voice stack.
Step 1: Create the voice knowledge base
Knowledge-base chunks the agent answers from, embedded for retrieval.
CREATE TABLE IF NOT EXISTS recipe_voice_kb (
  chunk_id INTEGER PRIMARY KEY,
  source TEXT,
  content TEXT,
  embedding VECTOR(384)
);
Step 2: Load the knowledge base
Facts a phone agent for a store/service must get exactly right.
INSERT INTO recipe_voice_kb (chunk_id, source, content) VALUES
  (1, 'hours', 'The store is open 9am to 8pm Monday through Saturday, and closed on Sundays.'),
  (2, 'returns', 'Items can be returned within 14 days with a receipt for a full refund.'),
  (3, 'parking', 'Free customer parking is available in the lot behind the building.'),
  (4, 'delivery', 'Same-day delivery is available for orders placed before 2pm within 10 miles.'),
  (5, 'payment', 'We accept all major cards, mobile wallets, and cash; no checks.'),
  (6, 'warranty', 'Electronics carry a 1-year manufacturer warranty handled in-store.');
Step 3: Embed the knowledge base
The embedding model runs in-database, so a single UPDATE embeds every chunk and gives you the retrieval index for spoken questions.
UPDATE recipe_voice_kb SET embedding = EMBED(content);
Step 4: Retrieve the passage for a spoken question
The caller asks about Sunday hours; pull the most relevant fact by meaning.
SELECT source, content,
  COSINE_SIMILARITY(embedding, EMBED('are you open on the weekend, like Sunday afternoon?')) AS relevance
FROM recipe_voice_kb
ORDER BY relevance DESC
LIMIT 2;
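To make "by meaning" concrete, here is a minimal Python sketch of what ranking by cosine similarity does. It is an illustration only, not the database's implementation: the three-dimensional vectors are made up, and the names `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=2):
    """rows is a list of (source, vector); returns the k best (source, score) pairs."""
    scored = [(source, cosine_similarity(query_vec, vec)) for source, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors invented for illustration; the recipe's embeddings have 384 dimensions.
kb = [
    ("hours", [0.9, 0.1, 0.0]),
    ("parking", [0.1, 0.9, 0.1]),
    ("payment", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.1]  # stands in for the embedded caller question
ranked = top_k(query, kb, k=2)
```

The query vector points in nearly the same direction as the "hours" vector, so that row ranks first even though no keyword matching is involved.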
Step 5: Generate a short, speakable, grounded answer
Constrain GENERATE() to one spoken sentence from the retrieved context — accurate and TTS-friendly. First materialize the top passages into a context row, then generate from it.
CREATE TABLE IF NOT EXISTS recipe_voice_kb_top (
  chunk_id INTEGER PRIMARY KEY,
  content TEXT,
  relevance DOUBLE
);
INSERT INTO recipe_voice_kb_top (chunk_id, content, relevance)
  SELECT chunk_id, content,
    COSINE_SIMILARITY(embedding, EMBED('are you open on Sunday?')) AS relevance
  FROM recipe_voice_kb
  ORDER BY relevance DESC
  LIMIT 2;
CREATE TABLE IF NOT EXISTS recipe_voice_kb_ctx (id INTEGER PRIMARY KEY, context TEXT);
INSERT INTO recipe_voice_kb_ctx (id, context)
  SELECT 1, GROUP_CONCAT(content, ' ') FROM recipe_voice_kb_top;
SELECT GENERATE(
  'Answer in ONE short spoken sentence using ONLY the context; if it is not in the context, say you will check. Context: ' ||
  context ||
  ' Caller asked: Are you open on Sunday? Spoken answer:') AS spoken_answer
FROM recipe_voice_kb_ctx;
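If your voice runtime assembles the prompt itself rather than concatenating it in SQL, the same three-part structure (instruction, context, caller question) can be sketched in Python. `build_spoken_prompt` is a hypothetical helper; the wording mirrors the Step 5 prompt, and the two passages are sample rows from Step 2, not actual retrieval output.

```python
INSTRUCTION = (
    "Answer in ONE short spoken sentence using ONLY the context; "
    "if it is not in the context, say you will check. Context: "
)


def build_spoken_prompt(passages, question):
    """Join retrieved passages into one context string and wrap it in the brevity instruction."""
    context = " ".join(p.strip() for p in passages if p and p.strip())
    return INSTRUCTION + context + " Caller asked: " + question.strip() + " Spoken answer:"


prompt = build_spoken_prompt(
    [
        "The store is open 9am to 8pm Monday through Saturday, and closed on Sundays.",
        "Free customer parking is available in the lot behind the building.",
    ],
    "Are you open on Sunday?",
)
```

Ending the prompt with "Spoken answer:" leaves the model to complete just the reply, which helps keep the output short enough to pass straight to TTS.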
Step 6: Refuse to guess on a call
Ask something the knowledge base doesn't cover. This step reuses the context retrieved in Step 5, which contains no gift-wrapping fact, so the grounding instruction makes the agent defer instead of inventing an answer.
SELECT GENERATE(
  'Answer in ONE short spoken sentence using ONLY the context; if it is not in the context, say you will check and follow up. Context: ' ||
  context ||
  ' Caller asked: Do you offer gift wrapping? Spoken answer:') AS spoken_answer
FROM recipe_voice_kb_ctx;
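The deferral above relies on the model obeying the grounding instruction. An optional complement, not part of this recipe, is to gate on the retrieval score and skip generation when nothing relevant came back. The Python sketch below assumes you have the (content, relevance) pairs from the retrieval step; `answer_or_defer`, `DEFER_LINE`, and the 0.3 threshold are hypothetical and would need tuning against your own data.

```python
DEFER_LINE = "Let me check on that and follow up."

INSTRUCTION = (
    "Answer in ONE short spoken sentence using ONLY the context; "
    "if it is not in the context, say you will check and follow up. Context: "
)


def answer_or_defer(hits, question, generate, min_relevance=0.3):
    """Return a spoken reply, deferring without calling the model on weak retrieval.

    hits: list of (content, relevance), best match first.
    generate: callable(prompt) -> str, a stand-in for the GENERATE() call.
    min_relevance: illustrative cutoff, not a recommended value.
    """
    if not hits or hits[0][1] < min_relevance:
        return DEFER_LINE  # nothing relevant retrieved, so do not ask the model
    context = " ".join(content for content, _ in hits)
    prompt = INSTRUCTION + context + " Caller asked: " + question + " Spoken answer:"
    return generate(prompt)
```

The gate and the instruction cover different failures: the gate catches questions with no nearby passage, while the instruction still handles passages that are related but do not contain the answer.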
Cleanup (Optional)
DROP TABLE IF EXISTS recipe_voice_kb;
DROP TABLE IF EXISTS recipe_voice_kb_top;
DROP TABLE IF EXISTS recipe_voice_kb_ctx;
Expected Outcomes
- Step 4 retrieves the store-hours fact for a weekend question — by meaning, not keywords.
- Step 5 returns a single spoken sentence ("We're open 9 to 8 Monday through Saturday and closed Sundays") — accurate and TTS-ready.
- Step 6 defers on gift wrapping ("Let me check on that and follow up") because it isn't in the knowledge base — no on-call hallucination.
You now have a voice agent that answers from your knowledge base in short, speakable sentences and gracefully defers when it doesn't know.
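The single-sentence outcome depends on the model following the brevity instruction. As a safety net before TTS, a runtime could clip the reply to its first sentence. This is a rough Python sketch with a hypothetical `first_sentence` helper; the split on ".", "!" or "?" followed by whitespace is naive and will cut early on abbreviations such as "a.m.".

```python
import re

# Split at whitespace that directly follows sentence-ending punctuation.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def first_sentence(reply):
    """Collapse whitespace and keep only the first sentence of a model reply."""
    text = " ".join(reply.split())
    if not text:
        return ""
    return _SENTENCE_BREAK.split(text, maxsplit=1)[0]


clipped = first_sentence(
    "We're open 9 to 8 Monday through Saturday and closed Sundays. Is there anything else?"
)
```

A reply that is already one sentence passes through unchanged, so the guard only matters when the model overruns.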
Use it from your agent (framework-agnostic — the DB is the brain, the voice stack is swappable)
Voice RAG is just a knowledge table + retrieve + a brevity-constrained generate, called from any voice runtime:
- REST / SDK: POST /v1/query/execute (any language), or @synapcores/sdk client.executeQuery(...). On each caller question, run the Step-5 retrieve+generate query and send the one-sentence result straight to TTS. Drop it into Vapi, LiveKit Agents, Pipecat, Twilio, or Retell.
- MCP (native, on by default): point your voice runtime's MCP client at ws://<your-instance>/mcp?token=<jwt> (JWT from one POST /v1/auth/login → access_token). The query tool retrieves and generates the spoken answer; the execute tool ingests new knowledge. That makes voice RAG a tool call inside the turn loop.
- Any framework: the same knowledge base grounds a phone agent, a voice widget, or a text bot; the brevity instruction is the only voice-specific tweak. The database is the brain; the framework (and the voice stack) is swappable.
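The REST / SDK option amounts to running the Step 5 statements once per caller question. Here is a minimal Python sketch of that turn, under several assumptions. The `execute` callable is a hypothetical stand-in for your transport (POST /v1/query/execute or the SDK's executeQuery); its request and response shapes are not specified in this recipe, so they are left to the caller. Clearing the scratch tables with DELETE FROM is an addition that the recipe does not show. Escaping by doubling single quotes is standard SQL and assumed to apply here; prefer parameter binding if your client offers it.

```python
PROMPT_HEAD = (
    "Answer in ONE short spoken sentence using ONLY the context; "
    "if it is not in the context, say you will check. Context: "
)


def sql_literal(text):
    """Quote text as a SQL string literal by doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def turn_statements(question):
    """Build the Step 5 statements for one caller question (table names from the recipe)."""
    q = sql_literal(question)
    tail = sql_literal(" Caller asked: " + question + " Spoken answer:")
    return [
        # Assumed cleanup so the INSERTs below do not collide with the previous turn.
        "DELETE FROM recipe_voice_kb_top",
        "INSERT INTO recipe_voice_kb_top (chunk_id, content, relevance) "
        "SELECT chunk_id, content, "
        f"COSINE_SIMILARITY(embedding, EMBED({q})) AS relevance "
        "FROM recipe_voice_kb ORDER BY relevance DESC LIMIT 2",
        "DELETE FROM recipe_voice_kb_ctx",
        "INSERT INTO recipe_voice_kb_ctx (id, context) "
        "SELECT 1, GROUP_CONCAT(content, ' ') FROM recipe_voice_kb_top",
        f"SELECT GENERATE({sql_literal(PROMPT_HEAD)} || context || {tail}) "
        "AS spoken_answer FROM recipe_voice_kb_ctx",
    ]


def answer_turn(question, execute):
    """Run the statements in order and return whatever execute yields for the final SELECT."""
    result = None
    for statement in turn_statements(question):
        result = execute(statement)
    return result
```

Because every turn writes to the same two scratch tables, concurrent calls would overwrite each other's context; a production setup would need per-call isolation, which this sketch does not attempt.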
Key Concepts Learned
- Voice RAG is chat RAG with a brevity constraint — retrieve the passage, generate one spoken sentence.
- The "use only the context, else defer" instruction prevents on-call hallucination, the worst voice failure.
- Keeping answers to one sentence makes them natural through TTS and keeps latency low.
- Because it's plain data ops (SQL / REST / MCP), voice RAG works with any STT/TTS stack — the database-as-the-brain pattern the voice cluster builds on.