Objective

A voice agent that improvises answers about your hours, policies, or products will confidently say the wrong thing out loud, to a customer. RAG fixes that for voice the same way it does for chat: retrieve the relevant passage from your knowledge base, generate a short, speakable answer grounded in it, and defer when the answer isn't there. The twist for voice is brevity: a spoken reply should run a sentence or two at most, and this recipe constrains it to one. Your STT/TTS and telephony live outside the database; the knowledge base lives inside it. See "Use it from your agent" below for wiring it into any voice stack.

Step 1: Create the voice knowledge base

This table holds the knowledge-base chunks the agent answers from. Each row carries a 384-dimensional embedding column, which Step 3 fills in for retrieval.

CREATE TABLE IF NOT EXISTS recipe_voice_kb (
  chunk_id  INTEGER PRIMARY KEY,
  source    TEXT,
  content   TEXT,
  embedding VECTOR(384)
);

Step 2: Load the knowledge base

Load the facts that a phone agent for a store or service must get exactly right.

INSERT INTO recipe_voice_kb (chunk_id, source, content) VALUES
 (1,'hours','The store is open 9am to 8pm Monday through Saturday, and closed on Sundays.'),
 (2,'returns','Items can be returned within 14 days with a receipt for a full refund.'),
 (3,'parking','Free customer parking is available in the lot behind the building.'),
 (4,'delivery','Same-day delivery is available for orders placed before 2pm within 10 miles.'),
 (5,'payment','We accept all major cards, mobile wallets, and cash; no checks.'),
 (6,'warranty','Electronics carry a 1-year manufacturer warranty handled in-store.');

Step 3: Embed the knowledge base

The embedding model runs in-database, so a single UPDATE embeds every chunk and gives you the retrieval index for spoken answers.

UPDATE recipe_voice_kb SET embedding = EMBED(content);

Step 4: Retrieve the passage for a spoken question

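The caller asks about Sunday hours in their own words. Rank the chunks by cosine similarity to the question and keep the two most relevant, matching by meaning instead of by keyword.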

SELECT source, content,
       COSINE_SIMILARITY(embedding, EMBED('are you open on the weekend, like Sunday afternoon?')) AS relevance
FROM recipe_voice_kb
ORDER BY relevance DESC
LIMIT 2;
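
To make the ranking concrete, here is a minimal Python sketch of what "order by cosine similarity, keep the top two" computes. It is an illustration only, not how the database implements COSINE_SIMILARITY: the 3-dimensional vectors are invented toy values standing in for the 384-dimensional embeddings, and the names cosine_similarity and top_k are hypothetical helpers.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, rows, k=2):
    """rows is a list of (source, vector); return the k best (source, score) pairs."""
    scored = [(source, cosine_similarity(vec, query_vec)) for source, vec in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy 3-d vectors, invented for illustration; real embeddings have 384 dimensions.
toy_rows = [
    ("hours",   [0.9, 0.1, 0.0]),
    ("returns", [0.1, 0.9, 0.1]),
    ("parking", [0.0, 0.2, 0.9]),
]
toy_query = [0.8, 0.2, 0.1]  # stands in for the embedded caller question

ranked = top_k(toy_query, toy_rows)
for source, score in ranked:
    print(f"{source}: {score:.3f}")
```

The vector pointing in nearly the same direction as the query ranks first, regardless of whether the two texts share any words.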

Step 5: Generate a short, speakable, grounded answer

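Constrain GENERATE() to one spoken sentence drawn from the retrieved context, so the answer is accurate and TTS-friendly. First materialize the top passages into a helper table, concatenate them into a single context row, then generate from that row. Both helper tables have primary keys and are created with IF NOT EXISTS, so re-running the INSERTs for a new question will typically collide with the rows already there; clear or drop the helper tables first (see Cleanup).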

CREATE TABLE IF NOT EXISTS recipe_voice_kb_top (
  chunk_id  INTEGER PRIMARY KEY,
  content   TEXT,
  relevance DOUBLE
);
INSERT INTO recipe_voice_kb_top (chunk_id, content, relevance)
SELECT chunk_id, content,
       COSINE_SIMILARITY(embedding, EMBED('are you open on Sunday?')) AS relevance
FROM recipe_voice_kb
ORDER BY relevance DESC
LIMIT 2;

CREATE TABLE IF NOT EXISTS recipe_voice_kb_ctx (id INTEGER PRIMARY KEY, context TEXT);
INSERT INTO recipe_voice_kb_ctx (id, context)
SELECT 1, GROUP_CONCAT(content, ' ') FROM recipe_voice_kb_top;

SELECT GENERATE(
  'Answer in ONE short spoken sentence using ONLY the context; if it is not in the context, say you will check. Context: ' ||
  context ||
  ' Caller asked: Are you open on Sunday? Spoken answer:') AS spoken_answer
FROM recipe_voice_kb_ctx;
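
In a live call the question comes from STT, not from a hard-coded string, so the statement above has to be assembled per turn. The following Python sketch builds the same GENERATE statement for an arbitrary question. The helper names sql_literal and build_generate_sql are hypothetical, and the sketch assumes the query is sent as plain SQL text; if your client offers bind parameters, prefer those over string assembly.

```python
def sql_literal(text):
    """Quote text as a SQL string literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"

# Same grounding instruction as the Step 5 statement.
INSTRUCTION = (
    "Answer in ONE short spoken sentence using ONLY the context; "
    "if it is not in the context, say you will check. Context: "
)

def build_generate_sql(question):
    """Return the Step 5 GENERATE statement for one caller question."""
    # Collapse whitespace so a multi-line transcript stays a one-line prompt.
    question = " ".join(question.split())
    tail = " Caller asked: " + question + " Spoken answer:"
    return (
        "SELECT GENERATE(\n"
        "  " + sql_literal(INSTRUCTION) + " ||\n"
        "  context ||\n"
        "  " + sql_literal(tail) + ") AS spoken_answer\n"
        "FROM recipe_voice_kb_ctx;"
    )

sql = build_generate_sql("What's your return policy?")
print(sql)
```

Escaping matters here because transcripts routinely contain apostrophes ("what's", "I'd"), which would otherwise end the string literal early.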

Step 6: Refuse to guess on a call

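Ask something the knowledge base doesn't cover. This step reuses the context row built in Step 5, which contains no gift-wrapping fact, so the grounding instruction makes the agent defer instead of inventing an answer. In a live call you would rebuild the context for each question; because no chunk mentions gift wrapping, the retrieved passages still would not contain the answer and the outcome is the same.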

SELECT GENERATE(
  'Answer in ONE short spoken sentence using ONLY the context; if it is not in the context, say you will check and follow up. Context: ' ||
  context ||
  ' Caller asked: Do you offer gift wrapping? Spoken answer:') AS spoken_answer
FROM recipe_voice_kb_ctx;
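
Rebuilding the context for each caller question can be sketched in Python as a function that emits the Step 5 retrieval statements with the question substituted in. This is a sketch under assumptions: the helper names are hypothetical, and it assumes the database accepts a plain DELETE FROM to clear the previous turn's rows, which this recipe does not itself demonstrate.

```python
def sql_literal(text):
    """Quote text as a SQL string literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"

def retrieval_statements(question, k=2):
    """SQL statements that rebuild the context row for one caller question."""
    q = sql_literal(" ".join(question.split()))
    return [
        # Assumption: plain DELETE is supported; it clears the previous turn.
        "DELETE FROM recipe_voice_kb_top;",
        "INSERT INTO recipe_voice_kb_top (chunk_id, content, relevance)\n"
        "SELECT chunk_id, content,\n"
        f"       COSINE_SIMILARITY(embedding, EMBED({q})) AS relevance\n"
        "FROM recipe_voice_kb\n"
        "ORDER BY relevance DESC\n"
        f"LIMIT {int(k)};",
        "DELETE FROM recipe_voice_kb_ctx;",
        "INSERT INTO recipe_voice_kb_ctx (id, context)\n"
        "SELECT 1, GROUP_CONCAT(content, ' ') FROM recipe_voice_kb_top;",
    ]

statements = retrieval_statements("Do you offer gift wrapping?")
for statement in statements:
    print(statement)
```

Run these four statements before the GENERATE call on every turn so the context always reflects the current question.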

Cleanup (Optional)

DROP TABLE IF EXISTS recipe_voice_kb;
DROP TABLE IF EXISTS recipe_voice_kb_top;
DROP TABLE IF EXISTS recipe_voice_kb_ctx;

Expected Outcomes

Step 4 retrieves the store-hours fact for a weekend question — by meaning, not keywords.
Step 5 returns a single spoken sentence, for example "We're open 9 to 8 Monday through Saturday and closed Sundays". The exact wording depends on the model, but it should be accurate and TTS-ready.
Step 6 defers on gift wrapping, for example "Let me check on that and follow up", because the fact isn't in the knowledge base, so nothing is hallucinated on the call.

You now have a voice agent that answers from your knowledge base in short, speakable sentences and gracefully defers when it doesn't know.
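
The one-sentence instruction is a request to the model, not a guarantee, so a voice runtime may want a client-side guard before handing text to TTS. The Python sketch below trims an answer to its first sentence. The function name first_sentences is hypothetical, and the sentence splitter is a rough heuristic that will also split on abbreviations such as "Dr. Smith".

```python
import re

# Split on whitespace that follows ., !, or ? (a rough sentence boundary).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def first_sentences(answer, max_sentences=1):
    """Trim a generated answer to at most max_sentences sentences."""
    parts = [p for p in _SENTENCE_END.split(answer.strip()) if p]
    return " ".join(parts[:max_sentences])

spoken = first_sentences("We're closed on Sundays. We reopen Monday at 9am.")
print(spoken)  # We're closed on Sundays.
```

An answer that is already one sentence passes through unchanged apart from surrounding whitespace.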

Use it from your agent (framework-agnostic — the DB is the brain, the voice stack is swappable)

Voice RAG is just a knowledge table + retrieve + a brevity-constrained generate, called from any voice runtime:

REST / SDK: POST /v1/query/execute (any language), or @synapcores/sdk client.executeQuery(...). On each caller question, run the Step 5 retrieve-and-generate statements and send the one-sentence result straight to TTS. Drop it into Vapi, LiveKit Agents, Pipecat, Twilio, or Retell.
MCP (native, on by default): point your voice runtime's MCP client at ws://<your-instance>/mcp?token=<jwt>. Obtain the JWT with a single POST /v1/auth/login and read access_token from the response. The query tool retrieves and generates the spoken answer; the execute tool ingests new knowledge. That makes voice RAG a tool call inside the turn loop.
Any framework: the same knowledge base grounds a phone agent, a voice widget, or a text bot; the brevity instruction is the only voice-specific tweak.
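
As a rough illustration of the REST path, the Python sketch below builds a request to /v1/query/execute using only the standard library. Only the endpoint path comes from this recipe. The JSON body shape (SQL under a "query" key), the Bearer Authorization header, the host, and the function name build_query_request are all assumptions made for the sketch; check the API reference for the real request format.

```python
import json
import urllib.request

def build_query_request(base_url, token, sql):
    """Build (but do not send) a POST to /v1/query/execute.

    Assumptions, not confirmed by this recipe: the body is JSON with the SQL
    under a "query" key, and the JWT travels in a Bearer Authorization header.
    """
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/query/execute",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
    )

request = build_query_request(
    "https://db.example.invalid",  # placeholder host
    "<jwt>",                       # placeholder token from /v1/auth/login
    "SELECT context FROM recipe_voice_kb_ctx;",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=5)
```

Keeping request construction separate from sending makes the call easy to test and easy to move into whichever HTTP client your voice framework already uses.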

Key Concepts Learned

Voice RAG is chat RAG with a brevity constraint — retrieve the passage, generate one spoken sentence.
The "use only the context, else defer" instruction prevents on-call hallucination, the worst voice failure.
Keeping answers to one sentence makes them natural through TTS and keeps latency low.
Because it's plain data ops (SQL / REST / MCP), voice RAG works with any STT/TTS stack — the database-as-the-brain pattern the voice cluster builds on.

Voice-Agent RAG Knowledge Base (ground spoken answers)