Objective
On a voice call, every second of latency is dead air, and most callers ask the same handful of questions in slightly different words. An exact-match cache misses all the paraphrases. A semantic cache hits when the meaning matches: if you've already answered "what time do you close?", you can reuse that answer for "how late are you open?" with no LLM round trip, no dead air, and no token cost. Here you'll build a semantic answer cache with a vector lookup and a hit threshold. Your STT/TTS and telephony live outside; the cache lives in the database. See "Use it from your agent" below for wiring it into any voice stack.
Step 1: Create the semantic answer cache
Each row is a previously answered question, its answer, an embedding, and a hit counter.
CREATE TABLE IF NOT EXISTS recipe_voice_cache (
cache_id INTEGER PRIMARY KEY,
question TEXT,
answer TEXT,
hits INTEGER DEFAULT 0,
embedding VECTOR(384)
);
Step 2: Warm the cache with common Q&A
The questions a phone line answers all day — store the canonical answer once.
INSERT INTO recipe_voice_cache (cache_id, question, answer) VALUES
(1,'What are your hours?','We''re open 9am to 8pm Monday through Saturday, closed Sundays.'),
(2,'Where can I park?','There''s free customer parking in the lot behind the building.'),
(3,'What is your return policy?','You can return items within 14 days with a receipt for a full refund.'),
(4,'Do you offer delivery?','Yes, same-day delivery before 2pm within 10 miles.'),
(5,'What payment methods do you take?','We accept all major cards, mobile wallets, and cash.');
Step 3: Embed the cached questions
The embedding model runs in-database; these stored vectors are what every cache lookup is compared against.
UPDATE recipe_voice_cache SET embedding = EMBED(question);
Step 4: Look up a paraphrased question in the cache
The caller asks "how late are you open tonight?"; find the closest cached question by meaning.
SELECT cache_id, question, answer,
COSINE_SIMILARITY(embedding, EMBED('how late are you open tonight?')) AS similarity
FROM recipe_voice_cache
ORDER BY similarity DESC
LIMIT 1;
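To make the lookup concrete, here is a minimal Python sketch of what the query above computes: a cosine similarity per row, then the single best match. The three-dimensional vectors are toy values standing in for the real 384-dimensional embeddings, and the helper names `cosine_similarity` and `nearest` are illustrative, not part of any SDK.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)

def nearest(cache_rows, query_vec):
    """Return (similarity, row) for the closest row, like ORDER BY similarity DESC LIMIT 1."""
    return max(
        ((cosine_similarity(row["embedding"], query_vec), row) for row in cache_rows),
        key=lambda pair: pair[0],
    )

# Toy vectors (assumption): real embeddings come from EMBED() in the database.
rows = [
    {"cache_id": 1, "question": "What are your hours?", "embedding": [0.9, 0.1, 0.0]},
    {"cache_id": 2, "question": "Where can I park?", "embedding": [0.0, 0.2, 0.9]},
]
similarity, best = nearest(rows, [0.8, 0.3, 0.1])
print(best["cache_id"], round(similarity, 3))
```

Because cosine similarity depends only on direction, not vector length, a long and a short phrasing of the same question can still score close to 1.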
Step 5: Decide hit vs. miss with a threshold
Serve the cached answer only on a strong semantic match; otherwise fall through to the live LLM path.
SELECT answer, similarity,
CASE WHEN similarity >= 0.55 THEN 'CACHE_HIT' ELSE 'MISS_CALL_LLM' END AS decision
FROM (
SELECT answer,
COSINE_SIMILARITY(embedding, EMBED('what time do you shut tonight?')) AS similarity
FROM recipe_voice_cache
ORDER BY similarity DESC
LIMIT 1
);
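The 0.55 cutoff is a tuning knob, so it can help to check a candidate threshold against a few labelled examples before trusting it on live calls. The sketch below mirrors the CASE expression in `decide` and counts the two failure modes in `evaluate_threshold`. The sample similarities and labels are invented for illustration; real values would come from your own call transcripts and embedding model.

```python
def decide(similarity, threshold=0.55):
    """Same rule as the CASE expression: hit at or above the threshold."""
    return "CACHE_HIT" if similarity >= threshold else "MISS_CALL_LLM"

def evaluate_threshold(samples, threshold):
    """samples: (similarity, is_same_question) pairs.

    Returns (false_hits, missed_hits):
      false_hits  - different question served from cache (wrong answer spoken)
      missed_hits - same question sent to the LLM anyway (wasted latency)
    """
    false_hits = sum(1 for sim, same in samples if sim >= threshold and not same)
    missed_hits = sum(1 for sim, same in samples if sim < threshold and same)
    return false_hits, missed_hits

# Hypothetical labelled pairs, not measured data.
samples = [(0.82, True), (0.61, True), (0.58, False), (0.40, False), (0.52, True)]
for t in (0.50, 0.55, 0.60):
    print(t, evaluate_threshold(samples, t))
```

On a voice line a false hit speaks the wrong answer to the caller, while a missed hit only costs one extra LLM call, so leaning toward a slightly higher threshold is often the safer default.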
Step 6: Count the hit (so you can measure cache value)
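On a hit, increment the counter for the matched entry to track how much latency/cost the cache saves. Run this only when Step 5 returns CACHE_HIT, because the statements below do not re-check the threshold. We first pin the matched entry into a one-row table (projecting the similarity we order by), then increment by that id. The pinned row uses a fixed id of 1, so delete it before pinning the next hit.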
CREATE TABLE IF NOT EXISTS recipe_voice_cache_hit (id INTEGER PRIMARY KEY, cache_id INTEGER, similarity DOUBLE);
INSERT INTO recipe_voice_cache_hit (id, cache_id, similarity)
SELECT 1, cache_id,
COSINE_SIMILARITY(embedding, EMBED('what time do you shut tonight?')) AS similarity
FROM recipe_voice_cache
ORDER BY similarity DESC
LIMIT 1;
UPDATE recipe_voice_cache
SET hits = hits + 1
WHERE cache_id = (SELECT cache_id FROM recipe_voice_cache_hit LIMIT 1);
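Once the counters accumulate, turning them into a rough savings figure is simple arithmetic. The sketch below assumes you have read `cache_id` and `hits` back from `recipe_voice_cache`; the hit counts and the 1.5 seconds per avoided LLM call are placeholder numbers, not measurements.

```python
def summarize_hits(rows, llm_seconds_per_call):
    """Return (total_hits, seconds_saved, busiest_cache_id) from hit counters."""
    total_hits = sum(row["hits"] for row in rows)
    # Each hit is one LLM round trip that never happened.
    seconds_saved = total_hits * llm_seconds_per_call
    busiest = max(rows, key=lambda row: row["hits"])
    return total_hits, seconds_saved, busiest["cache_id"]

# Placeholder counters (assumption), as if selected from recipe_voice_cache.
rows = [
    {"cache_id": 1, "hits": 40},
    {"cache_id": 2, "hits": 5},
    {"cache_id": 3, "hits": 0},
]
total, saved, top = summarize_hits(rows, llm_seconds_per_call=1.5)
print(total, saved, top)  # 45 67.5 1
```

Entries that stay at zero hits for a long time are candidates for pruning, and the busiest entries are the ones whose answers are most worth reviewing by hand.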
Step 7: Fall through and populate the cache on a miss
A genuinely new question misses the threshold; generate the answer once, then store it so the next paraphrase hits.
INSERT INTO recipe_voice_cache (cache_id, question, answer, embedding)
SELECT 6, 'Do you price match?',
GENERATE('Answer in one short spoken sentence: Do you price match competitors? Assume yes, within 30 days with proof.'),
EMBED('Do you price match?');
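Steps 5 through 7 together form one turn loop: look up, serve on a hit, otherwise generate and store. The in-memory sketch below shows that control flow end to end. `toy_embed` is a bag-of-words stand-in for EMBED() and `fake_llm` is a stub for GENERATE(); both are assumptions for illustration, and a word-count vector only matches overlapping words rather than true paraphrases.

```python
import math
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for EMBED(): a bag-of-words count vector."""
    return Counter(text.lower().replace("?", "").split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, generate, threshold=0.55):
        self.rows = []            # each row: question, answer, embedding, hits
        self.generate = generate  # called only on a miss
        self.threshold = threshold

    def answer(self, question):
        vec = toy_embed(question)
        best = max(self.rows, key=lambda r: cosine(r["embedding"], vec), default=None)
        if best is not None and cosine(best["embedding"], vec) >= self.threshold:
            best["hits"] += 1                     # Step 6: count the hit
            return best["answer"], "CACHE_HIT"
        text = self.generate(question)            # Step 7: fall through to the LLM
        self.rows.append({"question": question, "answer": text, "embedding": vec, "hits": 0})
        return text, "MISS_CALL_LLM"

llm_calls = []
def fake_llm(question):
    """Stub LLM (assumption): records the call and returns a fixed answer."""
    llm_calls.append(question)
    return "Yes, within 30 days with proof."

cache = SemanticCache(fake_llm)
first = cache.answer("Do you price match?")
second = cache.answer("Do you price match competitors?")
print(first[1], second[1], len(llm_calls))  # MISS_CALL_LLM CACHE_HIT 1
```

The second question is answered without touching the stub LLM, which is the behaviour the insert above sets up for the next caller.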
Cleanup (Optional)
DROP TABLE IF EXISTS recipe_voice_cache;
DROP TABLE IF EXISTS recipe_voice_cache_hit;
Expected Outcomes
- Step 4 matches "how late are you open tonight?" to the cached hours question — a paraphrase an exact cache would miss.
- Step 5 returns CACHE_HIT with the stored answer for "what time do you shut tonight?" — instant, no LLM call.
- Step 6 increments the hit counter so you can quantify the saved latency and cost.
- Step 7 answers a brand-new question once, then caches it so the next phrasing of it hits.
You now have a semantic answer cache that turns repeated, reworded questions into instant replies — less dead air on calls, fewer LLM tokens.
Use it from your agent (framework-agnostic — the DB is the brain, the voice stack is swappable)
A semantic cache is just a Q&A table + a lookup-before-LLM check, called from any voice runtime:
- REST / SDK: POST /v1/query/execute (any language), or @synapcores/sdk client.executeQuery(...). On each question, run the Step-5 lookup first; on CACHE_HIT send the stored answer straight to TTS, on a miss call your LLM and write the result back (Step 7). Works with Vapi, LiveKit Agents, Pipecat, Twilio, or Retell.
- MCP (native, on by default): point your voice runtime's MCP client at ws://<your-instance>/mcp?token=<jwt> (JWT from one POST /v1/auth/login → access_token). The query tool checks the cache; the execute tool records hits and inserts new answers, so the cache becomes tool calls in the turn loop.
- Any framework: the same cache fronts a phone agent, a voice widget, or a chatbot; the latency win is biggest on voice but the pattern is universal. The database is the brain; the framework (and the voice stack) is swappable.
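Whichever transport you pick, the agent has to turn a transcribed utterance into the Step-5 query, and transcripts often contain apostrophes ("what's your return policy"). The sketch below builds that SQL with quote doubling, the same escaping the Step 2 inserts use, and routes the turn. `run_query` and `call_llm` are injected placeholders for your REST/SDK/MCP call and your LLM client, and the list-of-dicts result shape is an assumption about what that call returns.

```python
def sql_literal(text):
    """Quote text as a SQL string literal by doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"

def build_lookup_sql(utterance, threshold=0.55):
    """Build the Step-5 lookup for an arbitrary caller utterance."""
    q = sql_literal(utterance)
    return (
        "SELECT answer, similarity, "
        f"CASE WHEN similarity >= {threshold} THEN 'CACHE_HIT' ELSE 'MISS_CALL_LLM' END AS decision "
        "FROM (SELECT answer, "
        f"COSINE_SIMILARITY(embedding, EMBED({q})) AS similarity "
        "FROM recipe_voice_cache ORDER BY similarity DESC LIMIT 1)"
    )

def handle_turn(utterance, run_query, call_llm):
    """Serve from cache on a hit, otherwise fall through to the LLM.

    run_query(sql) -> list of row dicts and call_llm(text) -> str are
    hypothetical callables supplied by the caller.
    """
    rows = run_query(build_lookup_sql(utterance))
    if rows and rows[0]["decision"] == "CACHE_HIT":
        return rows[0]["answer"]
    return call_llm(utterance)

# Usage with stand-in callables instead of a live database and LLM.
fake_rows = [{"answer": "Open 9am to 8pm.", "similarity": 0.7, "decision": "CACHE_HIT"}]
reply = handle_turn("what's closing time?", lambda sql: fake_rows, lambda text: "LLM answer")
print(reply)
```

If your client library supports bound parameters, passing the utterance as a parameter is preferable to building the literal by hand.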
Key Concepts Learned
- A semantic cache hits on meaning, so paraphrases reuse a stored answer — unlike an exact-match cache.
- A similarity threshold separates safe cache hits from genuine misses that need the live LLM.
- Populating the cache on every miss makes it warm up automatically, so each new answer pays off on later calls that ask the same thing.
- Because it's plain data ops (SQL / REST / MCP), the semantic cache works with any STT/TTS stack — the database-as-the-brain pattern the voice cluster builds on.