Objective
A chatbot that re-sends the whole transcript every turn eventually overflows the context
window and pays for it on every call. The fix is two-layered memory: keep the full
history for semantic recall, but feed the model a compact rolling summary of older
turns plus the last few messages verbatim. Here you'll build both in one table — store every
turn, recall old turns by meaning, and maintain a running summary with GENERATE(). The same
store works from any framework or a voice agent — see Use it from your agent at the end.
Step 1: Create the conversation turn store
One row per message, embedded for recall, tagged by session and role.
CREATE TABLE IF NOT EXISTS recipe_chat_turns (
turn_id INTEGER PRIMARY KEY,
session_id TEXT,
role TEXT, -- 'user' or 'assistant'
content TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
embedding VECTOR(384)
);
Step 2: Create the rolling summary store
One summary row per session — the compressed memory of everything before the recent window.
CREATE TABLE IF NOT EXISTS recipe_chat_summary (
session_id TEXT PRIMARY KEY,
summary TEXT,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Step 3: Record a conversation
A realistic multi-turn chat that's already too long to resend in full on every call.
INSERT INTO recipe_chat_turns (turn_id, session_id, role, content) VALUES
(1,'s1','user','Hi, I need help planning a 10-day trip to Japan in April.'),
(2,'s1','assistant','Great! April is cherry-blossom season. Which cities interest you?'),
(3,'s1','user','Tokyo and Kyoto, and I want to keep it under $3000.'),
(4,'s1','assistant','Got it — budget $3000, Tokyo and Kyoto. Any food preferences?'),
(5,'s1','user','I''m vegetarian, and I love trains over flights.'),
(6,'s1','assistant','Noted: vegetarian, prefers rail. I''ll plan a JR Pass route.'),
(7,'s1','user','Also I get motion sick, so nothing too winding.'),
(8,'s1','assistant','Understood — I''ll avoid mountain bus routes.');
Step 4: Embed the turns
Embed every message so older turns are recallable by meaning, not just by position.
UPDATE recipe_chat_turns SET embedding = EMBED(content);
Step 5: Recall the most relevant past turns for the next message
When the user asks something new, pull the few past turns that matter — not the whole transcript.
SELECT turn_id, role, content,
COSINE_SIMILARITY(embedding, EMBED('what are the user''s dietary and travel constraints?')) AS relevance
FROM recipe_chat_turns
WHERE session_id = 's1'
ORDER BY relevance DESC
LIMIT 3;
Step 6: Build a rolling summary of the conversation
Compress the whole session into a few sentences the model can carry cheaply on every turn. First flatten the transcript into one context row, then generate the summary from that row and store it.
CREATE TABLE IF NOT EXISTS recipe_chat_transcript (session_id TEXT PRIMARY KEY, transcript TEXT);
INSERT INTO recipe_chat_transcript (session_id, transcript)
SELECT 's1', GROUP_CONCAT(role || ': ' || content, ' | ')
FROM recipe_chat_turns WHERE session_id = 's1';
INSERT INTO recipe_chat_summary (session_id, summary)
SELECT session_id,
GENERATE('Summarize this chat as durable user context in 2-3 sentences (preferences, constraints, goal): ' || transcript)
FROM recipe_chat_transcript;
Step 7: Assemble the cheap context for the next reply
Combine the rolling summary with only the last 2 verbatim turns — the model gets full context at a fraction of the tokens. Materialize the recent window into its own table (projecting turn_id so we can order by it), then read both back.
CREATE TABLE IF NOT EXISTS recipe_chat_recent (turn_id INTEGER PRIMARY KEY, role TEXT, content TEXT);
INSERT INTO recipe_chat_recent (turn_id, role, content)
SELECT turn_id, role, content FROM recipe_chat_turns
WHERE session_id = 's1' ORDER BY turn_id DESC LIMIT 2;
SELECT
(SELECT summary FROM recipe_chat_summary WHERE session_id='s1') AS rolling_summary,
(SELECT GROUP_CONCAT(role || ': ' || content, ' | ') FROM recipe_chat_recent) AS recent_turns;
Step 8: Generate the next reply from summary + recent turns
Ground the response in compact memory instead of the full transcript.
SELECT GENERATE(
'You are a travel assistant. Context: ' || summary ||
' The user now asks: Suggest a vegetarian dinner spot in Kyoto. Answer in one sentence respecting their constraints.') AS reply
FROM recipe_chat_summary WHERE session_id = 's1';
Cleanup (Optional)
DROP TABLE IF EXISTS recipe_chat_turns;
DROP TABLE IF EXISTS recipe_chat_summary;
DROP TABLE IF EXISTS recipe_chat_transcript;
DROP TABLE IF EXISTS recipe_chat_recent;
Expected Outcomes
- Step 5 surfaces the vegetarian / rail / motion-sickness turns for a constraints query — relevant history without the whole log.
- Step 6 writes a 2–3 sentence rolling summary that captures the budget, cities, and preferences.
- Step 7 shows the cheap context payload: summary + last 2 turns, far fewer tokens than the full transcript.
- Step 8 answers with a vegetarian Kyoto suggestion grounded in the summary — full memory, tiny context.
You now have a chatbot that remembers everything, recalls by meaning, and never blows the context window.
Use it from your agent (framework-agnostic — this is the whole point)
Conversation memory is just two tables + the same data ops, so any chatbot uses it with no framework lock-in:
- REST / SDK —
POST /v1/query/execute(any language), or@synapcores/sdkclient.executeQuery(...). Your bot appends each turn (Step 3), refreshes the rolling summary every N turns (Step 6), and assembles summary + recent turns (Step 7) as the model's context. - MCP (native, on by default) — point any MCP client (Claude Code, Cursor, a custom loop, a voice runtime) at
ws://<your-instance>/mcp?token=<jwt>(JWT from onePOST /v1/auth/login→access_token). Theexecutetool appends turns and updates the summary; thequerytool recalls relevant history — conversation memory becomes tool calls, no SDK required. - Any framework — OpenClaw, LangChain's memory classes, LlamaIndex, a custom loop, or a voice agent all read and write the same turn + summary tables. The database is the brain; the framework is swappable.
Key Concepts Learned
- Store the full transcript for recall, but feed the model a rolling summary + the last few turns — not the whole log.
COSINE_SIMILARITYrecalls old turns by meaning regardless of how far back they are.GENERATE()over the transcript maintains a compact summary that controls context-window cost.- Because it's plain data ops (SQL / REST / MCP), conversation memory works for any chatbot — the agent-agnostic backend pattern this cluster builds on.