Objective

A chatbot that re-sends the whole transcript every turn eventually overflows the context window and pays for it on every call. The fix is two-layered memory: keep the full history for semantic recall, but feed the model a compact rolling summary of older turns plus the last few messages verbatim. Here you'll build both in one table — store every turn, recall old turns by meaning, and maintain a running summary with GENERATE(). The same store works from any framework or a voice agent — see Use it from your agent at the end.

Step 1: Create the conversation turn store

One row per message, embedded for recall, tagged by session and role.

CREATE TABLE IF NOT EXISTS recipe_chat_turns (
  turn_id     INTEGER PRIMARY KEY,
  session_id  TEXT,
  role        TEXT,                                  -- 'user' or 'assistant'
  content     TEXT,
  created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  embedding   VECTOR(384)
);

Step 2: Create the rolling summary store

One summary row per session — the compressed memory of everything before the recent window.

CREATE TABLE IF NOT EXISTS recipe_chat_summary (
  session_id  TEXT PRIMARY KEY,
  summary     TEXT,
  updated_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Step 3: Record a conversation

A realistic multi-turn chat that's already too long to resend in full on every call.

INSERT INTO recipe_chat_turns (turn_id, session_id, role, content) VALUES
 (1,'s1','user','Hi, I need help planning a 10-day trip to Japan in April.'),
 (2,'s1','assistant','Great! April is cherry-blossom season. Which cities interest you?'),
 (3,'s1','user','Tokyo and Kyoto, and I want to keep it under $3000.'),
 (4,'s1','assistant','Got it — budget $3000, Tokyo and Kyoto. Any food preferences?'),
 (5,'s1','user','I''m vegetarian, and I love trains over flights.'),
 (6,'s1','assistant','Noted: vegetarian, prefers rail. I''ll plan a JR Pass route.'),
 (7,'s1','user','Also I get motion sick, so nothing too winding.'),
 (8,'s1','assistant','Understood — I''ll avoid mountain bus routes.');

Step 4: Embed the turns

Embed every message so older turns are recallable by meaning, not just by position.

UPDATE recipe_chat_turns SET embedding = EMBED(content);

Step 5: Recall the most relevant past turns for the next message

When the user asks something new, pull the few past turns that matter — not the whole transcript.

SELECT turn_id, role, content,
       COSINE_SIMILARITY(embedding, EMBED('what are the user''s dietary and travel constraints?')) AS relevance
FROM recipe_chat_turns
WHERE session_id = 's1'
ORDER BY relevance DESC
LIMIT 3;

Step 6: Build a rolling summary of the conversation

Compress the whole session into a few sentences the model can carry cheaply on every turn. First flatten the transcript into one context row, then generate the summary from that row and store it.

CREATE TABLE IF NOT EXISTS recipe_chat_transcript (session_id TEXT PRIMARY KEY, transcript TEXT);
INSERT INTO recipe_chat_transcript (session_id, transcript)
SELECT 's1', GROUP_CONCAT(role || ': ' || content, ' | ')
FROM recipe_chat_turns WHERE session_id = 's1';

INSERT INTO recipe_chat_summary (session_id, summary)
SELECT session_id,
       GENERATE('Summarize this chat as durable user context in 2-3 sentences (preferences, constraints, goal): ' || transcript)
FROM recipe_chat_transcript;

Step 7: Assemble the cheap context for the next reply

Combine the rolling summary with only the last 2 verbatim turns — the model gets full context at a fraction of the tokens. Materialize the recent window into its own table (projecting turn_id so we can order by it), then read both back.

CREATE TABLE IF NOT EXISTS recipe_chat_recent (turn_id INTEGER PRIMARY KEY, role TEXT, content TEXT);
INSERT INTO recipe_chat_recent (turn_id, role, content)
SELECT turn_id, role, content FROM recipe_chat_turns
WHERE session_id = 's1' ORDER BY turn_id DESC LIMIT 2;

SELECT
  (SELECT summary FROM recipe_chat_summary WHERE session_id='s1') AS rolling_summary,
  (SELECT GROUP_CONCAT(role || ': ' || content, ' | ') FROM recipe_chat_recent) AS recent_turns;

Step 8: Generate the next reply from summary + recent turns

Ground the response in compact memory instead of the full transcript.

SELECT GENERATE(
  'You are a travel assistant. Context: ' || summary ||
  ' The user now asks: Suggest a vegetarian dinner spot in Kyoto. Answer in one sentence respecting their constraints.') AS reply
FROM recipe_chat_summary WHERE session_id = 's1';

Cleanup (Optional)

DROP TABLE IF EXISTS recipe_chat_turns;
DROP TABLE IF EXISTS recipe_chat_summary;
DROP TABLE IF EXISTS recipe_chat_transcript;
DROP TABLE IF EXISTS recipe_chat_recent;

Expected Outcomes

Step 5 surfaces the vegetarian / rail / motion-sickness turns for a constraints query — relevant history without the whole log.
Step 6 writes a 2–3 sentence rolling summary that captures the budget, cities, and preferences.
Step 7 shows the cheap context payload: summary + last 2 turns, far fewer tokens than the full transcript.
Step 8 answers with a vegetarian Kyoto suggestion grounded in the summary — full memory, tiny context.

You now have a chatbot that remembers everything, recalls by meaning, and never blows the context window.

Use it from your agent (framework-agnostic — this is the whole point)

Conversation memory is just two tables + the same data ops, so any chatbot uses it with no framework lock-in:

REST / SDK — POST /v1/query/execute (any language), or @synapcores/sdk client.executeQuery(...). Your bot appends each turn (Step 3), refreshes the rolling summary every N turns (Step 6), and assembles summary + recent turns (Step 7) as the model's context.
MCP (native, on by default) — point any MCP client (Claude Code, Cursor, a custom loop, a voice runtime) at ws://<your-instance>/mcp?token=<jwt> (JWT from one POST /v1/auth/login → access_token). The execute tool appends turns and updates the summary; the query tool recalls relevant history — conversation memory becomes tool calls, no SDK required.
Any framework — OpenClaw, LangChain's memory classes, LlamaIndex, a custom loop, or a voice agent all read and write the same turn + summary tables. The database is the brain; the framework is swappable.

About the embedding model — name and how to swap

VECTOR(384) in Step 1 is sized for the default Community Edition embedder, MiniLM (all-MiniLM-L6-v2) — 384-dim, fast, good enough for conversational recall at chatbot scale.

That's a deliberate floor, not a ceiling. For nuanced production retrieval (long help-desk conversations, technical-domain bots, multi-language workloads), upgrade to a 768-dim model:

Local / Ollama: nomic-embed-text (768-dim), mxbai-embed-large (1024-dim)
OpenAI: text-embedding-3-small (1536-dim) or text-embedding-3-large (3072-dim)
Self-hosted: bge-base-en-v1.5 (768-dim), bge-large-en-v1.5 (1024-dim)

To swap, change two things: the VECTOR(384) dimension in Step 1 to match the new model, and the embedder config in gateway.toml. The full swap procedure — including the docker-volume model cache trick — is in the GGUF / Ollama embedding-model swap guide.

Re-embedding existing turns after a swap is one statement: UPDATE recipe_chat_turns SET embedding = EMBED(content). The schema doesn't change beyond the column dimension.

Retention & PII (the part most chatbot examples skip)

Storing every turn forever with no policy is the wrong default for any real product. Two things to wire in before this leaves dev:

1. Time-based retention. Turns older than N days disappear automatically:

-- Run as a scheduled job (cron, AGENT_RUN trigger, or your orchestration of choice).
DELETE FROM recipe_chat_turns
WHERE created_at < CURRENT_TIMESTAMP - INTERVAL 90 DAY;
-- Embeddings live in the same row, so they're purged in the same statement.

2. Right-to-erasure (GDPR Article 17). When a user requests deletion, you must purge every artifact derived from their data — raw turns, embeddings, and the summary that absorbed them. The full path:

-- Step 1: drop the raw turns + their embeddings (same row, one DELETE).
DELETE FROM recipe_chat_turns WHERE session_id = :session_id;

-- Step 2: drop the rolling summary that compressed those turns.
DELETE FROM recipe_chat_summary WHERE session_id = :session_id;

-- Step 3: drop any materialized intermediates from this recipe.
DELETE FROM recipe_chat_transcript WHERE session_id = :session_id;
DELETE FROM recipe_chat_recent
WHERE turn_id NOT IN (SELECT turn_id FROM recipe_chat_turns);

Because embedding is a column on the same row as content, you don't have the cross-store deletion-propagation problem you'd have with a separate vector database. One row, one place, one delete.

3. Don't store secrets you can't redact. Default to redacting common PII before INSERT — emails, phone numbers, credit-card numbers — at the application layer or via a SQL-side regex_replace. Embeddings encode the content; once embedded, the PII is durable in vector form too.

A production-grade variant of this recipe with retention triggers + a redaction step is on the roadmap; if you need it sooner, sign up for the EU-variant note on the Free weekly ML course (Kit form tag: gdpr_retention_pattern).

Key Concepts Learned

Store the full transcript for recall, but feed the model a rolling summary + the last few turns — not the whole log.
COSINE_SIMILARITY recalls old turns by meaning regardless of how far back they are.
GENERATE() over the transcript maintains a compact summary that controls context-window cost.
Because it's plain data ops (SQL / REST / MCP), conversation memory works for any chatbot — the agent-agnostic backend pattern this cluster builds on.

Conversation Memory + Rolling Summary for a Chatbot