Conversation Memory + Rolling Summary for a Chatbot

Store a chatbot's full conversation history and keep a rolling summary so it never blows the context window — turn buffer, semantic recall, and auto-summary in one SQL database. Works with Claude Code, OpenClaw, LangChain, or a voice agent.

All recipes· agents· 14 minutesintermediateen
Instance: localhost:8080

Opens your running SynapCores (Conversation Memory + Rolling Summary for a Chatbot will be staged for a preview — nothing runs until you click Run). No instance yet? Install free in ~30s.

Share

Objective

A chatbot that re-sends the whole transcript every turn eventually overflows the context window and pays for it on every call. The fix is two-layered memory: keep the full history for semantic recall, but feed the model a compact rolling summary of older turns plus the last few messages verbatim. Here you'll build both in one table — store every turn, recall old turns by meaning, and maintain a running summary with GENERATE(). The same store works from any framework or a voice agent — see Use it from your agent at the end.

Step 1: Create the conversation turn store

One row per message, embedded for recall, tagged by session and role.

CREATE TABLE IF NOT EXISTS recipe_chat_turns (
  turn_id     INTEGER PRIMARY KEY,
  session_id  TEXT,
  role        TEXT,                                  -- 'user' or 'assistant'
  content     TEXT,
  created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  embedding   VECTOR(384)
);

Step 2: Create the rolling summary store

One summary row per session — the compressed memory of everything before the recent window.

CREATE TABLE IF NOT EXISTS recipe_chat_summary (
  session_id  TEXT PRIMARY KEY,
  summary     TEXT,
  updated_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Step 3: Record a conversation

A realistic multi-turn chat that's already too long to resend in full on every call.

INSERT INTO recipe_chat_turns (turn_id, session_id, role, content) VALUES
 (1,'s1','user','Hi, I need help planning a 10-day trip to Japan in April.'),
 (2,'s1','assistant','Great! April is cherry-blossom season. Which cities interest you?'),
 (3,'s1','user','Tokyo and Kyoto, and I want to keep it under $3000.'),
 (4,'s1','assistant','Got it — budget $3000, Tokyo and Kyoto. Any food preferences?'),
 (5,'s1','user','I''m vegetarian, and I love trains over flights.'),
 (6,'s1','assistant','Noted: vegetarian, prefers rail. I''ll plan a JR Pass route.'),
 (7,'s1','user','Also I get motion sick, so nothing too winding.'),
 (8,'s1','assistant','Understood — I''ll avoid mountain bus routes.');

Step 4: Embed the turns

Embed every message so older turns are recallable by meaning, not just by position.

UPDATE recipe_chat_turns SET embedding = EMBED(content);

Step 5: Recall the most relevant past turns for the next message

When the user asks something new, pull the few past turns that matter — not the whole transcript.

SELECT turn_id, role, content,
       COSINE_SIMILARITY(embedding, EMBED('what are the user''s dietary and travel constraints?')) AS relevance
FROM recipe_chat_turns
WHERE session_id = 's1'
ORDER BY relevance DESC
LIMIT 3;

Step 6: Build a rolling summary of the conversation

Compress the whole session into a few sentences the model can carry cheaply on every turn. First flatten the transcript into one context row, then generate the summary from that row and store it.

CREATE TABLE IF NOT EXISTS recipe_chat_transcript (session_id TEXT PRIMARY KEY, transcript TEXT);
INSERT INTO recipe_chat_transcript (session_id, transcript)
SELECT 's1', GROUP_CONCAT(role || ': ' || content, ' | ')
FROM recipe_chat_turns WHERE session_id = 's1';
INSERT INTO recipe_chat_summary (session_id, summary)
SELECT session_id,
       GENERATE('Summarize this chat as durable user context in 2-3 sentences (preferences, constraints, goal): ' || transcript)
FROM recipe_chat_transcript;

Step 7: Assemble the cheap context for the next reply

Combine the rolling summary with only the last 2 verbatim turns — the model gets full context at a fraction of the tokens. Materialize the recent window into its own table (projecting turn_id so we can order by it), then read both back.

CREATE TABLE IF NOT EXISTS recipe_chat_recent (turn_id INTEGER PRIMARY KEY, role TEXT, content TEXT);
INSERT INTO recipe_chat_recent (turn_id, role, content)
SELECT turn_id, role, content FROM recipe_chat_turns
WHERE session_id = 's1' ORDER BY turn_id DESC LIMIT 2;
SELECT
  (SELECT summary FROM recipe_chat_summary WHERE session_id='s1') AS rolling_summary,
  (SELECT GROUP_CONCAT(role || ': ' || content, ' | ') FROM recipe_chat_recent) AS recent_turns;

Step 8: Generate the next reply from summary + recent turns

Ground the response in compact memory instead of the full transcript.

SELECT GENERATE(
  'You are a travel assistant. Context: ' || summary ||
  ' The user now asks: Suggest a vegetarian dinner spot in Kyoto. Answer in one sentence respecting their constraints.') AS reply
FROM recipe_chat_summary WHERE session_id = 's1';

Cleanup (Optional)

DROP TABLE IF EXISTS recipe_chat_turns;
DROP TABLE IF EXISTS recipe_chat_summary;
DROP TABLE IF EXISTS recipe_chat_transcript;
DROP TABLE IF EXISTS recipe_chat_recent;

Expected Outcomes

  • Step 5 surfaces the vegetarian / rail / motion-sickness turns for a constraints query — relevant history without the whole log.
  • Step 6 writes a 2–3 sentence rolling summary that captures the budget, cities, and preferences.
  • Step 7 shows the cheap context payload: summary + last 2 turns, far fewer tokens than the full transcript.
  • Step 8 answers with a vegetarian Kyoto suggestion grounded in the summary — full memory, tiny context.

You now have a chatbot that remembers everything, recalls by meaning, and never blows the context window.

Use it from your agent (framework-agnostic — this is the whole point)

Conversation memory is just two tables + the same data ops, so any chatbot uses it with no framework lock-in:

  • REST / SDKPOST /v1/query/execute (any language), or @synapcores/sdk client.executeQuery(...). Your bot appends each turn (Step 3), refreshes the rolling summary every N turns (Step 6), and assembles summary + recent turns (Step 7) as the model's context.
  • MCP (native, on by default) — point any MCP client (Claude Code, Cursor, a custom loop, a voice runtime) at ws://<your-instance>/mcp?token=<jwt> (JWT from one POST /v1/auth/loginaccess_token). The execute tool appends turns and updates the summary; the query tool recalls relevant history — conversation memory becomes tool calls, no SDK required.
  • Any framework — OpenClaw, LangChain's memory classes, LlamaIndex, a custom loop, or a voice agent all read and write the same turn + summary tables. The database is the brain; the framework is swappable.

Key Concepts Learned

  • Store the full transcript for recall, but feed the model a rolling summary + the last few turns — not the whole log.
  • COSINE_SIMILARITY recalls old turns by meaning regardless of how far back they are.
  • GENERATE() over the transcript maintains a compact summary that controls context-window cost.
  • Because it's plain data ops (SQL / REST / MCP), conversation memory works for any chatbot — the agent-agnostic backend pattern this cluster builds on.

Tags

chatbotconversation-memoryrolling-summaryvectorembeddingscontext-windowmcp

Run this on your own machine

Install SynapCores Community Edition free, paste the SQL or Cypher above into the bundled web UI, and watch it run.

Download Free CE