A Self-Improving Agent (learn from feedback)

Build an AI agent that learns from thumbs-up/down feedback — train a model on past interactions to predict whether a draft reply will satisfy the user, all in one SQL database. Works with Claude Code, OpenClaw, LangChain, or a voice agent.

agents · 16 minutes · advanced · sql

Objective

An agent that never learns repeats its mistakes. Every thumbs-up/down, resolved/escalated, or accepted/rejected signal is training data — if you keep it. Here you'll close the loop: log interactions with their outcomes, train an AutoML model on what made past replies succeed, then score a new draft reply's likelihood of satisfying the user before sending it. The agent gets measurably better as feedback accumulates — no ML pipeline, no separate model server. The same pattern works from any framework or a voice agent — see Use it from your agent at the end.

Step 1: Create the interaction-feedback store

Each row is a past interaction with the features the agent controlled and the outcome it earned.

CREATE TABLE IF NOT EXISTS recipe_agent_feedback (
  interaction_id   INTEGER PRIMARY KEY,
  reply_length     INTEGER,  -- characters in the agent's reply
  used_context     INTEGER,  -- 1 if the reply cited retrieved context, else 0
  response_seconds DOUBLE,   -- how long the agent took
  asked_clarify    INTEGER,  -- 1 if it asked a clarifying question first
  tone_score       DOUBLE,   -- 0..1 warmth/politeness heuristic
  satisfied        INTEGER   -- label: 1 = thumbs-up, 0 = thumbs-down
);
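
On the agent side, each row can be mirrored by a small typed record that validates values before they are logged. This is a minimal sketch, assuming a Python agent (the recipe does not prescribe a language). The `Interaction` class and its `as_row` method are hypothetical names; only the column names and value ranges come from the table above.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Interaction:
    """One row of recipe_agent_feedback (field order matches the column order)."""
    interaction_id: int
    reply_length: int        # characters in the agent's reply
    used_context: int        # 1 if the reply cited retrieved context, else 0
    response_seconds: float  # how long the agent took
    asked_clarify: int       # 1 if it asked a clarifying question first
    tone_score: float        # 0..1 warmth/politeness heuristic
    satisfied: int           # label: 1 = thumbs-up, 0 = thumbs-down

    def __post_init__(self) -> None:
        # Reject values the schema comments rule out, before they reach the table.
        for name in ("used_context", "asked_clarify", "satisfied"):
            if getattr(self, name) not in (0, 1):
                raise ValueError(f"{name} must be 0 or 1")
        if not 0.0 <= self.tone_score <= 1.0:
            raise ValueError("tone_score must be within 0..1")
        if self.reply_length < 0 or self.response_seconds < 0:
            raise ValueError("reply_length and response_seconds must be non-negative")

    def as_row(self) -> tuple:
        """Values in column order, ready for a parameterized INSERT."""
        return astuple(self)


first = Interaction(1, 180, 1, 1.2, 0, 0.90, 1)
print(first.as_row())
```

Validating at the edge keeps a malformed flag or an out-of-range tone score from quietly becoming training data.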

Step 2: Log historical interactions with outcomes

Sample feedback that follows a realistic pattern: concise, context-grounded, prompt replies tend to satisfy; long, ungrounded, slow ones don't.

INSERT INTO recipe_agent_feedback
 (interaction_id, reply_length, used_context, response_seconds, asked_clarify, tone_score, satisfied) VALUES
 (1, 180,1,1.2,0,0.90,1),(2, 920,0,6.5,0,0.40,0),(3, 240,1,1.8,1,0.85,1),
 (4, 760,0,5.1,0,0.55,0),(5, 150,1,0.9,0,0.92,1),(6,1100,0,7.8,0,0.30,0),
 (7, 300,1,2.1,1,0.80,1),(8, 640,0,4.4,0,0.50,0),(9, 210,1,1.4,0,0.88,1),
 (10,880,0,6.0,0,0.45,0),(11,260,1,1.9,1,0.83,1),(12,700,0,5.5,0,0.48,0),
 (13,170,1,1.1,0,0.91,1),(14,950,0,7.0,0,0.35,0),(15,230,1,1.6,1,0.86,1),
 (16,820,0,5.9,0,0.52,0),(17,190,1,1.3,0,0.89,1),(18,990,0,7.2,0,0.33,0),
 (19,280,1,2.0,1,0.82,1),(20,680,0,4.9,0,0.49,0);
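
Before training, it helps to check what the sample actually contains. The sketch below loads the same 20 rows into an in-memory SQLite database and summarizes each label group with plain SQL. SQLite is an assumption of this example, used purely as a local stand-in; it is not SynapCores.

```python
import sqlite3

# The 20 rows from Step 2, in column order.
ROWS = [
    (1, 180, 1, 1.2, 0, 0.90, 1), (2, 920, 0, 6.5, 0, 0.40, 0),
    (3, 240, 1, 1.8, 1, 0.85, 1), (4, 760, 0, 5.1, 0, 0.55, 0),
    (5, 150, 1, 0.9, 0, 0.92, 1), (6, 1100, 0, 7.8, 0, 0.30, 0),
    (7, 300, 1, 2.1, 1, 0.80, 1), (8, 640, 0, 4.4, 0, 0.50, 0),
    (9, 210, 1, 1.4, 0, 0.88, 1), (10, 880, 0, 6.0, 0, 0.45, 0),
    (11, 260, 1, 1.9, 1, 0.83, 1), (12, 700, 0, 5.5, 0, 0.48, 0),
    (13, 170, 1, 1.1, 0, 0.91, 1), (14, 950, 0, 7.0, 0, 0.35, 0),
    (15, 230, 1, 1.6, 1, 0.86, 1), (16, 820, 0, 5.9, 0, 0.52, 0),
    (17, 190, 1, 1.3, 0, 0.89, 1), (18, 990, 0, 7.2, 0, 0.33, 0),
    (19, 280, 1, 2.0, 1, 0.82, 1), (20, 680, 0, 4.9, 0, 0.49, 0),
]

conn = sqlite3.connect(":memory:")  # local stand-in only, not a SynapCores instance
conn.execute(
    """CREATE TABLE recipe_agent_feedback (
           interaction_id INTEGER PRIMARY KEY, reply_length INTEGER,
           used_context INTEGER, response_seconds DOUBLE,
           asked_clarify INTEGER, tone_score DOUBLE, satisfied INTEGER)"""
)
conn.executemany("INSERT INTO recipe_agent_feedback VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# Per-label row count, average reply length, and share of grounded replies.
summary = conn.execute(
    """SELECT satisfied, COUNT(*), AVG(reply_length), AVG(used_context)
       FROM recipe_agent_feedback
       GROUP BY satisfied
       ORDER BY satisfied"""
).fetchall()

for row in summary:
    print(row)
# (0, 10, 844.0, 0.0)
# (1, 10, 221.0, 1.0)
```

The two groups are balanced at 10 rows each, and in this sample `used_context` alone separates them, so the scores in later steps will largely reflect that split. Real feedback is likely to be noisier.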

Step 3: Train a model on what makes a reply succeed

AutoML learns the pattern between the agent's behavior and user satisfaction — feature engineering and model selection included.

CREATE EXPERIMENT agent_satisfaction_exp AS
SELECT reply_length, used_context, response_seconds, asked_clarify, tone_score,
       satisfied AS target
FROM recipe_agent_feedback
WITH (
  task_type = 'binary_classification',
  target_column = 'target',
  optimization_metric = 'auc',
  algorithms = ['logistic_regression', 'random_forest', 'gradient_boosting'],
  validation_strategy = 'stratified_kfold',
  n_folds = 3,
  max_trials = 15
);
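
The `stratified_kfold` setting with `n_folds = 3` asks for validation folds that each keep roughly the same thumbs-up/thumbs-down mix as the full table. The sketch below illustrates that general idea in plain Python. It is not SynapCores's internal implementation, and `stratified_folds` is a hypothetical helper; library implementations typically also shuffle rows, which is omitted here to keep the output deterministic.

```python
from collections import defaultdict


def stratified_folds(labels: list[int], n_folds: int) -> list[list[int]]:
    """Assign row indices to folds so each fold mirrors the overall label mix."""
    by_label: dict[int, list[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    folds: list[list[int]] = [[] for _ in range(n_folds)]
    for indices in by_label.values():
        # Deal each class round-robin so no fold is starved of either label.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Labels of the 20 sample rows in interaction_id order:
# odd ids are thumbs-up (1), even ids are thumbs-down (0).
labels = [1, 0] * 10
folds = stratified_folds(labels, n_folds=3)

for fold in folds:
    thumbs_up = sum(labels[i] for i in fold)
    print(f"size={len(fold)} thumbs_up={thumbs_up} thumbs_down={len(fold) - thumbs_up}")
# size=8 thumbs_up=4 thumbs_down=4
# size=6 thumbs_up=3 thumbs_down=3
# size=6 thumbs_up=3 thumbs_down=3
```

With only 20 rows, each fold validates on six to eight interactions, so the reported metric should be read as a rough signal until more feedback accumulates.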

Step 4: Deploy the trained model

Promote the best trial to a named model the agent can call.

DEPLOY MODEL agent_satisfaction FROM EXPERIMENT agent_satisfaction_exp;

Step 5: Score a candidate draft before sending it

Predict whether a proposed reply will satisfy the user, given its features — the agent's self-assessment.

SELECT AUTOML.PREDICT('agent_satisfaction', 210, 1, 1.5, 0, 0.88) AS predicted_satisfaction;
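
At runtime the agent has to produce this statement from a draft's features. A minimal sketch, assuming a Python agent: `build_predict_sql` is a hypothetical helper, and the only SynapCores syntax it relies on is the `AUTOML.PREDICT` call shown above, with features in the same order as the Step 3 `SELECT`.

```python
import math
import re


def build_predict_sql(model: str, features: list, alias: str = "predicted_satisfaction") -> str:
    """Render a Step 5 style scoring statement for one draft."""
    # Only plain identifiers and finite numbers are interpolated, so the
    # draft's text can never end up inside the SQL string.
    for name in (model, alias):
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"unsafe identifier: {name!r}")
    for value in features:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"feature must be a number, got {value!r}")
        if not math.isfinite(value):
            raise ValueError(f"feature must be finite, got {value!r}")
    args = ", ".join(repr(value) for value in features)
    return f"SELECT AUTOML.PREDICT('{model}', {args}) AS {alias};"


# reply_length, used_context, response_seconds, asked_clarify, tone_score
statement = build_predict_sql("agent_satisfaction", [210, 1, 1.5, 0, 0.88])
print(statement)
# SELECT AUTOML.PREDICT('agent_satisfaction', 210, 1, 1.5, 0, 0.88) AS predicted_satisfaction;
```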

Step 6: Compare two draft strategies and pick the better one

Score a concise, context-grounded draft against a long, ungrounded one and let the model choose.

SELECT 'concise+grounded' AS strategy, AUTOML.PREDICT('agent_satisfaction', 200, 1, 1.4, 0, 0.90) AS score
UNION ALL
SELECT 'long+ungrounded',  AUTOML.PREDICT('agent_satisfaction', 950, 0, 6.8, 0, 0.40) AS score
ORDER BY score DESC;
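
The same choice can be made in the agent's own code once each draft has a score. The sketch below assumes a Python agent and takes the scorer as a parameter; `pick_best` and `stub_score` are hypothetical names. The stub is a placeholder so the example runs on its own; it is not the trained model, and in practice it would be replaced by a call that runs `AUTOML.PREDICT` on your instance.

```python
from typing import Callable

# reply_length, used_context, response_seconds, asked_clarify, tone_score
Features = tuple[int, int, float, int, float]


def pick_best(candidates: dict[str, Features],
              score: Callable[[Features], float]) -> tuple[str, float]:
    """Return the (strategy, score) pair with the highest predicted satisfaction."""
    if not candidates:
        raise ValueError("need at least one candidate draft")
    scored = {name: score(features) for name, features in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


def stub_score(features: Features) -> float:
    # Placeholder only: rewards grounded drafts so the example is runnable.
    # Swap in a function that queries the deployed model.
    return 0.9 if features[1] == 1 else 0.1


drafts = {
    "concise+grounded": (200, 1, 1.4, 0, 0.90),
    "long+ungrounded": (950, 0, 6.8, 0, 0.40),
}
winner = pick_best(drafts, stub_score)
print(winner)
# ('concise+grounded', 0.9)
```

Passing the scorer in keeps the selection logic independent of how the score is fetched (REST, SDK, or MCP).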

Step 7: Score the whole backlog and find weak interactions to learn from

Apply the model across history to surface the interaction patterns the agent should avoid repeating.

SELECT interaction_id, satisfied,
       AUTOML.PREDICT('agent_satisfaction', reply_length, used_context, response_seconds, asked_clarify, tone_score) AS model_score
FROM recipe_agent_feedback
ORDER BY model_score ASC
LIMIT 5;
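
Low scores that match a thumbs-down confirm what the model already knows; the rows worth a closer look are those where the score and the label disagree. The sketch below assumes a Python agent and assumes `model_score` is a probability in 0..1 (the recipe describes it as a satisfaction probability, but the exact output format is not shown). `find_surprises` is a hypothetical helper, and the input rows are made-up values for illustration, not results from a trained model.

```python
def find_surprises(rows, threshold: float = 0.5):
    """Return rows whose thumbs-up/down label disagrees with the thresholded score.

    Each row is (interaction_id, satisfied, model_score). Results are ordered
    with the most confident disagreements first.
    """
    surprises = []
    for interaction_id, satisfied, model_score in rows:
        predicted = 1 if model_score >= threshold else 0
        if predicted != satisfied:
            surprises.append((interaction_id, satisfied, model_score))
    return sorted(surprises, key=lambda row: abs(row[2] - threshold), reverse=True)


# Hypothetical scored rows, invented for this example.
scored_rows = [
    (6, 0, 0.04),   # low score, thumbs-down: model and user agree
    (14, 0, 0.07),  # agree
    (30, 1, 0.22),  # low score but the user was satisfied
    (31, 0, 0.81),  # high score but the user was not satisfied
    (32, 1, 0.45),  # borderline disagreement
]
surprises = find_surprises(scored_rows)
print(surprises)
# [(31, 0, 0.81), (30, 1, 0.22), (32, 1, 0.45)]
```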

Cleanup (Optional)

DROP TABLE IF EXISTS recipe_agent_feedback;

Expected Outcomes

  • Steps 3–4 train and deploy a model that learns that concise, context-grounded, prompt replies earn a thumbs-up.
  • Step 5 scores a concise, grounded draft with a high satisfaction probability.
  • Step 6 ranks the concise+grounded strategy above the long+ungrounded one — the agent picks the better reply before sending.
  • Step 7 surfaces the lowest-scoring historical interactions: exactly the behaviors the agent should stop repeating.

You now have a self-improving agent: it learns from feedback and scores its own drafts so it sends the reply most likely to satisfy.

Use it from your agent (framework-agnostic — this is the whole point)

Self-improvement is just log feedback → train → score, so any agent uses it with no framework lock-in:

  • REST / SDK: POST /v1/query/execute (any language), or client.executeQuery(...) from @synapcores/sdk. Your agent appends each interaction's features + outcome (Step 2), retrains on a schedule (Steps 3–4), and scores candidate drafts in real time (Step 5).
  • MCP (native, on by default): point any MCP client (Claude Code, Cursor, a custom loop, a voice runtime) at ws://<your-instance>/mcp?token=<jwt> (JWT from one POST /v1/auth/login → access_token). The execute tool logs feedback and trains; the query tool scores a draft, so the learning loop runs as tool calls.
  • Any framework: OpenClaw, LangChain / DSPy optimizers, a custom loop, or a voice agent all log to and score against the same model. The database is the brain; the framework is swappable.
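
The log, retrain, score loop described in these bullets can be sketched independently of the transport. The example below assumes a Python agent and takes an `execute` callable that sends one SQL string to the instance (for example, a wrapper around POST /v1/query/execute). The request and response shapes of that endpoint are not shown in this recipe, so they are deliberately left to the caller. `FeedbackLoop`, `sql_number`, and `retrain_every` are hypothetical names.

```python
import math
from typing import Callable

COLUMNS = ("interaction_id, reply_length, used_context, response_seconds, "
           "asked_clarify, tone_score, satisfied")


def sql_number(value) -> str:
    """Render an int or finite float as a SQL literal; refuse anything else."""
    if isinstance(value, bool) or not isinstance(value, (int, float)) or not math.isfinite(value):
        raise TypeError(f"expected a finite number, got {value!r}")
    return repr(value)


class FeedbackLoop:
    """Log feedback, signal when a retrain is due, and score drafts."""

    def __init__(self, execute: Callable[[str], object], retrain_every: int = 50) -> None:
        self.execute = execute            # sends one SQL statement to the instance
        self.retrain_every = retrain_every
        self.pending = 0                  # rows logged since the last retrain

    def log(self, row: tuple) -> bool:
        """Append one interaction (Step 2). Returns True when a retrain is due."""
        if len(row) != 7:
            raise ValueError("row must have 7 values, in column order")
        values = ", ".join(sql_number(v) for v in row)
        self.execute(f"INSERT INTO recipe_agent_feedback ({COLUMNS}) VALUES ({values});")
        self.pending += 1
        return self.pending >= self.retrain_every

    def mark_retrained(self) -> None:
        """Call after re-running Steps 3-4 so the counter starts over."""
        self.pending = 0

    def score(self, features: tuple) -> object:
        """Score one draft (Step 5); returns whatever `execute` returns."""
        args = ", ".join(sql_number(v) for v in features)
        return self.execute(
            f"SELECT AUTOML.PREDICT('agent_satisfaction', {args}) AS predicted_satisfaction;"
        )


# Demo with a fake transport that only records the statements it is given.
sent: list[str] = []
loop = FeedbackLoop(execute=sent.append, retrain_every=2)
due_first = loop.log((21, 200, 1, 1.4, 0, 0.9, 1))
due_second = loop.log((22, 950, 0, 6.8, 0, 0.4, 0))
loop.score((210, 1, 1.5, 0, 0.88))
print(due_first, due_second)
# False True
print(sent[-1])
# SELECT AUTOML.PREDICT('agent_satisfaction', 210, 1, 1.5, 0, 0.88) AS predicted_satisfaction;
```

The class only reports that a retrain is due rather than issuing the training statements itself, because whether an experiment name can be reused is not covered here; the caller re-runs Steps 3–4 and then calls `mark_retrained()`.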

Key Concepts Learned

  • Feedback is training data — log every outcome with the features the agent controlled.
  • CREATE EXPERIMENT → DEPLOY MODEL → AUTOML.PREDICT trains and serves the model entirely in SQL.
  • Scoring a draft before sending it turns feedback into a forward-looking quality gate.
  • Because it's plain data ops (AutoML + SQL / REST / MCP), self-improvement works for any agent — the agent-agnostic backend pattern this cluster builds on.

Tags

ai-agent, self-improving, feedback, automl, machine-learning, vector, mcp
