Objective
An agent that never learns repeats its mistakes. Every thumbs-up/down, resolved/escalated, or accepted/rejected signal is training data, provided you keep it. In this recipe you'll close the loop: log interactions with their outcomes, train an AutoML model on what made past replies succeed, then score a new draft reply's likelihood of satisfying the user before sending it. As feedback accumulates and the model is retrained, the agent's choice of drafts can improve without a separate ML pipeline or model server. The same pattern works from any framework or from a voice agent; see Use it from your agent at the end.
Step 1: Create the interaction-feedback store
Each row is a past interaction with the features the agent controlled and the outcome it earned.
CREATE TABLE IF NOT EXISTS recipe_agent_feedback (
    interaction_id   INTEGER PRIMARY KEY,
    reply_length     INTEGER,  -- characters in the agent's reply
    used_context     INTEGER,  -- 1 if the reply cited retrieved context, else 0
    response_seconds DOUBLE,   -- how long the agent took
    asked_clarify    INTEGER,  -- 1 if it asked a clarifying question first
    tone_score       DOUBLE,   -- 0..1 warmth/politeness heuristic
    satisfied        INTEGER   -- label: 1 = thumbs-up, 0 = thumbs-down
);
Step 2: Log historical interactions with outcomes
The sample data follows a plausible pattern: concise, context-grounded, prompt replies tend to satisfy; long, ungrounded, slow ones don't.
INSERT INTO recipe_agent_feedback
(interaction_id, reply_length, used_context, response_seconds, asked_clarify, tone_score, satisfied) VALUES
(1, 180,1,1.2,0,0.90,1),(2, 920,0,6.5,0,0.40,0),(3, 240,1,1.8,1,0.85,1),
(4, 760,0,5.1,0,0.55,0),(5, 150,1,0.9,0,0.92,1),(6,1100,0,7.8,0,0.30,0),
(7, 300,1,2.1,1,0.80,1),(8, 640,0,4.4,0,0.50,0),(9, 210,1,1.4,0,0.88,1),
(10,880,0,6.0,0,0.45,0),(11,260,1,1.9,1,0.83,1),(12,700,0,5.5,0,0.48,0),
(13,170,1,1.1,0,0.91,1),(14,950,0,7.0,0,0.35,0),(15,230,1,1.6,1,0.86,1),
(16,820,0,5.9,0,0.52,0),(17,190,1,1.3,0,0.89,1),(18,990,0,7.2,0,0.33,0),
(19,280,1,2.0,1,0.82,1),(20,680,0,4.9,0,0.49,0);
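In a live agent, each row would come from a finished interaction rather than a hand-written VALUES list. The Python sketch below shows one way to derive the Step 1 columns from an interaction and pair them with a parameterized INSERT. The `Interaction` class, the `to_feedback_row` helper, and the `?` placeholder style are assumptions made for illustration, not part of the product's API; `tone_score` is taken as given because the recipe does not define how that heuristic is computed.

```python
from dataclasses import dataclass

# Column order matches the INSERT in Step 2.
COLUMNS = (
    "interaction_id", "reply_length", "used_context", "response_seconds",
    "asked_clarify", "tone_score", "satisfied",
)

# Assumption: the client library accepts '?' placeholders; adjust to your driver.
INSERT_SQL = (
    "INSERT INTO recipe_agent_feedback (" + ", ".join(COLUMNS) + ") "
    "VALUES (" + ", ".join("?" for _ in COLUMNS) + ")"
)


@dataclass
class Interaction:
    """Hypothetical record of one finished agent turn."""
    interaction_id: int
    reply_text: str
    used_context: bool
    response_seconds: float
    asked_clarify: bool
    tone_score: float  # 0..1, computed elsewhere
    thumbs_up: bool


def to_feedback_row(ix: Interaction) -> tuple:
    """Map an interaction onto the recipe_agent_feedback columns."""
    if not 0.0 <= ix.tone_score <= 1.0:
        raise ValueError("tone_score must be in [0, 1]")
    return (
        ix.interaction_id,
        len(ix.reply_text),        # reply_length: characters in the reply
        int(ix.used_context),      # booleans are stored as 0/1
        float(ix.response_seconds),
        int(ix.asked_clarify),
        float(ix.tone_score),
        int(ix.thumbs_up),         # label: 1 = thumbs-up, 0 = thumbs-down
    )


row = to_feedback_row(Interaction(21, "x" * 180, True, 1.2, False, 0.9, True))
print(row)  # (21, 180, 1, 1.2, 0, 0.9, 1)
```

Keeping the mapping in one function means the features logged at feedback time and the features passed to the model at scoring time are computed the same way.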
Step 3: Train a model on what makes a reply succeed
AutoML learns the pattern between the agent's behavior and user satisfaction — feature engineering and model selection included.
CREATE EXPERIMENT agent_satisfaction_exp AS
SELECT reply_length, used_context, response_seconds, asked_clarify, tone_score,
       satisfied AS target
FROM recipe_agent_feedback
WITH (
    task_type = 'binary_classification',
    target_column = 'target',
    optimization_metric = 'auc',
    algorithms = ['logistic_regression', 'random_forest', 'gradient_boosting'],
    validation_strategy = 'stratified_kfold',
    n_folds = 3,
    max_trials = 15
);
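To make the `validation_strategy = 'stratified_kfold'` and `n_folds = 3` settings concrete, here is a generic Python illustration of stratified fold assignment on the 20 labels from Step 2. It is a sketch of the general technique only; the recipe does not say how the engine splits the data internally, and `stratified_folds` is a hypothetical helper.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds):
    """Assign each row index to a fold, dealing each class out round-robin."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


# Labels from Step 2: odd interaction_ids are satisfied (1), even ones are not (0).
labels = [1, 0] * 10
folds = stratified_folds(labels, n_folds=3)

for k, fold in enumerate(folds):
    positives = sum(labels[i] for i in fold)
    print(f"fold {k}: {len(fold)} rows, {positives} satisfied")
# fold 0: 8 rows, 4 satisfied
# fold 1: 6 rows, 3 satisfied
# fold 2: 6 rows, 3 satisfied
```

Every fold keeps the 50/50 class balance of the full table, which matters with so few rows: an unstratified split could leave a fold with only one class, and AUC cannot be computed on it.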
Step 4: Deploy the trained model
Promote the best trial to a named model the agent can call.
DEPLOY MODEL agent_satisfaction FROM EXPERIMENT agent_satisfaction_exp;
Step 5: Score a candidate draft before sending it
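Predict whether a proposed reply will satisfy the user, given its features. This is the agent's self-assessment. The arguments after the model name follow the feature order from Step 3: reply_length, used_context, response_seconds, asked_clarify, tone_score.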
SELECT AUTOML.PREDICT('agent_satisfaction', 210, 1, 1.5, 0, 0.88) AS predicted_satisfaction;
Step 6: Compare two draft strategies and pick the better one
Score a concise, context-grounded draft against a long, ungrounded one and let the model choose.
SELECT 'concise+grounded' AS strategy, AUTOML.PREDICT('agent_satisfaction', 200, 1, 1.4, 0, 0.90) AS score
UNION ALL
SELECT 'long+ungrounded', AUTOML.PREDICT('agent_satisfaction', 950, 0, 6.8, 0, 0.40) AS score
ORDER BY score DESC;
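On the agent side, the query result can drive a simple quality gate. The Python sketch below picks the highest-scoring strategy and declines to send anything if no draft clears a threshold. It assumes the score is a satisfaction probability between 0 and 1; the `pick_draft` helper, the 0.5 threshold, and the example scores are hypothetical.

```python
from typing import Optional


def pick_draft(scored: dict[str, float], min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring strategy, or None if nothing clears the gate."""
    if not scored:
        return None
    best = max(scored, key=scored.get)
    return best if scored[best] >= min_score else None


# Hypothetical scores standing in for the Step 6 query result.
scores = {"concise+grounded": 0.93, "long+ungrounded": 0.08}
choice = pick_draft(scores)
print(choice or "no draft cleared the gate; revise or escalate")
```

Returning None instead of the least-bad draft keeps the decision explicit: the caller can regenerate, ask a clarifying question, or hand off to a human.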
Step 7: Score the whole backlog and find weak interactions to learn from
Apply the model across history to surface the interaction patterns the agent should avoid repeating.
SELECT interaction_id, satisfied,
       AUTOML.PREDICT('agent_satisfaction', reply_length, used_context, response_seconds, asked_clarify, tone_score) AS model_score
FROM recipe_agent_feedback
ORDER BY model_score ASC
LIMIT 5;
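Once the history is scored, it helps to separate interactions the model correctly expects to fail from those where the model and the user disagree. The Python sketch below does that split. The `triage` helper, the 0.5 threshold, and the example scores are hypothetical; real rows would come from the query above, run without the LIMIT.

```python
def triage(rows, threshold=0.5):
    """Split scored history into expected failures and model/label disagreements."""
    expected_failures, surprises = [], []
    for interaction_id, satisfied, model_score in rows:
        predicted = 1 if model_score >= threshold else 0
        if predicted != satisfied:
            surprises.append(interaction_id)          # model and user disagree
        elif satisfied == 0:
            expected_failures.append(interaction_id)  # known-bad pattern
    return expected_failures, surprises


# Hypothetical (interaction_id, satisfied, model_score) rows.
scored_history = [
    (6, 0, 0.04),
    (18, 0, 0.06),
    (14, 0, 0.07),
    (8, 0, 0.61),  # thumbs-down, yet the model expected success
    (3, 1, 0.95),
]
failures, surprises = triage(scored_history)
print("avoid repeating:", failures)    # avoid repeating: [6, 18, 14]
print("inspect manually:", surprises)  # inspect manually: [8]
```

Expected failures describe behavior to avoid; disagreements are worth a manual look, since they may point to a feature the table does not capture yet.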
Cleanup (Optional)
DROP TABLE IF EXISTS recipe_agent_feedback;
Expected Outcomes
- Steps 3–4 train and deploy a model that learns that concise, context-grounded, prompt replies earn a thumbs-up.
- Step 5 scores a concise, grounded draft with a high satisfaction probability.
- Step 6 ranks the concise+grounded strategy above the long+ungrounded one — the agent picks the better reply before sending.
- Step 7 surfaces the lowest-scoring historical interactions: exactly the behaviors the agent should stop repeating.
You now have a self-improving agent: it learns from feedback and scores its own drafts so it sends the reply most likely to satisfy.
Use it from your agent (framework-agnostic — this is the whole point)
Self-improvement is just log feedback → train → score, so any agent uses it with no framework lock-in:
- REST / SDK: POST /v1/query/execute (any language), or @synapcores/sdk client.executeQuery(...). Your agent appends each interaction's features + outcome (Step 2), retrains on a schedule (Steps 3–4), and scores candidate drafts in real time (Step 5).
- MCP (native, on by default): point any MCP client (Claude Code, Cursor, a custom loop, a voice runtime) at ws://<your-instance>/mcp?token=<jwt> (JWT from one POST /v1/auth/login → access_token). The execute tool logs feedback and trains; the query tool scores a draft, so the learning loop runs as tool calls.
- Any framework: OpenClaw, LangChain / DSPy optimizers, a custom loop, or a voice agent all log to and score against the same model. The database is the brain; the framework is swappable.
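Whichever transport you choose, the agent-side loop reduces to two calls: log feedback and score a draft. The Python sketch below keeps the transport behind an injected function so the same code can sit on top of REST, the SDK, or the MCP tools. The `Executor` type, the helper names, and the assumption that the executor returns the first column of the first row are all hypothetical; request and response shapes for the real endpoints are not shown in this recipe, so none are assumed here.

```python
from typing import Any, Callable, Sequence

# Stand-in for whichever transport you use (REST, SDK, or the MCP tools).
# Assumption: it takes a SQL string and returns the first column of the first row.
Executor = Callable[[str], Any]

FEATURES = ("reply_length", "used_context", "response_seconds",
            "asked_clarify", "tone_score")


def _sql_number(value: float) -> str:
    """Render a numeric literal; reject anything non-numeric to avoid injection."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"expected a number, got {value!r}")
    return repr(value)


def _check(features: Sequence[float]) -> None:
    if len(features) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} features, got {len(features)}")


def log_feedback(run: Executor, interaction_id: int,
                 features: Sequence[float], satisfied: int) -> None:
    """Step 2: append one interaction and its outcome."""
    _check(features)
    values = [interaction_id, *features, satisfied]
    run(
        "INSERT INTO recipe_agent_feedback (interaction_id, "
        + ", ".join(FEATURES) + ", satisfied) VALUES ("
        + ", ".join(_sql_number(v) for v in values) + ")"
    )


def score_draft(run: Executor, features: Sequence[float]) -> float:
    """Step 5: score a candidate draft with the deployed model."""
    _check(features)
    args = ", ".join(_sql_number(v) for v in features)
    return float(run(f"SELECT AUTOML.PREDICT('agent_satisfaction', {args})"))


# Fake executor for a dry run: records the SQL and returns a made-up score.
sent = []


def fake_run(sql: str) -> float:
    sent.append(sql)
    return 0.9


log_feedback(fake_run, 21, (210, 1, 1.5, 0, 0.88), 1)
score = score_draft(fake_run, (210, 1, 1.5, 0, 0.88))
print(sent[1])
```

Swapping `fake_run` for a function that calls your instance is the only change needed to move from a dry run to the live loop; retraining (Steps 3–4) can be scheduled separately.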
Key Concepts Learned
- Feedback is training data — log every outcome with the features the agent controlled.
- CREATE EXPERIMENT → DEPLOY MODEL → AUTOML.PREDICT trains and serves the model entirely in SQL.
- Scoring a draft before sending it turns feedback into a forward-looking quality gate.
- Because it's plain data ops (AutoML + SQL / REST / MCP), self-improvement works for any agent — the agent-agnostic backend pattern this cluster builds on.