How to Swap the GGUF Model in SynapCores (Native + Docker)

Published on May 31, 2026


Tested against SynapCores CE v1.7.0.1-ce, the release currently shipped on Docker Hub as synapcores/community:v1.7.0.1-ce. Every command in this article was executed verbatim against that build.

SynapCores CE ships with a small chat model baked in (Llama-3.2-1B-Instruct, ~700 MB) so the database has a working in-DB LLM the moment you start it — no API keys, no Ollama dependency, no setup. But "small chat model" isn't always what you want. Maybe you need a code-specialized model for SQL generation, a reasoning model for agent loops, or a larger general model for richer GENERATE() output.

This guide shows you the actual swap procedure for both deployment shapes — bare-metal install and Docker container — and ends with a one-line SQL verification that proves the new model is the one the engine is calling.


How model resolution works in SynapCores CE

Three pieces drive which GGUF file the engine loads.

        gateway.toml                       filesystem
   ┌──────────────────────┐           ┌────────────────────────┐
   │ [query.ai_service]   │           │ <models_dir>/          │
   │ provider = "native"  │  resolve  │   *.gguf               │
   │ model    = "NAME"  ─────────────▶│   NAME.gguf            │
   └──────────────────────┘           │   NAME.Q4_K_M.gguf     │
                                      │   NAME-q4_k_m.gguf     │
            env vars                  │   NAME.Q5_K_M.gguf     │
   ┌──────────────────────┐           │   NAME.Q8_0.gguf       │
   │ AIDB_MODELS_DIR      │  picks    └────────────────────────┘
   │   = /path/to/models  │  ─────────┘
   └──────────────────────┘
  1. [query.ai_service].provider = "native" tells the gateway to use the embedded llama.cpp runtime instead of Ollama / OpenAI / Anthropic.
  2. [query.ai_service].model = "NAME" is the filename of the GGUF without the .gguf extension. The runtime tries the bare name first, then common quantization suffixes (.Q4_K_M, -q4_k_m, .Q5_K_M, .Q8_0, etc.).
  3. AIDB_MODELS_DIR is the directory that gets searched. If unset, the binary falls back to ./models/text relative to the working directory.
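The lookup in point 2 can be mimicked in a few lines of shell, which is handy for checking which file a given name would hit before you restart. This is a minimal sketch, not the engine's code: resolve_model is a hypothetical helper, and its suffix order is the one this article lists under Troubleshooting, which may not be the resolver's complete list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mimics the lookup order described in this article.
# It is NOT SynapCores source code. Usage: resolve_model NAME
resolve_model() {
  local name="$1"
  # AIDB_MODELS_DIR wins; otherwise fall back to ./models/text (relative to CWD).
  local dir="${AIDB_MODELS_DIR:-./models/text}"
  local suffix
  # Bare name first, then the quantization suffixes, in the documented order.
  for suffix in .gguf .Q4_K_M.gguf -q4_k_m.gguf .Q4_0.gguf .Q5_K_M.gguf .Q8_0.gguf; do
    if [ -f "$dir/$name$suffix" ]; then
      printf '%s\n' "$dir/$name$suffix"
      return 0
    fi
  done
  echo "no GGUF for '$name' in $dir" >&2
  return 1
}
```

For example, AIDB_MODELS_DIR=/opt/synapcores/models/text resolve_model Phi-3-mini-4k-instruct-q4 should print the path of the Phi-3 file used later in this article, once it is downloaded.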

Supported architectures (anything llama.cpp upstream supports):

llama / llama2 / llama3.x      mistral / mixtral      gemma / gemma2 / gemma3
phi2 / phi3 / phi3.5           qwen / qwen2 / qwen2.5 / qwen3
deepseek (v2 / v3)             yi      falcon         granite (incl. 3.x / MoE)
starcoder / starcoder2         codellama              tinyllama       stablelm

So a swap is: drop a new GGUF in the models directory, change the model = value, restart the gateway. That is the whole procedure; the rest of this article is the exact commands for both deployment shapes plus the verification SQL.


Where to get GGUF files

From huggingface.co: search for "<model> GGUF". Two reliable mirror accounts:

  • bartowski — fresh quantizations of new releases, usually within 24 hours. Multiple quant levels per model.
  • TheBloke — large historical archive; older but exhaustive coverage of pre-2025 models.

Pick a quantization that fits your RAM budget:

Quant level   Quality           File size (7B)   Recommended RAM
Q4_K_M        Best balance      ~4.5 GB          8 GB
Q5_K_M        Slightly better   ~5.5 GB          10 GB
Q8_0          Near-FP16         ~7.5 GB          12 GB
FP16          Full quality      ~14 GB           24 GB

For a 1B model, divide those figures by ~6; for a 3B model, by ~2.5. Q4_K_M is the right default: under 1% accuracy loss versus FP16 in most benchmarks, at roughly a third of the FP16 file size (~4.5 GB vs ~14 GB for a 7B).
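The scaling rule above is plain arithmetic. A minimal sketch, assuming you only want the rough file size: approx_size_gb is a hypothetical helper that hard-codes the 7B sizes from the table and the ~6 and ~2.5 divisors from the text, so its output is an estimate, not an exact download size.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough GGUF file size from the 7B table and the divisors above.
# Usage: approx_size_gb QUANT PARAMS_B   (PARAMS_B is 1, 3 or 7)
approx_size_gb() {
  local quant="$1" params="$2" base divisor
  case "$quant" in    # 7B file sizes from the table, in GB
    Q4_K_M) base=4.5 ;;
    Q5_K_M) base=5.5 ;;
    Q8_0)   base=7.5 ;;
    FP16)   base=14 ;;
    *) echo "unknown quant: $quant" >&2; return 1 ;;
  esac
  case "$params" in   # divisors from the text: 1B about /6, 3B about /2.5
    7) divisor=1 ;;
    3) divisor=2.5 ;;
    1) divisor=6 ;;
    *) echo "only 1, 3 or 7 (billion params) supported" >&2; return 1 ;;
  esac
  # Shell arithmetic is integer-only, so awk does the division.
  awk -v b="$base" -v d="$divisor" 'BEGIN { printf "%.2f\n", b / d }'
}
```

approx_size_gb Q4_K_M 1 prints 0.75, in the same range as the ~700 MB default model.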


Swap on a native install

Native install puts everything under a single prefix:

/opt/synapcores/              ← system install  (sudo install)
~/.synapcores/                ← user install    (no sudo)
   ├── synapcores             ← the binary
   ├── models/
   │   └── text/              ← *.gguf live here
   ├── etc/gateway.toml       ← config
   └── aidb_data/             ← RocksDB data

Step 1 — Download the GGUF

Example: swapping the default chat model for Phi-3-mini-4k-instruct (better at structured output than Llama-3.2-1B, still small enough for a 16 GB laptop).

# Pick a quantization level; Q4_K_M for the right size/quality balance
curl -L -o /opt/synapcores/models/text/Phi-3-mini-4k-instruct-q4.gguf \
  "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf"

# Sanity check
ls -lh /opt/synapcores/models/text/Phi-3-mini-4k-instruct-q4.gguf
# → -rw-r--r-- ... ~2.3G

Writing under /opt/synapcores normally needs root, so run the curl with sudo for a system install. For a user install (no sudo), drop the file in ~/.synapcores/models/text/ instead.
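ls only tells you the size. A cheap extra check catches a saved HTML error page or the wrong file entirely: every GGUF file starts with the four ASCII bytes GGUF (the format's magic number). A minimal sketch; is_gguf is a hypothetical helper, and it cannot detect a download that was cut off partway, so still compare the size (or checksum) with the Hugging Face page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if FILE begins with the GGUF magic bytes "GGUF".
is_gguf() {
  local file="$1"
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
  # Read only the 4-byte magic number at the start of the file.
  if [ "$(head -c 4 "$file")" = "GGUF" ]; then
    return 0
  fi
  echo "not a GGUF file (bad magic): $file" >&2
  return 1
}
```

For example: is_gguf /opt/synapcores/models/text/Phi-3-mini-4k-instruct-q4.gguf && echo OK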

Step 2 — Update gateway.toml

Find your active config (usually /etc/synapcores/gateway.toml for system installs, ~/.synapcores/etc/gateway.toml for user installs):

sudo $EDITOR /etc/synapcores/gateway.toml

Change the model line in [query.ai_service]:

[query.ai_service]
provider        = "native"
# BEFORE: model = "llama-3.2-1b-instruct-q4_k_m"
model           = "Phi-3-mini-4k-instruct-q4"
embedding_model = "minilm"

The filename stem matches: Phi-3-mini-4k-instruct-q4.gguf on disk → model = "Phi-3-mini-4k-instruct-q4" in config. The runtime resolves <stem>.gguf first, so this is unambiguous.
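A mismatch here is the usual cause of the "Default model not found" error. A minimal pre-restart check, assuming the key is written on one line as model = "NAME" inside [query.ai_service], as in the snippet above. configured_model and check_stem are hypothetical helpers, and the check only tests the exact <stem>.gguf match, not the quantization-suffix fallbacks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the model value from the [query.ai_service] section.
configured_model() {
  awk '
    # Any table header: remember whether it is the section we want.
    /^[ \t]*\[/ { in_sec = ($0 ~ /^[ \t]*\[query\.ai_service\][ \t]*$/); next }
    # Exact key "model" (skips embedding_model and commented-out lines).
    in_sec && /^[ \t]*model[ \t]*=/ {
      sub(/^[^=]*=[ \t]*"/, "")   # drop the key, the equals sign and the opening quote
      sub(/".*$/, "")             # drop the closing quote and any trailing comment
      print
      exit
    }
  ' "$1"
}

# Hypothetical helper: succeed only if CONFIG's model name equals GGUF's filename stem.
check_stem() {
  local config="$1" gguf="$2" want got
  want="$(basename "$gguf" .gguf)"
  got="$(configured_model "$config")"
  if [ "$got" = "$want" ]; then
    echo "ok: model = \"$got\""
  else
    echo "mismatch: config has \"$got\", file stem is \"$want\"" >&2
    return 1
  fi
}
```

For example: check_stem /etc/synapcores/gateway.toml /opt/synapcores/models/text/Phi-3-mini-4k-instruct-q4.gguf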

Step 3 — Restart the gateway

If you installed via install-ce.sh, there's a systemd unit:

sudo systemctl restart synapcores
sudo systemctl status synapcores

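If you run the binary by hand instead of under systemd, stop the process and relaunch it in the background. If the relaunch fails because the port or the RocksDB data directory is still in use, the old process has not finished exiting yet; wait a moment and run the launch command again: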

pkill synapcores
/opt/synapcores/synapcores --config /etc/synapcores/gateway.toml --accept-license &
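A larger GGUF can take noticeably longer to load, so a login fired immediately after the restart may hit a gateway that is not listening yet. A minimal sketch that polls until anything answers HTTP on port 8080. wait_for_gateway is a hypothetical helper; it assumes no particular health endpoint (none is documented here), and the gateway may start answering before the model has finished loading, so a slow first GENERATE() is still possible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until anything answers HTTP at URL, for up to TIMEOUT seconds.
# Usage: wait_for_gateway [URL] [TIMEOUT]
wait_for_gateway() {
  local url="${1:-http://127.0.0.1:8080/}" timeout="${2:-120}" waited=0 code
  command -v curl >/dev/null 2>&1 || { echo "curl not found" >&2; return 1; }
  while :; do
    # Any status code (even 401 or 404) means the listener is up;
    # curl reports 000 when it could not connect at all.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)"
    if [ -n "$code" ] && [ "$code" != "000" ]; then
      echo "gateway answering after ${waited}s (HTTP $code)"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "nothing answering at $url after ${timeout}s" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
}
```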

Step 4 — Verify the new model is loaded

The cleanest verification is to fire a SELECT GENERATE() and look at the response style. Phi-3 is noticeably chattier than Llama-3.2-1B, so the prose itself proves the swap.

# Login (replace with your actual admin password from the boot log)
TOKEN=$(curl -fsS -X POST http://127.0.0.1:8080/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username":"admin","password":"YOUR_ADMIN_PASSWORD"}' \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['access_token'])")

# Drive the embedded LLM via SQL
curl -sS -X POST http://127.0.0.1:8080/v1/query/execute \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"sql":"SELECT GENERATE('"'"'Say one word: HELLO'"'"') AS reply","database":"default"}'

Expected response shape (Phi-3 leaning chatty):

{
  "data": {
    "columns": [{"name":"reply","data_type":"TEXT","nullable":true}],
    "rows": [["\n\nResponse: Hello! How can I assist you today? ..."]],
    "rows_affected": 1,
    "execution_time_ms": 9208
  }
}
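To print just the generated text instead of the whole envelope, the response can go through the same python3 parsing the login step uses. A minimal sketch, assuming the data.rows[0][0] layout shown above; extract_reply is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a /v1/query/execute JSON response on stdin and
# print the first column of the first row (the GENERATE() reply).
extract_reply() {
  python3 -c '
import json, sys
resp = json.load(sys.stdin)
rows = resp["data"]["rows"]
if not rows:
    sys.exit("no rows in response")
print(rows[0][0])
'
}
```

Append | extract_reply to the query curl command to see only the model's answer.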

If you see a "Default model 'X' not found in configuration" error, the model name in the config doesn't match the GGUF file stem. Re-check Step 2.


Swap on the Docker image

The Docker image ships a working Llama-3.2-1B-Instruct at /opt/synapcores/models/text/llama-3.2-1b-instruct-q4_k_m.gguf, baked into the image layer. You don't rebuild the image: you bind-mount your replacement GGUF next to it and a custom config over the default /etc/synapcores/gateway.toml.

host:                                  container:
  ./Phi-3-mini-4k-instruct-q4.gguf ──▶ /opt/synapcores/models/text/Phi-3...
  ./gateway-phi3.toml ───────────────▶ /etc/synapcores/gateway.toml

Step 1 — Get the default config out of the image

mkdir -p ./gguf-swap && cd ./gguf-swap

docker run --rm --entrypoint cat \
  synapcores/community:v1.7.0.1-ce \
  /etc/synapcores/gateway.toml > gateway-default.toml

Step 2 — Stage your model + swapped config

# Pull the GGUF you want to use
curl -L -o ./Phi-3-mini-4k-instruct-q4.gguf \
  "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf"

# Make a copy of the default config with the model name swapped
sed 's|model[[:space:]]*=[[:space:]]*"llama-3.2-1b-instruct-q4_k_m"|model           = "Phi-3-mini-4k-instruct-q4"|' \
  gateway-default.toml > gateway-phi3.toml

# Confirm the change
grep -A 4 '\[query.ai_service\]' gateway-phi3.toml

You should see:

[query.ai_service]
provider        = "native"
model           = "Phi-3-mini-4k-instruct-q4"
embedding_model = "minilm"
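sed exits 0 even when its pattern matches nothing, so if the default model name in your image differs from llama-3.2-1b-instruct-q4_k_m (a different tag, say), gateway-phi3.toml silently comes out identical to the original. A minimal guard using cmp; assert_changed is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail loudly if the sed substitution changed nothing.
# Usage: assert_changed ORIGINAL EDITED
assert_changed() {
  # cmp -s prints nothing and exits 0 when the two files are byte-identical.
  if cmp -s "$1" "$2"; then
    echo "no change: the sed pattern did not match any line in $1" >&2
    return 1
  fi
  echo "ok: $2 differs from $1"
}
```

For example: assert_changed gateway-default.toml gateway-phi3.toml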

Step 3 — Run the container with both files mounted

docker run -d --name synapcores-phi3 \
  -p 8080:8080 \
  -e AIDB_ACCEPT_LICENSE=true \
  -v "$PWD/Phi-3-mini-4k-instruct-q4.gguf":/opt/synapcores/models/text/Phi-3-mini-4k-instruct-q4.gguf:ro \
  -v "$PWD/gateway-phi3.toml":/etc/synapcores/gateway.toml:ro \
  synapcores/community:v1.7.0.1-ce

Two -v flags: one for the GGUF, one for the config. Both read-only — the container never needs to write to either.
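One trap with -v: if the host path does not exist (a typo, or running docker run from a different directory), Docker creates an empty directory at that path and mounts it, and mounting a directory over a file path in the image then fails. A minimal preflight to run from ./gguf-swap before the docker run above; preflight_mounts is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that every bind-mount source is an existing, non-empty regular file.
preflight_mounts() {
  local f status=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing or not a regular file: $f" >&2
      status=1
    elif [ ! -s "$f" ]; then
      echo "empty file: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: preflight_mounts "$PWD/Phi-3-mini-4k-instruct-q4.gguf" "$PWD/gateway-phi3.toml" && echo "safe to docker run"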

Step 4 — Grab the admin password + verify

# First-boot admin password is in the docker logs
PASS=$(docker logs synapcores-phi3 2>&1 \
  | grep -oE 'password: [A-Za-z0-9-]+' | head -1 | awk '{print $2}')
echo "admin password: $PASS"

# Login
TOKEN=$(curl -fsS -X POST http://127.0.0.1:8080/v1/auth/login \
  -H "Content-Type: application/json" \
  -d "{\"username\":\"admin\",\"password\":\"$PASS\"}" \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['access_token'])")

# Drive the swapped model via SQL
curl -sS --max-time 60 -X POST http://127.0.0.1:8080/v1/query/execute \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"sql":"SELECT GENERATE('"'"'Say one word: HELLO'"'"') AS reply","database":"default"}'
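Right after docker run -d, the password line may not be in the logs yet, so PASS can come back empty and the login fails. A minimal sketch that retries the same extraction, assuming the log line keeps the "password: ..." form the grep above expects; wait_for_admin_password is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll `docker logs` until the first-boot password line appears.
# Usage: PASS=$(wait_for_admin_password CONTAINER [TIMEOUT_SECONDS])
wait_for_admin_password() {
  local name="$1" timeout="${2:-60}" waited=0 pass
  while [ "$waited" -lt "$timeout" ]; do
    # Same extraction as above: first "password: XXXX" match, second field.
    pass="$(docker logs "$name" 2>&1 \
      | grep -oE 'password: [A-Za-z0-9-]+' | head -1 | awk '{print $2}')"
    if [ -n "$pass" ]; then
      printf '%s\n' "$pass"
      return 0
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "no admin password in logs of $name after ${timeout}s" >&2
  return 1
}
```

For example, PASS=$(wait_for_admin_password synapcores-phi3 120), then rerun the login.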

The actual response from the validated swap on v1.7.0.1-ce:

{
  "data": {
    "columns": [{"name":"reply","data_type":"TEXT","nullable":true}],
    "rows": [["\n\nResponse: Hello! How can I assist you today? Whether you have a question, need guidance, or just want to engage in a friendly conversation, I'm here to help. Let's get started!"]],
    "rows_affected": 1,
    "execution_time_ms": 9208
  }
}

The chatty multi-sentence response is Phi-3-mini's signature style. Llama-3.2-1B replies in 5–10 tokens for the same prompt. The output itself is your proof.

Optional: persist across container restarts

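If you don't want to retype the -v flags every time, use a tiny docker-compose.yml. It also keeps the database in a named volume (synapcores-data), so your data survives the container being recreated: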

services:
  synapcores:
    image: synapcores/community:v1.7.0.1-ce
    ports: ["8080:8080"]
    environment:
      - AIDB_ACCEPT_LICENSE=true
    volumes:
      - ./Phi-3-mini-4k-instruct-q4.gguf:/opt/synapcores/models/text/Phi-3-mini-4k-instruct-q4.gguf:ro
      - ./gateway-phi3.toml:/etc/synapcores/gateway.toml:ro
      - synapcores-data:/var/lib/synapcores
volumes:
  synapcores-data:

docker compose up -d

Which model should you pick?

Three honest recommendations, calibrated against the actual SynapCores workload (in-DB SQL GENERATE(), NL2SQL, embeddings, recipe execution):

For chat + general Q&A — Llama-3.2-3B-Instruct (Q4_K_M, ~2 GB)

Roughly 2.5× the parameters of the default 1B (about 3.2B vs 1.2B); much better at multi-sentence reasoning, and still runs on any 8 GB box. Bartowski mirror: bartowski/Llama-3.2-3B-Instruct-GGUF.

model = "Llama-3.2-3B-Instruct-Q4_K_M"

For SQL + code generation — Qwen2.5-Coder-7B-Instruct (Q4_K_M, ~4.7 GB)

This is the recipe-builder model that the install-ce.sh installer downloads automatically on machines with at least 8 GB of RAM and 10 GB of free disk. For SQL it is best in class among models up to ~13B, and it wins at this size by a wide margin.

model = "Qwen2.5-Coder-7B-Instruct-Q4_K_M"

For agent loops (tool calling + reasoning) — Phi-3-mini-4k-instruct (Q4, ~2.3 GB)

Reasoning-tuned, tool-calling friendly, small enough that you can run it alongside an embedding model on a 16 GB laptop. The example used in this article.

model = "Phi-3-mini-4k-instruct-q4"
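All three picks follow the same pattern: the Hugging Face download URL has the form https://huggingface.co/<repo>/resolve/main/<file>, and the model = value is the file name minus .gguf. A minimal sketch that derives both. hf_gguf_url and model_stem are hypothetical helpers, and the bartowski file name in the example is an assumption based on that account's usual naming, so confirm it on the repo's file list before downloading.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build a Hugging Face "resolve" URL and the matching model stem.
hf_gguf_url() {
  # $1 = repo (owner/name), $2 = file name on the repo's main branch
  printf 'https://huggingface.co/%s/resolve/main/%s\n' "$1" "$2"
}

model_stem() {
  # Strip any directory and the .gguf extension: this is the value for model =
  basename "$1" .gguf
}

# Example (file name assumed; check the repo first):
repo="bartowski/Llama-3.2-3B-Instruct-GGUF"
file="Llama-3.2-3B-Instruct-Q4_K_M.gguf"
echo "curl -L -o $file \"$(hf_gguf_url "$repo" "$file")\""
echo "model = \"$(model_stem "$file")\""
```

The second line it prints matches the model value suggested for Llama-3.2-3B-Instruct above.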

Troubleshooting

"ConfigError: Default model 'X' not found in configuration"
The model = string doesn't match a .gguf file in AIDB_MODELS_DIR (or /opt/synapcores/models/text/ in Docker). Double-check the filename stem: the resolver tries <name>.gguf first, then common quant suffixes; it does not glob.

Container starts but GENERATE() returns gibberish
The architecture isn't in llama.cpp's supported list, or the GGUF is malformed (truncated download). Verify the file size against the Hugging Face page, and check docker logs for parse errors at startup.
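A file of the right size can still be corrupt. For large (LFS) files, Hugging Face lists a SHA256 checksum on the file's page, and comparing against it is a stronger check. A minimal sketch using sha256sum (GNU coreutils) with a shasum -a 256 fallback for macOS; verify_sha256 is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare FILE's SHA-256 with the checksum published on Hugging Face.
# Usage: verify_sha256 FILE EXPECTED_HEX
verify_sha256() {
  local file="$1" expected actual
  expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"   # normalise to lowercase hex
  if command -v sha256sum >/dev/null 2>&1; then
    actual="$(sha256sum "$file" | awk '{print $1}')"
  else
    actual="$(shasum -a 256 "$file" | awk '{print $1}')"
  fi
  if [ "$actual" = "$expected" ]; then
    echo "checksum ok: $file"
  else
    echo "checksum MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}
```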

OOM on first inference
The model is too large for your RAM. Switch to a smaller quantization (Q4_K_M → Q4_0 → Q3_K_M) or a smaller parameter count (7B → 3B → 1B). RAM rule of thumb: model file size × 1.3 for the working set.
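The rule of thumb is quick to apply before a restart. A minimal sketch, assuming Linux for the free-memory side (it reads MemAvailable from /proc/meminfo); working_set_mib and fits_in_ram are hypothetical helpers, and the 1.3 factor is this article's rough estimate, not a measurement for your model.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimated working set in MiB = file size x 1.3 (integer maths).
working_set_mib() {
  local bytes
  bytes="$(wc -c < "$1")" || return 1
  echo $(( bytes * 13 / 10 / 1048576 ))
}

# Hypothetical helper (Linux only): succeed if the estimate fits in MemAvailable.
fits_in_ram() {
  local need avail
  need="$(working_set_mib "$1")" || return 1
  [ -r /proc/meminfo ] || { echo "no /proc/meminfo (not Linux?)" >&2; return 2; }
  # MemAvailable is reported in kB; convert to MiB.
  avail="$(awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo)"
  echo "need ~${need} MiB, available ${avail} MiB"
  [ "$need" -le "$avail" ]
}
```

For example: fits_in_ram /opt/synapcores/models/text/Phi-3-mini-4k-instruct-q4.gguf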

Multiple models with the same stem
The resolver picks the first match in the order <stem>.gguf, <stem>.Q4_K_M.gguf, <stem>-q4_k_m.gguf, <stem>.Q4_0.gguf, <stem>.Q5_K_M.gguf, <stem>.Q8_0.gguf. If you want explicit control, rename the file to a stem that's unique to one quant.


Why this matters

The default chat model is good enough to demo the database. It is not good enough for production reasoning, complex NL2SQL, or agent loops. Most other "AI database" platforms either lock you to a single vendor-hosted model or require you to run a separate inference service (Ollama, vLLM, TGI) alongside the database, then point at it over the network.

SynapCores keeps inference inside the database process. That's why the swap is two files and one restart: there's no second service to coordinate. And because the runtime is upstream llama.cpp, a new model architecture that lands in that project works here as soon as a SynapCores release ships with the updated llama.cpp.


Try it

Pull the image, swap the model, run the verification — that's the whole post. If you don't have SynapCores installed yet:

docker pull synapcores/community:v1.7.0.1-ce
# or for bare metal:
curl -fsSL https://get.synapcores.com | sh
