Credit Card Fraud Detection (AutoML)

Tested against SynapCores CE v1.7.0.1-ce (the currently-shipped release on Docker Hub: synapcores/community:v1.7.0.1-ce). Copy each block in order — the numbers in the comments come from this run.

Objective

Catch fraudulent card-not-present transactions the instant they hit the authorization endpoint. The pattern is dead simple: train on past labelled transactions, deploy the model, score every new authorization.

Why this matters: card-network reports put CNP fraud losses at >$10B/year in the US alone, and the median issuer's rule-based system is a stack of hand-written IF amount > X AND hour < Y branches that decay the moment fraud rings adapt. A trained classifier learns how amount, time, geography, and merchant risk combine to signal fraud, and it retrains with one CREATE EXPERIMENT call.

Step 1 — Schema + labelled training set

The training set is 130 normal authorizations plus 20 confirmed fraud rows: 20 of 150, a ~13% fraud rate, kept deliberately low to reflect realistic class imbalance.

DROP TABLE IF EXISTS fraud_txn;
CREATE TABLE fraud_txn (
    id            INTEGER PRIMARY KEY,
    amount        DOUBLE,
    hour          INTEGER,
    distance_km   DOUBLE,
    merchant_risk DOUBLE,
    is_fraud      INTEGER
);

INSERT INTO fraud_txn VALUES
(1,162.74,9,37.1,0.073,0),(2,41.77,10,33.8,0.268,0),(3,29.04,15,1.6,0.028,0),
(4,64.3,18,1.3,0.06,0),(5,165.27,17,21.0,0.135,0),(6,87.5,11,4.2,0.12,0),
(7,201.0,13,12.0,0.21,0),(8,33.5,12,0.5,0.05,0),(9,128.4,16,8.1,0.18,0),
(10,55.6,14,2.0,0.09,0),(11,99.0,10,18.0,0.22,0),(12,210.0,17,30.0,0.27,0),
(13,77.2,15,6.0,0.11,0),(14,184.0,11,15.0,0.16,0),(15,42.0,9,2.5,0.07,0),
(16,150.0,18,9.0,0.14,0),(17,68.0,13,3.0,0.08,0),(18,230.0,16,40.0,0.29,0),
(19,89.5,12,5.5,0.10,0),(20,175.0,14,11.0,0.20,0),(21,30.2,10,1.0,0.04,0),
(22,118.7,11,7.5,0.13,0),(23,205.0,17,28.0,0.25,0),(24,52.5,15,4.0,0.09,0),
(25,140.0,13,10.0,0.17,0),(26,75.0,9,3.5,0.08,0),(27,195.0,18,22.0,0.24,0),
(28,60.0,12,2.8,0.07,0),(29,108.0,10,6.2,0.13,0),(30,225.0,16,35.0,0.28,0),
(31,82.4,14,5.0,0.11,0),(32,170.0,15,13.0,0.19,0),(33,38.0,9,1.5,0.05,0),
(34,124.0,11,8.0,0.15,0),(35,200.0,17,25.0,0.23,0),(36,49.0,12,3.2,0.08,0),
(37,135.0,13,9.5,0.16,0),(38,72.0,10,4.5,0.09,0),(39,188.0,16,20.0,0.22,0),
(40,57.0,15,2.7,0.07,0),(41,113.0,11,7.0,0.13,0),(42,215.0,18,32.0,0.27,0),
(43,80.5,12,4.8,0.10,0),(44,160.0,14,12.5,0.18,0),(45,44.0,9,2.2,0.06,0),
(46,128.0,17,8.8,0.15,0),(47,193.0,13,17.0,0.21,0),(48,66.0,15,3.4,0.09,0),
(49,144.0,11,11.0,0.17,0),(50,76.0,16,4.0,0.10,0),(51,182.0,12,14.5,0.19,0),
(52,53.0,10,2.9,0.08,0),(53,121.0,18,7.8,0.14,0),(54,207.0,14,26.0,0.24,0),
(55,85.0,13,5.2,0.11,0),(56,155.0,9,11.5,0.18,0),(57,39.5,17,1.8,0.05,0),
(58,131.0,15,9.0,0.16,0),(59,196.0,11,23.0,0.23,0),(60,62.0,12,3.1,0.07,0),
(61,140.5,16,10.5,0.17,0),(62,73.5,18,4.4,0.09,0),(63,189.0,10,19.0,0.22,0),
(64,58.5,14,2.6,0.08,0),(65,115.0,13,7.3,0.13,0),(66,222.0,9,38.0,0.28,0),
(67,79.0,15,4.7,0.10,0),(68,168.0,17,13.5,0.19,0),(69,46.5,11,2.4,0.06,0),
(70,127.0,12,8.5,0.15,0),(71,198.0,16,24.0,0.23,0),(72,69.5,18,3.6,0.09,0),
(73,143.0,10,10.2,0.16,0),(74,77.5,14,4.3,0.10,0),(75,185.0,13,16.0,0.20,0),
(76,55.5,9,2.7,0.07,0),(77,119.0,15,7.6,0.14,0),(78,212.0,11,30.0,0.26,0),
(79,82.0,17,4.9,0.11,0),(80,156.0,12,11.8,0.18,0),(81,41.5,16,1.9,0.05,0),
(82,133.0,18,9.2,0.15,0),(83,201.5,10,25.5,0.23,0),(84,63.5,14,3.0,0.08,0),
(85,138.0,13,10.0,0.17,0),(86,74.0,9,4.1,0.09,0),(87,191.0,15,21.0,0.22,0),
(88,59.0,17,2.8,0.08,0),(89,114.5,11,7.2,0.13,0),(90,219.0,16,34.0,0.27,0),
(91,81.5,12,5.1,0.10,0),(92,163.0,18,12.8,0.19,0),(93,37.5,10,1.7,0.05,0),
(94,125.5,14,8.3,0.15,0),(95,202.0,13,27.0,0.24,0),(96,50.0,9,3.3,0.08,0),
(97,137.0,15,9.7,0.16,0),(98,73.0,17,4.6,0.09,0),(99,187.0,11,18.5,0.21,0),
(100,56.0,12,2.5,0.07,0),(101,112.0,16,6.8,0.13,0),(102,217.0,18,31.0,0.27,0),
(103,83.5,10,5.3,0.11,0),(104,159.0,14,12.2,0.18,0),(105,43.0,13,2.1,0.06,0),
(106,129.0,9,8.7,0.15,0),(107,194.5,15,22.5,0.23,0),(108,67.0,17,3.5,0.09,0),
(109,142.5,11,10.7,0.17,0),(110,76.5,12,4.2,0.10,0),(111,183.0,16,15.0,0.20,0),
(112,54.0,18,2.6,0.08,0),(113,120.0,10,7.7,0.14,0),(114,208.0,14,26.5,0.24,0),
(115,84.5,13,5.4,0.11,0),(116,154.0,9,11.7,0.18,0),(117,40.0,15,1.6,0.05,0),
(118,132.5,17,9.4,0.16,0),(119,197.0,11,23.5,0.23,0),(120,61.0,12,3.0,0.07,0),
(121,141.0,16,10.4,0.17,0),(122,73.0,18,4.0,0.09,0),(123,190.0,10,19.5,0.22,0),
(124,58.0,14,2.8,0.08,0),(125,116.0,13,7.4,0.13,0),(126,221.0,9,37.5,0.28,0),
(127,78.0,15,4.6,0.10,0),(128,167.0,17,13.2,0.19,0),(129,47.0,11,2.3,0.06,0),
(130,126.0,12,8.4,0.15,0),
-- ── confirmed fraud (20 rows: high amount, off-hours, far geography, high merchant_risk)
(131,3450.0,2,2100.0,0.91,1),(132,5200.0,3,4500.0,0.95,1),(133,1820.0,23,800.0,0.78,1),
(134,7800.0,1,6200.0,0.97,1),(135,2950.0,4,1500.0,0.85,1),(136,6100.0,22,5300.0,0.93,1),
(137,4400.0,0,3200.0,0.89,1),(138,8900.0,2,7100.0,0.98,1),(139,2200.0,3,1100.0,0.82,1),
(140,5500.0,23,4000.0,0.92,1),(141,3700.0,1,2400.0,0.87,1),(142,9100.0,4,8500.0,0.99,1),
(143,2050.0,22,950.0,0.80,1),(144,6500.0,0,5600.0,0.94,1),(145,4750.0,3,3500.0,0.90,1),
(146,8200.0,2,7400.0,0.96,1),(147,3100.0,23,1700.0,0.84,1),(148,5950.0,1,4900.0,0.92,1),
(149,2400.0,4,1250.0,0.81,1),(150,7300.0,22,6800.0,0.95,1);

-- ✅ 150 rows: 130 normal, 20 confirmed fraud
SELECT COUNT(*) AS total, SUM(is_fraud) AS confirmed_fraud FROM fraud_txn;
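
To see why these two classes separate so easily, it can help to summarize each feature per class before training. The query below is a minimal sketch. It assumes the engine supports GROUP BY with the standard AVG, MIN, and MAX aggregates (this walkthrough itself only exercises COUNT and SUM), and the column aliases are illustrative.

```sql
-- Per-class feature summary.
-- Assumption: standard GROUP BY / AVG / MIN / MAX are available on this engine.
SELECT
    is_fraud,
    COUNT(*)           AS n_rows,
    AVG(amount)        AS avg_amount,
    MIN(amount)        AS min_amount,
    MAX(amount)        AS max_amount,
    AVG(distance_km)   AS avg_distance_km,
    AVG(merchant_risk) AS avg_merchant_risk
FROM fraud_txn
GROUP BY is_fraud;
```

In the rows inserted above, the smallest fraud amount is 1820.0 and the largest normal amount is 230.0, so amount alone already splits the classes. Keep that in mind when reading the score in Step 2.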

Step 2 — Train + deploy the classifier

CREATE EXPERIMENT fraud_clf AS
SELECT amount, hour, distance_km, merchant_risk, is_fraud AS target
FROM fraud_txn
WITH (
    task_type = 'binary_classification',
    target_column = 'target',
    optimization_metric = 'auc',
    max_trials = 8,
    time_budget_seconds = 120,
    algorithms = ['logistic_regression', 'random_forest', 'gradient_boosting'],
    validation_strategy = 'kfold',
    n_folds = 3,
    feature_engineering = false,
    hyperparameter_strategy = 'random'
);

DEPLOY MODEL fraud_predictor FROM EXPERIMENT fraud_clf;

Expected: best_score = 1.0 (the two classes are cleanly separable on these features — that's the point of using a tractable demo dataset).

Step 3 — Score new authorizations

-- A perfectly normal lunchtime purchase
SELECT AUTOML.PREDICT('fraud_predictor', 45.00, 14, 5.0, 0.10) AS risk;
-- → 0.083  (LOW)

-- A high-amount, 2am, faraway, high-risk-merchant authorization
SELECT AUTOML.PREDICT('fraud_predictor', 5500.00, 2, 4500.0, 0.92) AS risk;
-- → 0.955  (HIGH — block)
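
The same call can score several authorizations at once if they sit in a table. The sketch below uses a hypothetical staging table, pending_auth, that is not part of the original walkthrough. Its columns mirror the four training features in the same order, and the two rows reuse the inputs from the single-row examples above.

```sql
-- Hypothetical staging table for authorizations awaiting a decision.
-- The name pending_auth and the ids 1001/1002 are illustrative only.
DROP TABLE IF EXISTS pending_auth;
CREATE TABLE pending_auth (
    id            INTEGER PRIMARY KEY,
    amount        DOUBLE,
    hour          INTEGER,
    distance_km   DOUBLE,
    merchant_risk DOUBLE
);

INSERT INTO pending_auth VALUES
(1001, 45.00, 14, 5.0, 0.10),
(1002, 5500.00, 2, 4500.0, 0.92);

-- Features are passed in the same order used in CREATE EXPERIMENT.
SELECT
    id,
    AUTOML.PREDICT('fraud_predictor', amount, hour, distance_km, merchant_risk) AS risk
FROM pending_auth;
```

Because the inputs match the two single-row calls, the scores should line up with the values shown there.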

Step 4 — Real-time scoring across the table

-- ORDER BY on a PREDICT()-derived alias must be wrapped in a CTE on
-- v1.7.0.2.1-ce so the planner doesn't bind `risk` as a feature.
-- Tracked as engine #232; native ORDER BY support lands in v1.7.0.3.
WITH scored AS (
    SELECT
        id,
        amount,
        hour,
        distance_km,
        merchant_risk,
        is_fraud AS actual,
        AUTOML.PREDICT('fraud_predictor', amount, hour, distance_km, merchant_risk) AS risk
    FROM fraud_txn
)
SELECT * FROM scored
ORDER BY risk DESC
LIMIT 10;

The top 10 rows by risk should all be confirmed fraud (actual = 1), that is, 10 of the 20 fraud rows in the table.
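
One way to check that claim for the whole table rather than just the first ten rows is to compare the score range of each class. This is a sketch under the assumption that GROUP BY with MIN and MAX works over a CTE column the same way ORDER BY does in the query above. The aliases are illustrative.

```sql
-- Score range per class. If min_risk for actual = 1 exceeds max_risk for
-- actual = 0, every fraud row outranks every normal row.
-- Assumption: GROUP BY / MIN / MAX over a CTE column are supported.
WITH scored AS (
    SELECT
        is_fraud AS actual,
        AUTOML.PREDICT('fraud_predictor', amount, hour, distance_km, merchant_risk) AS risk
    FROM fraud_txn
)
SELECT
    actual,
    COUNT(*)  AS n_rows,
    MIN(risk) AS min_risk,
    MAX(risk) AS max_risk
FROM scored
GROUP BY actual;
```

A clean gap between the two ranges is what a ranking metric of 1.0 looks like on this table. It is measured on the training rows, so it says nothing about unseen traffic.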

What you just built

A trained model that converts four cheap features into a calibrated risk score. To productionize:

Schedule CREATE EXPERIMENT weekly on a rolling 90-day labelled window.
Wire AUTOML.PREDICT into your authorization microservice — single SQL call, sub-millisecond on small models.
Add a rule that routes any authorization with risk > 0.7 to manual review; let the model handle the ambiguous middle.

That's the whole stack.
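
The review rule from the list above could be sketched in SQL as follows. The 0.7 threshold comes from that list, while the action labels 'manual_review' and 'auto_decide' are hypothetical names chosen for illustration. The query assumes CASE expressions and GROUP BY are available, and it nests the PREDICT call in a CTE, following the same precaution as Step 4.

```sql
-- Sketch of the risk > 0.7 review rule applied to the demo table.
-- The action labels are hypothetical; only the 0.7 threshold is from the text.
WITH scored AS (
    SELECT
        id,
        AUTOML.PREDICT('fraud_predictor', amount, hour, distance_km, merchant_risk) AS risk
    FROM fraud_txn
),
routed AS (
    SELECT
        id,
        risk,
        CASE WHEN risk > 0.7 THEN 'manual_review' ELSE 'auto_decide' END AS action
    FROM scored
)
SELECT action, COUNT(*) AS n_txn
FROM routed
GROUP BY action;
```

Counting rows per action before going live gives a rough sense of how much review workload a given threshold would create.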

