Insurance Claim Fraud Auto-Flag (AutoML)

Tested against SynapCores CE v1.7.0.1-ce (the currently-shipped release on Docker Hub: synapcores/community:v1.7.0.1-ce).

Objective

Auto-flag suspect claims at intake. The model learns the joint pattern of four signals that Special Investigations Units (SIUs) routinely rely on: claim amount, days between policy start and claim, prior claim count, and document completeness.

Why this matters: the Coalition Against Insurance Fraud puts US carrier losses at $308B/year. Carriers run hand-coded red-flag rules; new fraud rings rotate around them in weeks. A trained classifier updates with one CREATE EXPERIMENT call.

Step 1 — Schema + labelled claims

160 claims: 144 paid-as-normal + 16 SIU-confirmed fraud.

DROP TABLE IF EXISTS ins_claim;
CREATE TABLE ins_claim (
    id                  INTEGER PRIMARY KEY,
    claim_amt           DOUBLE,
    days_after_policy   INTEGER,
    prior_claims        INTEGER,
    doc_count           INTEGER,
    is_suspect          INTEGER
);

INSERT INTO ins_claim VALUES
(3,718.55,1394,1,9,0),
(11,1603.67,745,1,4,0),
(25,329.66,830,1,4,0),
(5,1573.64,1291,0,7,0),
(153,26544.45,22,5,2,1),
(126,2478.28,905,2,5,0),
(100,1503.96,550,2,8,0),
(105,3935.43,1662,0,4,0),
(48,1844.45,1596,2,8,0),
(51,878.51,1382,1,3,0),
(107,2663.33,1292,0,7,0),
(143,3349.76,812,1,3,0),
(90,3645.96,1093,1,3,0),
(7,2681.72,103,2,5,0),
(70,1580.48,1629,0,3,0),
(117,4481.81,747,0,6,0),
(116,1321.97,78,0,6,0),
(2,2836.27,975,0,8,0),
(137,4497.74,284,0,6,0),
(148,52938.38,17,7,0,1),
(80,1482.88,108,1,3,0),
(95,1097.34,1468,0,3,0),
(19,2674.79,1209,2,7,0),
(58,3222.47,1305,2,3,0),
(43,3435.89,1249,1,7,0),
(158,21981.9,10,9,1,1),
(60,2983.31,631,2,5,0),
(113,667.48,1712,2,7,0),
(74,1503.5,700,1,7,0),
(33,4462.52,1583,1,6,0),
(136,672.15,1211,0,5,0),
(109,3670.88,312,2,8,0),
(138,3712.27,706,1,9,0),
(27,2107.41,188,2,4,0),
(88,2112.43,586,0,9,0),
(26,1611.64,791,2,4,0),
(112,2801.21,690,2,9,0),
(97,3546.62,504,2,6,0),
(57,1671.85,1734,2,3,0),
(52,328.91,384,2,5,0),
(39,3209.76,423,0,3,0),
(79,1225.45,177,0,5,0),
(98,3602.55,607,2,4,0),
(92,4132.39,1253,1,6,0),
(81,643.63,971,1,4,0),
(20,2063.93,1438,2,6,0),
(69,2598.05,1285,0,7,0),
(120,2248.5,419,2,3,0),
(111,4351.07,342,0,7,0),
(104,3635.99,930,1,9,0),
(140,283.41,292,0,5,0),
(55,2917.56,987,2,4,0),
(115,639.29,1094,2,8,0),
(160,30063.86,5,6,1,1),
(152,31019.81,12,7,0,1),
(128,1194.25,1704,1,5,0),
(63,3195.65,1733,0,7,0),
(130,1375.76,1545,1,6,0),
(127,4295.19,1630,2,3,0),
(149,19432.71,5,9,1,1),
(73,486.42,425,1,7,0),
(145,42054.6,5,4,1,1),
(86,2534.3,250,0,9,0),
(56,2105.13,940,0,5,0),
(110,1179.35,911,1,4,0),
(72,2919.03,193,0,7,0),
(151,49188.61,8,4,2,1),
(21,578.87,968,1,3,0),
(99,288.28,390,2,9,0),
(91,3101.45,482,1,4,0),
(23,4294.27,1108,0,6,0),
(71,4278.82,1465,0,9,0),
(53,381.7,1056,2,3,0),
(37,1706.07,191,1,3,0),
(134,2024.77,798,2,3,0),
(139,981.78,545,0,3,0),
(124,972.68,1365,0,7,0),
(10,4427.08,1219,0,8,0),
(135,3876.04,856,0,5,0),
(84,1209.58,525,2,5,0),
(44,1093.17,764,0,8,0),
(85,4085.32,1010,1,7,0),
(16,2377.0,1119,1,6,0),
(101,1053.06,1683,1,8,0),
(142,1404.73,1304,2,5,0),
(122,2110.67,983,1,8,0),
(96,695.97,1787,1,9,0),
(93,2911.87,159,1,8,0),
(8,1535.93,449,0,9,0),
(38,936.93,1477,2,6,0),
(102,1911.71,1200,2,4,0),
(159,49657.86,8,5,2,1),
(123,3572.22,719,0,9,0),
(32,3242.83,337,2,5,0),
(28,3622.4,1385,2,4,0),
(154,45441.4,14,4,1,1),
(144,2599.56,514,0,4,0),
(78,4296.09,1450,1,3,0),
(64,1019.14,890,1,9,0),
(141,2013.87,1383,1,8,0),
(47,1082.29,1603,1,8,0),
(157,38231.43,23,6,2,1),
(118,4245.07,1100,2,4,0),
(66,3019.81,459,2,4,0),
(54,1391.42,1633,0,4,0),
(46,1222.11,1782,2,3,0),
(14,3175.12,944,0,5,0),
(50,683.99,193,2,7,0),
(49,2003.34,579,0,4,0),
(67,568.44,837,1,4,0),
(103,3536.24,1427,1,7,0),
(40,4358.68,781,2,3,0),
(131,2398.73,121,2,5,0),
(4,4025.07,534,1,3,0),
(121,3188.8,1429,1,8,0),
(75,2876.38,1575,2,5,0),
(9,679.74,1147,2,9,0),
(65,3964.58,218,1,8,0),
(147,64260.96,30,7,2,1),
(89,856.91,1205,1,5,0),
(87,1227.72,695,2,4,0),
(42,3924.64,429,0,5,0),
(77,4076.15,1700,2,8,0),
(36,4253.73,97,0,7,0),
(17,3042.3,438,1,7,0),
(146,60999.18,12,9,0,1),
(30,1966.21,1755,0,7,0),
(94,2173.43,1433,1,5,0),
(129,381.3,763,2,7,0),
(61,1009.89,628,2,4,0),
(114,3270.14,537,1,4,0),
(15,2486.81,126,1,5,0),
(22,2056.57,884,0,9,0),
(41,4470.47,1154,1,6,0),
(82,887.21,642,1,6,0),
(12,4313.88,699,2,7,0),
(68,433.58,1134,0,4,0),
(83,1722.42,1725,1,5,0),
(45,4105.13,1134,0,3,0),
(62,1695.99,1686,2,6,0),
(132,2513.56,383,1,9,0),
(35,4305.42,286,2,5,0),
(133,4154.32,985,2,6,0),
(106,2966.7,613,2,5,0),
(119,3236.08,595,1,3,0),
(155,25759.45,5,4,1,1),
(6,3146.62,940,2,3,0),
(1,4320.3,347,0,3,0),
(24,229.06,999,0,4,0),
(108,2581.83,786,1,9,0),
(31,2180.48,1793,0,6,0),
(29,3018.98,266,0,4,0),
(76,1585.54,1444,2,3,0),
(156,73299.69,30,9,2,1),
(125,729.53,203,0,9,0),
(150,20532.13,1,7,0,1),
(13,811.56,1750,1,8,0),
(59,954.58,1703,2,7,0),
(18,1807.25,1253,0,5,0),
(34,3327.19,1377,2,5,0)
;

SELECT COUNT(*) AS total, SUM(is_suspect) AS confirmed_fraud FROM ins_claim;
-- → total = 160, confirmed_fraud = 16
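
The class split above is worth a quick arithmetic check before training. The sketch below uses only the two counts from Step 1 and shows why plain accuracy would be a misleading target on this table; that this is the reason for choosing AUC in Step 2 is an assumption, since the recipe does not state its rationale.

```python
# Class balance of the Step 1 table: 144 paid-as-normal, 16 SIU-confirmed fraud.
n_normal, n_fraud = 144, 16
total = n_normal + n_fraud

base_rate = n_fraud / total              # share of claims that are fraud
never_flag_accuracy = n_normal / total   # accuracy of a model that never flags anything

print(f"fraud base rate:       {base_rate:.1%}")
print(f"'never flag' accuracy: {never_flag_accuracy:.1%}")
```

A model that never flags a claim would still look 90% accurate here while catching zero fraud, which is why a ranking metric is the more informative score.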

Step 2 — Train + deploy

CREATE EXPERIMENT ins_clf AS
SELECT claim_amt, days_after_policy, prior_claims, doc_count, is_suspect AS target
FROM ins_claim
WITH (
    task_type = 'binary_classification',
    target_column = 'target',
    optimization_metric = 'auc',
    max_trials = 8,
    time_budget_seconds = 120,
    algorithms = ['logistic_regression', 'random_forest', 'gradient_boosting'],
    validation_strategy = 'kfold',
    n_folds = 3,
    feature_engineering = false,
    hyperparameter_strategy = 'random'
);

DEPLOY MODEL ins_flagger FROM EXPERIMENT ins_clf;
-- best_score = 1.0
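
To see how a score of 1.0 can arise, the following sketch computes AUC by hand on eight rows copied from Step 1, using claim_amt alone as the score. It is an illustration of the metric, not a description of how SynapCores evaluates trials; the `auc` helper is hypothetical and written for this example.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (fraud, normal) pairs where the fraud row scores higher.

    Ties count as half a win. This is the pairwise definition of ROC AUC.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# claim_amt values taken from the INSERT in Step 1.
fraud_amts = [19432.71, 20532.13, 26544.45, 73299.69]   # ids 149, 150, 153, 156
normal_amts = [229.06, 718.55, 4470.47, 4497.74]        # ids 24, 3, 41, 137

print(auc(fraud_amts, normal_amts))   # every fraud amount exceeds every normal amount
print(auc([5, 1], [2, 3]))            # overlapping scores give a lower value
```

In the listed rows, every fraud claim is above 19,000 and every normal claim is below 4,500, so a single feature already ranks the classes perfectly. A score of 1.0 therefore reflects how cleanly this small labelled set separates and may not carry over to messier production data.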

Step 3 — Score new claims

-- A modest claim 540 days into the policy, with one prior claim and full documentation
SELECT AUTOML.PREDICT('ins_flagger', 1200.0, 540, 1, 6) AS risk;
-- → 0.056  (PAY)

-- $42K claim 8 days after policy start, 6 prior claims, 1 supporting doc
SELECT AUTOML.PREDICT('ins_flagger', 42000.0, 8, 6, 1) AS risk;
-- → 0.944  (SIU REVIEW)
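
The PAY and SIU REVIEW labels in the comments above imply a cut-off on the risk score. A minimal sketch of that routing step in application code follows, assuming the 0.7 threshold quoted under Productionizing; the `route` function and its names are hypothetical and not part of SynapCores.

```python
SIU_THRESHOLD = 0.7  # cut-off quoted in the Productionizing section

def route(risk: float, threshold: float = SIU_THRESHOLD) -> str:
    """Map a model risk score to an intake action (strictly greater than threshold)."""
    return "SIU REVIEW" if risk > threshold else "PAY"

# The two scores returned by the Step 3 queries.
for risk in (0.056, 0.944):
    print(f"{risk:.3f} -> {route(risk)}")
```

Keeping the threshold as a parameter lets the SIU team tune review volume without retraining the model.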

Step 4 — Sweep the queue

-- ORDER BY on a PREDICT()-derived alias requires a CTE on v1.7.0.1-ce (#232).
WITH scored AS (
    SELECT id, claim_amt, days_after_policy, prior_claims, doc_count,
           AUTOML.PREDICT('ins_flagger', claim_amt, days_after_policy, prior_claims, doc_count) AS risk
    FROM ins_claim
)
SELECT * FROM scored
ORDER BY risk DESC
LIMIT 10;

Productionizing

Route claims with risk > 0.7 to the SIU queue, and retrain monthly on a rolling 12-month window of labelled claims. Pair with Recipe 9 (narrative semantic fraud) for two-signal detection: structured features plus free-text narrative drift.
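
One way to select the rolling 12-month training window before each monthly retrain is sketched below. It assumes each labelled claim carries a closing date; the `closed_on` field, the sample records, and both helper functions are hypothetical, since the ins_claim table in Step 1 has no date column.

```python
from datetime import date

def twelve_months_before(day: date) -> date:
    """Same calendar day one year earlier; Feb 29 falls back to Feb 28."""
    try:
        return day.replace(year=day.year - 1)
    except ValueError:  # Feb 29 has no counterpart in the previous year
        return day.replace(year=day.year - 1, day=28)

def training_window(claims, as_of: date):
    """Keep labelled claims closed in the 12 months up to and including as_of."""
    start = twelve_months_before(as_of)
    return [c for c in claims if start < c["closed_on"] <= as_of]

# Hypothetical labelled claims; `closed_on` is NOT a column in ins_claim.
claims = [
    {"id": 1, "closed_on": date(2023, 1, 15), "is_suspect": 0},
    {"id": 2, "closed_on": date(2023, 9, 1), "is_suspect": 1},
    {"id": 3, "closed_on": date(2024, 2, 29), "is_suspect": 0},
]

window = training_window(claims, as_of=date(2024, 3, 1))
print([c["id"] for c in window])
```

The rows that survive the filter would then be loaded into the training table before re-running CREATE EXPERIMENT and DEPLOY MODEL.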

