Insurance Claim Fraud Auto-Flag (AutoML)
Tested against SynapCores CE v1.7.0.1-ce (the currently shipped release on Docker Hub:
synapcores/community:v1.7.0.1-ce).
Objective
Auto-flag suspect claims at intake. The model learns how four signals jointly separate fraud from legitimate claims: claim amount, days between policy start and claim, prior claim count, and document completeness (the number of supporting documents). These are core signals that Special Investigations Unit (SIU) teams routinely check.
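Why this matters: the Coalition Against Insurance Fraud estimates that
insurance fraud costs the US about $308B a year. Carriers rely on
hand-coded red-flag rules, and organized fraud rings learn to work around
those fixed rules, sometimes within weeks. A trained classifier, by
contrast, is refreshed on newly labelled claims with a single
CREATE EXPERIMENT call.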
Step 1 — Schema + labelled claims
160 claims: 144 paid as normal and 16 confirmed as fraud by the SIU.
DROP TABLE IF EXISTS ins_claim;
CREATE TABLE ins_claim (
id INTEGER PRIMARY KEY,
claim_amt DOUBLE,
days_after_policy INTEGER,
prior_claims INTEGER,
doc_count INTEGER,
is_suspect INTEGER
);
INSERT INTO ins_claim VALUES
(3,718.55,1394,1,9,0),
(11,1603.67,745,1,4,0),
(25,329.66,830,1,4,0),
(5,1573.64,1291,0,7,0),
(153,26544.45,22,5,2,1),
(126,2478.28,905,2,5,0),
(100,1503.96,550,2,8,0),
(105,3935.43,1662,0,4,0),
(48,1844.45,1596,2,8,0),
(51,878.51,1382,1,3,0),
(107,2663.33,1292,0,7,0),
(143,3349.76,812,1,3,0),
(90,3645.96,1093,1,3,0),
(7,2681.72,103,2,5,0),
(70,1580.48,1629,0,3,0),
(117,4481.81,747,0,6,0),
(116,1321.97,78,0,6,0),
(2,2836.27,975,0,8,0),
(137,4497.74,284,0,6,0),
(148,52938.38,17,7,0,1),
(80,1482.88,108,1,3,0),
(95,1097.34,1468,0,3,0),
(19,2674.79,1209,2,7,0),
(58,3222.47,1305,2,3,0),
(43,3435.89,1249,1,7,0),
(158,21981.9,10,9,1,1),
(60,2983.31,631,2,5,0),
(113,667.48,1712,2,7,0),
(74,1503.5,700,1,7,0),
(33,4462.52,1583,1,6,0),
(136,672.15,1211,0,5,0),
(109,3670.88,312,2,8,0),
(138,3712.27,706,1,9,0),
(27,2107.41,188,2,4,0),
(88,2112.43,586,0,9,0),
(26,1611.64,791,2,4,0),
(112,2801.21,690,2,9,0),
(97,3546.62,504,2,6,0),
(57,1671.85,1734,2,3,0),
(52,328.91,384,2,5,0),
(39,3209.76,423,0,3,0),
(79,1225.45,177,0,5,0),
(98,3602.55,607,2,4,0),
(92,4132.39,1253,1,6,0),
(81,643.63,971,1,4,0),
(20,2063.93,1438,2,6,0),
(69,2598.05,1285,0,7,0),
(120,2248.5,419,2,3,0),
(111,4351.07,342,0,7,0),
(104,3635.99,930,1,9,0),
(140,283.41,292,0,5,0),
(55,2917.56,987,2,4,0),
(115,639.29,1094,2,8,0),
(160,30063.86,5,6,1,1),
(152,31019.81,12,7,0,1),
(128,1194.25,1704,1,5,0),
(63,3195.65,1733,0,7,0),
(130,1375.76,1545,1,6,0),
(127,4295.19,1630,2,3,0),
(149,19432.71,5,9,1,1),
(73,486.42,425,1,7,0),
(145,42054.6,5,4,1,1),
(86,2534.3,250,0,9,0),
(56,2105.13,940,0,5,0),
(110,1179.35,911,1,4,0),
(72,2919.03,193,0,7,0),
(151,49188.61,8,4,2,1),
(21,578.87,968,1,3,0),
(99,288.28,390,2,9,0),
(91,3101.45,482,1,4,0),
(23,4294.27,1108,0,6,0),
(71,4278.82,1465,0,9,0),
(53,381.7,1056,2,3,0),
(37,1706.07,191,1,3,0),
(134,2024.77,798,2,3,0),
(139,981.78,545,0,3,0),
(124,972.68,1365,0,7,0),
(10,4427.08,1219,0,8,0),
(135,3876.04,856,0,5,0),
(84,1209.58,525,2,5,0),
(44,1093.17,764,0,8,0),
(85,4085.32,1010,1,7,0),
(16,2377.0,1119,1,6,0),
(101,1053.06,1683,1,8,0),
(142,1404.73,1304,2,5,0),
(122,2110.67,983,1,8,0),
(96,695.97,1787,1,9,0),
(93,2911.87,159,1,8,0),
(8,1535.93,449,0,9,0),
(38,936.93,1477,2,6,0),
(102,1911.71,1200,2,4,0),
(159,49657.86,8,5,2,1),
(123,3572.22,719,0,9,0),
(32,3242.83,337,2,5,0),
(28,3622.4,1385,2,4,0),
(154,45441.4,14,4,1,1),
(144,2599.56,514,0,4,0),
(78,4296.09,1450,1,3,0),
(64,1019.14,890,1,9,0),
(141,2013.87,1383,1,8,0),
(47,1082.29,1603,1,8,0),
(157,38231.43,23,6,2,1),
(118,4245.07,1100,2,4,0),
(66,3019.81,459,2,4,0),
(54,1391.42,1633,0,4,0),
(46,1222.11,1782,2,3,0),
(14,3175.12,944,0,5,0),
(50,683.99,193,2,7,0),
(49,2003.34,579,0,4,0),
(67,568.44,837,1,4,0),
(103,3536.24,1427,1,7,0),
(40,4358.68,781,2,3,0),
(131,2398.73,121,2,5,0),
(4,4025.07,534,1,3,0),
(121,3188.8,1429,1,8,0),
(75,2876.38,1575,2,5,0),
(9,679.74,1147,2,9,0),
(65,3964.58,218,1,8,0),
(147,64260.96,30,7,2,1),
(89,856.91,1205,1,5,0),
(87,1227.72,695,2,4,0),
(42,3924.64,429,0,5,0),
(77,4076.15,1700,2,8,0),
(36,4253.73,97,0,7,0),
(17,3042.3,438,1,7,0),
(146,60999.18,12,9,0,1),
(30,1966.21,1755,0,7,0),
(94,2173.43,1433,1,5,0),
(129,381.3,763,2,7,0),
(61,1009.89,628,2,4,0),
(114,3270.14,537,1,4,0),
(15,2486.81,126,1,5,0),
(22,2056.57,884,0,9,0),
(41,4470.47,1154,1,6,0),
(82,887.21,642,1,6,0),
(12,4313.88,699,2,7,0),
(68,433.58,1134,0,4,0),
(83,1722.42,1725,1,5,0),
(45,4105.13,1134,0,3,0),
(62,1695.99,1686,2,6,0),
(132,2513.56,383,1,9,0),
(35,4305.42,286,2,5,0),
(133,4154.32,985,2,6,0),
(106,2966.7,613,2,5,0),
(119,3236.08,595,1,3,0),
(155,25759.45,5,4,1,1),
(6,3146.62,940,2,3,0),
(1,4320.3,347,0,3,0),
(24,229.06,999,0,4,0),
(108,2581.83,786,1,9,0),
(31,2180.48,1793,0,6,0),
(29,3018.98,266,0,4,0),
(76,1585.54,1444,2,3,0),
(156,73299.69,30,9,2,1),
(125,729.53,203,0,9,0),
(150,20532.13,1,7,0,1),
(13,811.56,1750,1,8,0),
(59,954.58,1703,2,7,0),
(18,1807.25,1253,0,5,0),
(34,3327.19,1377,2,5,0)
;
SELECT COUNT(*) AS total, SUM(is_suspect) AS confirmed_fraud FROM ins_claim;
-- → 160 rows, 16 confirmed fraud
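Before training, it helps to check the base rate and how cleanly the two classes split. A minimal Python sketch, run outside SynapCores for illustration only: the rows below are a handful copied from the INSERT above (a sample, not the full table), and `separates` is a hypothetical helper that reports whether one feature's fraud and normal value ranges overlap.

```python
# Six rows copied from the INSERT above (a sample, not the full 160-row table):
# (id, claim_amt, days_after_policy, prior_claims, doc_count, is_suspect)
SAMPLE = [
    (3, 718.55, 1394, 1, 9, 0),
    (137, 4497.74, 284, 0, 6, 0),
    (116, 1321.97, 78, 0, 6, 0),
    (153, 26544.45, 22, 5, 2, 1),
    (149, 19432.71, 5, 9, 1, 1),
    (147, 64260.96, 30, 7, 2, 1),
]
FEATURES = ["claim_amt", "days_after_policy", "prior_claims", "doc_count"]
LABEL = 5  # index of is_suspect in each tuple


def separates(rows, feature_idx):
    """True if the fraud and normal value ranges of one feature do not overlap."""
    col = feature_idx + 1  # skip the id column
    fraud = [r[col] for r in rows if r[LABEL] == 1]
    normal = [r[col] for r in rows if r[LABEL] == 0]
    return max(fraud) < min(normal) or max(normal) < min(fraud)


base_rate = 16 / 160  # confirmed_fraud / total, from the COUNT query
print(f"base rate: {base_rate:.0%}")
for i, name in enumerate(FEATURES):
    print(f"{name}: separates classes = {separates(SAMPLE, i)}")
```

On the full table the same check holds for all four features: every confirmed-fraud row has claim_amt above 19,000, days_after_policy of 30 or less, at least four prior claims and at most two documents, while every normal claim is under 4,500, at least 78 days into the policy, has at most two prior claims and at least three documents.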
Step 2 — Train + deploy
CREATE EXPERIMENT ins_clf AS
SELECT claim_amt, days_after_policy, prior_claims, doc_count, is_suspect AS target
FROM ins_claim
WITH (
task_type = 'binary_classification',
target_column = 'target',
optimization_metric = 'auc',
max_trials = 8,
time_budget_seconds = 120,
algorithms = ['logistic_regression', 'random_forest', 'gradient_boosting'],
validation_strategy = 'kfold',
n_folds = 3,
feature_engineering = false,
hyperparameter_strategy = 'random'
);
DEPLOY MODEL ins_flagger FROM EXPERIMENT ins_clf;
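-- best_score = 1.0: the AUC being optimized. A perfect score is expected here
-- because each feature alone separates fraud from normal in this small sample;
-- expect a lower score on real claim data.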
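The experiment optimizes AUC rather than accuracy because only 10% of the rows are fraud. A minimal Python sketch, assuming the engine's 'auc' is the standard ROC AUC (the probability that a randomly chosen fraud claim is scored above a randomly chosen normal one). With this class balance, a model that never flags anything reaches 90% accuracy but only 0.5 AUC; `roc_auc` and `accuracy` are illustrative helpers, not SynapCores functions.

```python
def roc_auc(scores, labels):
    """ROC AUC by pairwise comparison: share of (fraud, normal) pairs ranked correctly, ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Share of rows where (score > threshold) matches the label."""
    return sum((s > threshold) == (y == 1) for s, y in zip(scores, labels)) / len(labels)


labels = [0] * 144 + [1] * 16         # class balance of ins_claim
never_flag = [0.0] * len(labels)      # a "model" that pays every claim
perfect = [float(y) for y in labels]  # a model that scores every fraud row above every normal row

print(accuracy(never_flag, labels), roc_auc(never_flag, labels))  # 0.9 0.5
print(accuracy(perfect, labels), roc_auc(perfect, labels))        # 1.0 1.0
```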
Step 3 — Score new claims
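-- Arguments follow the experiment's feature order:
-- claim_amt, days_after_policy, prior_claims, doc_count.
-- A modest $1,200 claim 540 days into the policy, 1 prior claim, 6 supporting docs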
SELECT AUTOML.PREDICT('ins_flagger', 1200.0, 540, 1, 6) AS risk;
-- → 0.056 (PAY)
-- $42K claim 8 days after policy start, 6 prior claims, 1 supporting doc
SELECT AUTOML.PREDICT('ins_flagger', 42000.0, 8, 6, 1) AS risk;
-- → 0.944 (SIU REVIEW)
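AUTOML.PREDICT appears to return the fraud probability; the PAY and SIU REVIEW labels in the comments above come from applying a cut-off to it. A minimal Python sketch of that routing step, using the 0.7 threshold suggested under Productionizing; `route` and `RISK_THRESHOLD` are hypothetical names, not part of SynapCores.

```python
RISK_THRESHOLD = 0.7  # hypothetical cut-off, taken from the Productionizing note (risk > 0.7 -> SIU)


def route(risk: float, threshold: float = RISK_THRESHOLD) -> str:
    """Map a fraud probability to a work queue."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError(f"risk must be a probability, got {risk}")
    return "SIU REVIEW" if risk > threshold else "PAY"


# The two scores reported in Step 3.
for risk in (0.056, 0.944):
    print(risk, route(risk))
```

Because the comparison is strictly greater-than, a claim scored exactly 0.7 is paid. Lowering the threshold trades more SIU workload for fewer missed fraud cases.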
Step 4 — Sweep the queue
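-- Demo: this scores the labelled training table. In production, point FROM
-- at the incoming claims awaiting adjudication.
SELECT id, claim_amt, days_after_policy, prior_claims, doc_count,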
AUTOML.PREDICT('ins_flagger', claim_amt, days_after_policy, prior_claims, doc_count) AS risk
FROM ins_claim
ORDER BY risk DESC
LIMIT 10;
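Because ins_claim is labelled, the sweep can be checked: of the top-k rows by risk, how many are confirmed fraud? A minimal Python sketch of precision@k, assuming the query results have been fetched as (risk, is_suspect) pairs; `precision_at_k` is a hypothetical helper and the scores in the example are illustrative, not output from ins_flagger.

```python
def precision_at_k(scored, k):
    """Fraction of the k highest-risk rows that are confirmed fraud.

    `scored` is a list of (risk, is_suspect) pairs, e.g. fetched from the sweep query.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    return sum(label for _, label in top) / len(top)


# Illustrative scores only (not model output): three fraud rows, three normal rows.
scored = [(0.97, 1), (0.91, 1), (0.62, 0), (0.55, 1), (0.10, 0), (0.04, 0)]
print(precision_at_k(scored, 2))  # 1.0
print(precision_at_k(scored, 4))  # 0.75
```

With 16 confirmed fraud rows in the table, setting k to 16 makes precision@k equal to recall over all known fraud. Scoring the same rows the model was trained on overstates quality, so run this check on held-out claims where possible.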
Productionizing
Route claims with risk > 0.7 to the SIU queue, and retrain monthly on a
rolling 12-month window of labelled claims. Pair this recipe with Recipe 9
(narrative semantic fraud) for two-signal detection: structured features
plus drift in the free-text claim narrative.
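The monthly retrain needs a rolling 12-month slice of labelled claims. ins_claim has no date column, so this Python sketch assumes a hypothetical `claim_date` on each claim; `months_back` and `training_window` are illustrative helpers, and the selected rows would then feed a fresh CREATE EXPERIMENT.

```python
import calendar
from datetime import date


def months_back(d: date, months: int) -> date:
    """Same day `months` calendar months earlier, clamped to month end (Mar 31 -> Feb 28/29)."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def training_window(claims, run_date: date, months: int = 12):
    """Claims with run_date minus `months` <= claim_date < run_date.

    `claims` is a list of dicts with a hypothetical 'claim_date' key.
    """
    start = months_back(run_date, months)
    return [c for c in claims if start <= c["claim_date"] < run_date]


claims = [
    {"id": 1, "claim_date": date(2024, 6, 14)},  # one day before the window opens
    {"id": 2, "claim_date": date(2024, 6, 15)},  # first day inside
    {"id": 3, "claim_date": date(2025, 6, 14)},  # last day inside
    {"id": 4, "claim_date": date(2025, 6, 15)},  # run date itself, excluded
]
print([c["id"] for c in training_window(claims, date(2025, 6, 15))])  # [2, 3]
```

Using calendar months rather than a fixed 365 days keeps each monthly run aligned to the same day of the month.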