Insurance Narrative Fraud Detection (semantic)

Use EMBED + COSINE_SIMILARITY to catch claim narratives that semantically match known fraud patterns: the unstructured-data side of Special Investigations Unit (SIU) fraud detection.

Tested against SynapCores CE v1.7.0.1-ce (the currently-shipped release on Docker Hub: synapcores/community:v1.7.0.1-ce).

Objective

Score every incoming claim's free-text narrative against the centroid of historical confirmed-fraud narratives. The detection catches the shape of the story ("single vehicle, late night, no witnesses, total loss, demanding immediate payout"), a pattern that structured-feature models tend to miss.

Why this matters: SIU investigators say 70%+ of the fraud signal lives in the narrative ("how the customer tells the story") — exactly the data carriers don't have systematic tooling for. EMBED + cosine turns "adjuster intuition" into a SQL function.

Step 1 — Schema + labelled narratives

Five confirmed-fraud narratives (the basis for the centroid) plus eight routine claims.

DROP TABLE IF EXISTS claim_narr;
CREATE TABLE claim_narr (
    id              INTEGER PRIMARY KEY,
    narrative       TEXT,
    is_known_fraud  INTEGER
);

INSERT INTO claim_narr VALUES
(1,'Total loss vehicle fire occurred late at night in remote area no witnesses claim full value',1),
(2,'Single vehicle accident no witnesses car burned completely keys missing demanding total payout',1),
(3,'Late night fire destroyed vehicle no skid marks no police report claim full coverage immediately',1),
(4,'Vehicle stolen and burned no witnesses remote location request expedited total loss settlement',1),
(5,'Mysterious total loss fire on isolated road policy purchased two weeks ago no documentation',1),
(6,'Rear-ended at red light by distracted driver other party admitted fault police on scene photos attached',0),
(7,'Minor bumper damage in parking lot witness statement attached opposing driver insured with Geico',0),
(8,'Hailstorm damaged hood and windshield insurance adjuster scheduled inspection for next Tuesday',0),
(9,'Tree branch fell on parked vehicle during storm photos and arborist report attached for review',0),
(10,'Side-swiped on highway by truck changing lanes incident report filed dashcam footage available',0),
(11,'Minor fender bender at intersection both parties exchanged info police report number attached',0),
(12,'Animal collision with deer on rural highway photos of damage taken vehicle drivable to body shop',0),
(13,'Vandalism keyed paint on driver side door police report filed neighborhood security camera footage requested',0);

Step 2 — Sanity-check the embedding signal

-- A new narrative that "smells like" the fraud cluster
SELECT COSINE_SIMILARITY(
    EMBED('Late night fire total loss vehicle no witnesses immediate full payout'),
    EMBED((SELECT narrative FROM claim_narr WHERE id=1))
) AS sim;
-- → 0.823    (high — fraud-like)

-- A new narrative that doesn't
SELECT COSINE_SIMILARITY(
    EMBED('Rear-ended at stop light by uninsured driver police report attached'),
    EMBED((SELECT narrative FROM claim_narr WHERE id=1))
) AS sim;
-- → 0.219    (low — looks routine)
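
For readers who want to see what the similarity score measures, here is a minimal Python sketch of cosine similarity. It is not SynapCores' implementation, and the 3-dimensional vectors are made-up stand-ins for real embeddings, chosen only to show that vectors pointing the same way score near 1 and unrelated ones score near 0.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors (illustrative values only, not real embeddings)
fraud_like = [0.9, 0.1, 0.0]
new_claim = [0.8, 0.2, 0.1]
routine = [0.0, 0.2, 0.9]

print(cosine_similarity(new_claim, fraud_like))  # close to 1: similar direction
print(cosine_similarity(new_claim, routine))     # close to 0: mostly unrelated
```

The score depends only on direction, not on vector length, which is why a short narrative and a long one can still compare cleanly.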

Step 3 — Build the fraud centroid (concatenated)

-- Concatenate all known-fraud narratives and embed the result once.
-- This is a cheap stand-in for a true centroid (the mean of the per-narrative embeddings).
SELECT EMBED((SELECT GROUP_CONCAT(narrative, ' ') FROM claim_narr WHERE is_known_fraud=1));
-- (returns the 384-dim vector you'll compare every new claim against)
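
Step 3 embeds the concatenated text once, which only approximates a centroid. A more literal centroid would embed each fraud narrative separately and average the vectors element by element. This recipe does not show whether SynapCores offers a vector-averaging aggregate, so the sketch below assumes the averaging happens client-side in Python. The function name `mean_centroid` and the toy vectors are hypothetical.

```python
def mean_centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

# Toy stand-ins for the embeddings of three fraud narratives
fraud_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
]
centroid = mean_centroid(fraud_vectors)
print(centroid)
```

Averaging gives each narrative equal weight regardless of its length, whereas the concatenation approach lets longer narratives contribute more text to the single embedding.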

Step 4 — Score live claims

-- A new claim about a multi-car merge collision: not fraud-shaped
SELECT COSINE_SIMILARITY(
    EMBED('Side collision with delivery truck merging lanes both drivers exchanged information'),
    EMBED((SELECT GROUP_CONCAT(narrative, ' ') FROM claim_narr WHERE is_known_fraud=1))
) AS fraud_sim;
-- → 0.304    (CLEAR — pay)

-- A new claim with classic SIU red flags: highly fraud-shaped
SELECT COSINE_SIMILARITY(
    EMBED('Vehicle destroyed by fire on lonely road no witnesses request immediate full payout'),
    EMBED((SELECT GROUP_CONCAT(narrative, ' ') FROM claim_narr WHERE is_known_fraud=1))
) AS fraud_sim;
-- → 0.663    (SIU REVIEW)
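
The CLEAR and SIU REVIEW labels in the comments above imply a cutoff. A minimal Python sketch of that routing step follows, assuming the 0.5 threshold from the Productionizing section and the two scores reported in this step. The function name `route_claim` is hypothetical.

```python
SIU_THRESHOLD = 0.5  # cutoff taken from the Productionizing note; tune on your own data

def route_claim(fraud_sim, threshold=SIU_THRESHOLD):
    """Return the queue label for a claim given its similarity to the fraud centroid."""
    return "SIU REVIEW" if fraud_sim > threshold else "CLEAR"

# The two scores reported in Step 4
scores = {"merge collision": 0.304, "fire on lonely road": 0.663}
for label, sim in scores.items():
    print(label, "->", route_claim(sim))
# prints:
# merge collision -> CLEAR
# fire on lonely road -> SIU REVIEW
```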

Productionizing

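Route claims with fraud_sim > 0.5 to the SIU queue at intake. The 0.5 cutoff sits between the two Step 4 scores (0.304 and 0.663); treat it as a starting point to tune on your own labelled claims. Refresh the centroid weekly from the rolling 24-month confirmed-fraud cohort. Pair with Recipe 3 (structured AutoML fraud) for a two-signal setup: structured features and narrative similarity each contribute a score, and an ensemble combines them.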
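
One simple way to combine the two signals is a weighted average. The Python sketch below is an assumption, not the method Recipe 3 prescribes: it supposes the structured model emits a fraud score in [0, 1], and both the function name `ensemble_score` and the default weight of 0.5 are hypothetical placeholders to be fitted on labelled data.

```python
def ensemble_score(structured_score, narrative_sim, narrative_weight=0.5):
    """Weighted average of a structured-model score and the narrative similarity.

    narrative_weight is a hypothetical tuning knob in [0, 1].
    """
    if not 0.0 <= narrative_weight <= 1.0:
        raise ValueError("narrative_weight must be between 0 and 1")
    return (1.0 - narrative_weight) * structured_score + narrative_weight * narrative_sim

# Example: a low structured score (assumed value) paired with the 0.663 narrative score from Step 4
combined = ensemble_score(structured_score=0.2, narrative_sim=0.663)
print(round(combined, 4))
```

A weighted average keeps each signal interpretable on its own. A claim that looks clean on structured features can still be flagged if the weight on the narrative score is high enough.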

