Threat-intel correlation across feeds

Objective

Each vendor gives the same threat actor a different name: CrowdStrike's "Scattered Spider" is Mandiant's "UNC3944" and Microsoft's "Octo Tempest." Indicators of compromise (IOCs) drift the same way: the same activity shows up as a hash on VirusTotal, a path on AlienVault, and a domain on Shodan. This recipe clusters all of them in one graph by extracting actor and IOC mentions from advisory text, then linking semantically near records with SIMILAR_TO. The wow moment: one Cypher query maps every alias of a single threat group across three feeds, plus their shared infrastructure.

Step 1: Extract from a vendor advisory

curl -X POST https://localhost:8443/v2/graph/extract \
  -H "Authorization: Bearer $AIDB_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "On 2026-04-22 CrowdStrike attributed the Acme Bank breach to Scattered Spider, also tracked as Octo Tempest. The actor used the domain login-acme-bank[.]net for credential phishing, dropped a payload with SHA256 a3f1d4e8c920b7e8c6f1d4e8a3f1d4e8c920b7e8c6f1d4e8a3f1d4e8c920b7e8, and exfiltrated to 185.220.101.45. Mandiant separately reported UNC3944 using identical TTPs against three other financial institutions in the same week.",
    "default_node_label": "ThreatEntity",
    "node_provenance": {"feed": "crowdstrike-advisory-2026-04-22"},
    "edge_provenance": {"feed": "crowdstrike-advisory-2026-04-22", "tlp": "amber"},
    "min_confidence": 0.6
  }'
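
Before trusting the extractor's output, it can help to know which indicators the advisory text actually contains. The sketch below is a local sanity check in Python, not the extractor's method: it pulls defanged domains, SHA-256 hashes, and IPv4 addresses out of an excerpt of the advisory with regular expressions and "refangs" the domain (login-acme-bank[.]net becomes login-acme-bank.net). The function names and the patterns are illustrative assumptions, and the patterns are deliberately loose.

```python
import re

# Excerpt of the advisory text sent in the request above.
ADVISORY = (
    "The actor used the domain login-acme-bank[.]net for credential phishing, "
    "dropped a payload with SHA256 "
    "a3f1d4e8c920b7e8c6f1d4e8a3f1d4e8c920b7e8c6f1d4e8a3f1d4e8c920b7e8, "
    "and exfiltrated to 185.220.101.45."
)

# Hypothetical patterns for illustration; a real extractor will be stricter.
SHA256_RE = re.compile(r"\b[0-9a-fA-F]{64}\b")
# Loose IPv4 match: does not check that each octet is <= 255.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Only defanged domains (containing "[.]") are matched, so ordinary prose is ignored.
DEFANGED_DOMAIN_RE = re.compile(r"\b[a-z0-9-]+(?:\[\.\][a-z0-9-]+)+", re.IGNORECASE)


def refang(value):
    """Undo the '[.]' defanging that advisories use to keep links unclickable."""
    return value.replace("[.]", ".")


def extract_iocs(text):
    """Return (kind, value) pairs: domains first, then hashes, then IPs."""
    found = [("domain", refang(m)) for m in DEFANGED_DOMAIN_RE.findall(text)]
    found += [("sha256", m.lower()) for m in SHA256_RE.findall(text)]
    found += [("ip", m) for m in IPV4_RE.findall(text)]
    return found


if __name__ == "__main__":
    for kind, value in extract_iocs(ADVISORY):
        print(kind, value)
```

Comparing this list with the nodes the extract call returns is a quick way to spot indicators that fell below min_confidence.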

Step 2: Pre-seed for offline reproducibility

For demos that must not depend on a live LLM, seed the same content directly with Cypher MERGE statements. Each node carries an embedding property so that SIMILAR_TO works:

// Actor aliases from three feeds — semantically similar profiles.
MERGE (a1:Actor {name: "Scattered Spider", feed: "CrowdStrike",
       profile: "Financially motivated, credential phishing, social engineering, ransomware",
       embedding: [0.66, 0.21, -0.04, 0.31, 0.14]})
MERGE (a2:Actor {name: "Octo Tempest",     feed: "Microsoft",
       profile: "Financial gain, social engineering of help desks, vishing, ransomware",
       embedding: [0.65, 0.22, -0.05, 0.30, 0.15]})
MERGE (a3:Actor {name: "UNC3944",          feed: "Mandiant",
       profile: "Financially motivated cluster, social engineering and SIM swap into IT helpdesks",
       embedding: [0.64, 0.23, -0.03, 0.32, 0.13]})
MERGE (a4:Actor {name: "APT41",            feed: "Mandiant",
       profile: "China-nexus, espionage and financially motivated supply-chain attacks",
       embedding: [-0.21, 0.58, 0.10, -0.04, 0.41]})

// IOCs observed
MERGE (i1:IOC {kind: "domain", value: "login-acme-bank.net",
               first_seen: "2026-04-22", embedding: [0.10, 0.31, 0.07, 0.25, -0.04]})
MERGE (i2:IOC {kind: "ip",     value: "185.220.101.45",
               first_seen: "2026-04-23", embedding: [0.12, 0.30, 0.08, 0.26, -0.03]})
MERGE (i3:IOC {kind: "sha256", value: "a3f1d4e8c920b7e8c6f1d4e8a3f1d4e8c920b7e8c6f1d4e8a3f1d4e8c920b7e8",
               first_seen: "2026-04-22", embedding: [0.11, 0.32, 0.06, 0.24, -0.05]})
MERGE (i4:IOC {kind: "domain", value: "secure-acmebank-login.com",
               first_seen: "2026-04-25", embedding: [0.10, 0.30, 0.07, 0.26, -0.04]})

// Victim
MERGE (v1:Victim {name: "Acme Bank", sector: "finance"})
MERGE (v2:Victim {name: "Helia Federal Credit Union", sector: "finance"})

// Edges
MERGE (a1)-[:USES]->(i1)
MERGE (a1)-[:USES]->(i2)
MERGE (a1)-[:USES]->(i3)
MERGE (a3)-[:USES]->(i4)
MERGE (a1)-[:TARGETED]->(v1)
MERGE (a3)-[:TARGETED]->(v2);

Step 3: Cluster aliases of the same actor across feeds

// Find actor names from different feeds that are likely the same group.
// id(a) < id(b) keeps each pair once instead of once per direction.
MATCH (a:Actor)-[:SIMILAR_TO > 0.92]->(b:Actor)
WHERE id(a) < id(b) AND a.feed <> b.feed
RETURN a.name AS alias_a,
       a.feed AS feed_a,
       b.name AS alias_b,
       b.feed AS feed_b;
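
To see why a 0.92 threshold separates the seed actors, the pairwise scores can be checked by hand. The Python sketch below assumes SIMILAR_TO compares the embedding properties by cosine similarity; this page does not state the metric, so treat that as an assumption. It reuses the four actor embeddings from Step 2 and applies the same two filters as the query: different feed, score above the threshold.

```python
from itertools import combinations
from math import sqrt

# Actor name -> (feed, embedding), copied from the Step 2 seed data.
ACTORS = {
    "Scattered Spider": ("CrowdStrike", [0.66, 0.21, -0.04, 0.31, 0.14]),
    "Octo Tempest": ("Microsoft", [0.65, 0.22, -0.05, 0.30, 0.15]),
    "UNC3944": ("Mandiant", [0.64, 0.23, -0.03, 0.32, 0.13]),
    "APT41": ("Mandiant", [-0.21, 0.58, 0.10, -0.04, 0.41]),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def alias_pairs(actors, threshold=0.92):
    """Cross-feed actor pairs whose similarity exceeds the threshold, best first."""
    pairs = []
    for (name_a, (feed_a, emb_a)), (name_b, (feed_b, emb_b)) in combinations(actors.items(), 2):
        score = cosine(emb_a, emb_b)
        if feed_a != feed_b and score > threshold:
            pairs.append((name_a, name_b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    for name_a, name_b, score in alias_pairs(ACTORS):
        print(f"{name_a} ~ {name_b}: {score:.4f}")
```

Under that assumption the three aliases pair up with each other and APT41 pairs with nobody, which matches what the query is expected to return on the seed data.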

Step 4: Walk to shared infrastructure

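// "Show every IOC used by any alias of Scattered Spider, regardless of which feed reported the alias."
// Only the aliases' IOCs are returned; the seed's own IOCs appear only if the engine treats a node as similar to itself.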
MATCH (seed:Actor {name: "Scattered Spider"})-[:SIMILAR_TO > 0.9]->(alias:Actor)
MATCH (alias)-[:USES]->(ioc:IOC)
RETURN DISTINCT alias.name AS alias,
                alias.feed AS source,
                ioc.kind   AS indicator_kind,
                ioc.value  AS indicator;
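
The walk itself is a two-hop lookup: seed to aliases, then aliases to IOCs. The Python sketch below replays it over the Step 2 seed data with plain dictionaries. The ALIASES mapping is a hypothetical stand-in for the SIMILAR_TO results, and the include_seed flag is an illustrative assumption: whether the engine treats a node as similar to itself is not stated on this page, and that decides whether the seed's own indicators show up.

```python
# Hypothetical stand-in for the SIMILAR_TO > 0.9 results on the seed data.
ALIASES = {
    "Scattered Spider": ["Octo Tempest", "UNC3944"],
    "Octo Tempest": ["Scattered Spider", "UNC3944"],
    "UNC3944": ["Scattered Spider", "Octo Tempest"],
}

# :USES edges from the Step 2 seed data, as actor -> [(kind, value), ...].
USES = {
    "Scattered Spider": [
        ("domain", "login-acme-bank.net"),
        ("ip", "185.220.101.45"),
        ("sha256", "a3f1d4e8c920b7e8c6f1d4e8a3f1d4e8c920b7e8c6f1d4e8a3f1d4e8c920b7e8"),
    ],
    "UNC3944": [("domain", "secure-acmebank-login.com")],
}


def iocs_for_group(seed, include_seed=True):
    """Return (actor, kind, value) rows for the seed's aliases, optionally the seed too."""
    members = list(ALIASES.get(seed, []))
    if include_seed:
        members.insert(0, seed)
    rows = []
    for actor in members:
        for kind, value in USES.get(actor, []):
            rows.append((actor, kind, value))
    return rows


if __name__ == "__main__":
    for row in iocs_for_group("Scattered Spider"):
        print(row)
```

With include_seed=False the seed data yields a single row (the UNC3944 domain); with include_seed=True it yields four, because Scattered Spider's own three indicators are added.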

What's happening

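The /v2/graph/extract call turns advisory prose into actor-to-IOC and actor-to-victim edges in one request. Re-running it on a duplicate advisory is idempotent: the extractor reports the duplicates as discarded.
Actor profiles from different feeds are written in different prose, but their embeddings cluster tightly. In the seed data the three aliases sit close together and APT41 sits far away, so SIMILAR_TO > 0.92 collapses the aliases without merging APT41. Step 4 uses a looser 0.9; on real feeds, check the threshold against known alias pairs before relying on it.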
Once aliases are linked, normal Cypher patterns walk through :USES and :TARGETED edges to return cross-feed infrastructure overlap and victim history.
This is exactly the workflow a SOC analyst does manually today: read three advisories, mentally match the aliases, look up the IOC overlap. Done as one query, with provenance on every edge.
tlp: amber on the edge provenance lets you filter sharing-restricted intel out of automated pushes.
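
As an illustration of that last point, the sketch below filters edges by their TLP label before an automated push. It is written in Python over plain dictionaries; the edge shape, the shareable function, and the second and third sample edges are hypothetical, and only the first edge's provenance comes from the Step 1 request. The ordering follows TLP 2.0 (CLEAR, GREEN, AMBER, AMBER+STRICT, RED), with "white" accepted as the older name for CLEAR.

```python
# TLP labels ordered from least to most restricted.
TLP_RANK = {"clear": 0, "white": 0, "green": 1, "amber": 2, "amber+strict": 3, "red": 4}


def shareable(edges, max_tlp="green"):
    """Keep edges whose TLP label is at or below max_tlp.

    Edges with a missing or unknown label are treated as red, so they are
    never pushed by accident.
    """
    limit = TLP_RANK[max_tlp]
    kept = []
    for edge in edges:
        label = str(edge.get("provenance", {}).get("tlp", "red")).lower()
        if TLP_RANK.get(label, TLP_RANK["red"]) <= limit:
            kept.append(edge)
    return kept


# Hypothetical edge records; only the first provenance block matches Step 1.
EDGES = [
    {"from": "Scattered Spider", "type": "USES", "to": "login-acme-bank.net",
     "provenance": {"feed": "crowdstrike-advisory-2026-04-22", "tlp": "amber"}},
    {"from": "UNC3944", "type": "USES", "to": "secure-acmebank-login.com",
     "provenance": {"feed": "example-open-feed", "tlp": "green"}},
    {"from": "Scattered Spider", "type": "TARGETED", "to": "Acme Bank",
     "provenance": {"feed": "example-unlabelled-feed"}},
]

if __name__ == "__main__":
    for edge in shareable(EDGES):
        print(edge["from"], edge["type"], edge["to"])
```

Defaulting unlabelled edges to red is a conservative design choice: a missing label blocks sharing rather than allowing it.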

Try this next

// "What victims has any alias of this actor hit?"
MATCH (seed:Actor {name: "Octo Tempest"})-[:SIMILAR_TO > 0.9]->(alias:Actor)-[:TARGETED]->(v:Victim)
RETURN DISTINCT v.name AS victim, v.sector AS sector;

// IOCs that Scattered Spider shares with any other actor, regardless of feed.
MATCH (a1:Actor {name: "Scattered Spider"})-[:USES]->(i:IOC)<-[:USES]-(a2:Actor)
WHERE a2.name <> a1.name
RETURN i.kind, i.value, a1.name AS actor_a, a2.name AS actor_b;

// Outlier IOCs that no actor has been linked to yet.
MATCH (i:IOC)
OPTIONAL MATCH (a:Actor)-[:USES]->(i)
WITH i, count(a) AS attributions
WHERE attributions = 0
RETURN i.kind, i.value, i.first_seen
ORDER BY i.first_seen DESC;
