Threat-intel correlation across feeds
Objective
Threat actors get different names from each vendor: CrowdStrike's "Scattered Spider" is
Mandiant's "UNC3944" and Microsoft's "Octo Tempest." Indicators of compromise (IOCs) drift the
same way: what one feed records as a hash on VirusTotal may appear as a file path on AlienVault
and as a domain on Shodan. Cluster all of them in one graph by extracting actor/IOC mentions
from advisory text, then linking semantically near records with SIMILAR_TO. The payoff: one
Cypher query maps every alias of a single threat group across three feeds, plus their shared
infrastructure.
Step 1: Extract from a vendor advisory
curl -X POST https://localhost:8443/v2/graph/extract \
-H "Authorization: Bearer $AIDB_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"text": "On 2026-04-22 CrowdStrike attributed the Acme Bank breach to Scattered Spider, also tracked as Octo Tempest. The actor used the domain login-acme-bank[.]net for credential phishing, dropped a payload with SHA256 a3f1d4e8c920b7e8c6f1d4e8a3f1d4e8c920b7e8c6f1d4e8a3f1d4e8c920b7e8, and exfiltrated to 185.220.101.45. Mandiant separately reported UNC3944 using identical TTPs against three other financial institutions in the same week.",
"default_node_label": "ThreatEntity",
"node_provenance": {"feed": "crowdstrike-advisory-2026-04-22"},
"edge_provenance": {"feed": "crowdstrike-advisory-2026-04-22", "tlp": "amber"},
"min_confidence": 0.6
}'
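The same request can be assembled from a script. The sketch below is a minimal illustration in Python using only the standard library; the endpoint and field names are copied from the curl call above, while build_extract_request and the placeholder token are hypothetical names introduced here, not part of any client library.

```python
import json
import urllib.request

# Endpoint copied from the curl call above.
EXTRACT_URL = "https://localhost:8443/v2/graph/extract"


def build_extract_request(text, feed, token, tlp="amber", min_confidence=0.6):
    """Return an unsent POST request mirroring the curl call (hypothetical helper)."""
    payload = {
        "text": text,
        "default_node_label": "ThreatEntity",
        "node_provenance": {"feed": feed},
        "edge_provenance": {"feed": feed, "tlp": tlp},
        "min_confidence": min_confidence,
    }
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_extract_request(
    "CrowdStrike attributed the Acme Bank breach to Scattered Spider.",
    feed="crowdstrike-advisory-2026-04-22",
    token="example-token",  # placeholder; use the value of $AIDB_TOKEN in practice
)
body = json.loads(req.data)
```

Passing req to urllib.request.urlopen would send it. The response schema is not shown in this guide, so the sketch stops before parsing a reply.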
Step 2: Pre-seed for offline reproducibility
For demos that must not depend on a live LLM, seed the same content with Cypher MERGE statements (embedding properties are included so that SIMILAR_TO works):
// Actor aliases from three feeds — semantically similar profiles.
MERGE (a1:Actor {name: "Scattered Spider", feed: "CrowdStrike",
profile: "Financially motivated, credential phishing, social engineering, ransomware",
embedding: [0.66, 0.21, -0.04, 0.31, 0.14]})
MERGE (a2:Actor {name: "Octo Tempest", feed: "Microsoft",
profile: "Financial gain, social engineering of help desks, vishing, ransomware",
embedding: [0.65, 0.22, -0.05, 0.30, 0.15]})
MERGE (a3:Actor {name: "UNC3944", feed: "Mandiant",
profile: "Financially motivated cluster, social engineering and SIM swap into IT helpdesks",
embedding: [0.64, 0.23, -0.03, 0.32, 0.13]})
MERGE (a4:Actor {name: "APT41", feed: "Mandiant",
profile: "China-nexus, espionage and financially motivated supply-chain attacks",
embedding: [-0.21, 0.58, 0.10, -0.04, 0.41]})
// IOCs observed
MERGE (i1:IOC {kind: "domain", value: "login-acme-bank.net",
first_seen: "2026-04-22", embedding: [0.10, 0.31, 0.07, 0.25, -0.04]})
MERGE (i2:IOC {kind: "ip", value: "185.220.101.45",
first_seen: "2026-04-23", embedding: [0.12, 0.30, 0.08, 0.26, -0.03]})
MERGE (i3:IOC {kind: "sha256", value: "a3f1d4e8c920b7e8c6f1d4e8a3f1d4e8c920b7e8c6f1d4e8a3f1d4e8c920b7e8",
first_seen: "2026-04-22", embedding: [0.11, 0.32, 0.06, 0.24, -0.05]})
MERGE (i4:IOC {kind: "domain", value: "secure-acmebank-login.com",
first_seen: "2026-04-25", embedding: [0.10, 0.30, 0.07, 0.26, -0.04]})
// Victims
MERGE (v1:Victim {name: "Acme Bank", sector: "finance"})
MERGE (v2:Victim {name: "Helia Federal Credit Union", sector: "finance"})
// Edges
MERGE (a1)-[:USES]->(i1)
MERGE (a1)-[:USES]->(i2)
MERGE (a1)-[:USES]->(i3)
MERGE (a3)-[:USES]->(i4)
MERGE (a1)-[:TARGETED]->(v1)
MERGE (a3)-[:TARGETED]->(v2);
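Note that the advisory text in Step 1 writes the phishing domain defanged (login-acme-bank[.]net) while the seed data stores it in plain form. Whether the extractor refangs indicators itself is not stated here, so the Python sketch below shows one way to normalize a raw indicator and derive the kind value used on the IOC nodes. The helper names refang and classify_ioc are hypothetical, and the domain pattern is a deliberately simple approximation.

```python
import ipaddress
import re

SHA256_RE = re.compile(r"^[0-9a-fA-F]{64}$")
# Simplified hostname pattern: dot-separated labels ending in an alphabetic TLD.
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)


def refang(value):
    """Undo the common '[.]' defanging used in advisories."""
    return value.strip().replace("[.]", ".")


def classify_ioc(raw):
    """Return (kind, normalized_value) for a raw indicator string."""
    value = refang(raw)
    if SHA256_RE.match(value):
        return "sha256", value.lower()
    try:
        ipaddress.ip_address(value)
        return "ip", value
    except ValueError:
        pass
    if DOMAIN_RE.match(value):
        return "domain", value.lower()
    return "unknown", value


domain = classify_ioc("login-acme-bank[.]net")
address = classify_ioc("185.220.101.45")
digest = classify_ioc("a3f1d4e8c920b7e8c6f1d4e8a3f1d4e8c920b7e8c6f1d4e8a3f1d4e8c920b7e8")
```

Normalizing before a MERGE keeps the defanged and plain spellings of one indicator from becoming two separate nodes.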
Step 3: Cluster aliases of the same actor across feeds
// Find actor names that are likely the same group, ranked by similarity.
MATCH (a:Actor)-[:SIMILAR_TO > 0.92]->(b:Actor)
WHERE id(a) < id(b) AND a.feed <> b.feed
RETURN a.name AS alias_a,
a.feed AS feed_a,
b.name AS alias_b,
b.feed AS feed_b;
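To see why a 0.92 threshold separates the three aliases from APT41 in the seed data, the scores can be reproduced offline. The Python sketch below assumes SIMILAR_TO compares embeddings by cosine similarity, which this guide does not state; the vectors are copied from the Step 2 seed data and alias_pairs is a hypothetical helper that mimics the query's cross-feed filter.

```python
import math
from itertools import combinations

# name -> (feed, embedding), copied from the Step 2 seed data.
ACTORS = {
    "Scattered Spider": ("CrowdStrike", [0.66, 0.21, -0.04, 0.31, 0.14]),
    "Octo Tempest": ("Microsoft", [0.65, 0.22, -0.05, 0.30, 0.15]),
    "UNC3944": ("Mandiant", [0.64, 0.23, -0.03, 0.32, 0.13]),
    "APT41": ("Mandiant", [-0.21, 0.58, 0.10, -0.04, 0.41]),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def alias_pairs(actors, threshold=0.92):
    """Cross-feed actor pairs whose similarity exceeds the threshold, best first."""
    pairs = []
    for (name_a, (feed_a, emb_a)), (name_b, (feed_b, emb_b)) in combinations(
        actors.items(), 2
    ):
        score = cosine(emb_a, emb_b)
        if feed_a != feed_b and score > threshold:
            pairs.append((name_a, name_b, round(score, 4)))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


pairs = alias_pairs(ACTORS)
for name_a, name_b, score in pairs:
    print(f"{name_a} ~ {name_b}: {score}")
```

Under that assumption the three alias pairs all score above 0.99, while every pairing with APT41 scores below 0.1, so the threshold has a wide margin on this toy data. Real profile embeddings will sit much closer together and the cutoff should be tuned against known aliases.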
Step 4: Walk to shared infrastructure
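// "Show every IOC used by any alias of Scattered Spider, regardless of which feed reported the alias."
// Unless SIMILAR_TO also matches the seed node itself, IOCs attached directly to the seed are not returned.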
MATCH (seed:Actor {name: "Scattered Spider"})-[:SIMILAR_TO > 0.9]->(alias:Actor)
MATCH (alias)-[:USES]->(ioc:IOC)
RETURN DISTINCT alias.name AS alias,
alias.feed AS source,
ioc.kind AS indicator_kind,
ioc.value AS indicator;
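The traversal can be traced by hand against the seed data. The Python sketch below mirrors the graph in plain dictionaries; it assumes that SIMILAR_TO > 0.9 links Scattered Spider to both other aliases and does not match a node to itself, neither of which is confirmed by this guide. shared_infrastructure and its include_seed flag are hypothetical.

```python
# Hypothetical in-memory mirror of the seed graph from Step 2.
SIMILAR_TO = {
    "Scattered Spider": {"Octo Tempest", "UNC3944"},
    "Octo Tempest": {"Scattered Spider", "UNC3944"},
    "UNC3944": {"Scattered Spider", "Octo Tempest"},
}
USES = {
    "Scattered Spider": [
        "login-acme-bank.net",
        "185.220.101.45",
        "a3f1d4e8c920b7e8c6f1d4e8a3f1d4e8c920b7e8c6f1d4e8a3f1d4e8c920b7e8",
    ],
    "UNC3944": ["secure-acmebank-login.com"],
}


def shared_infrastructure(seed, include_seed=False):
    """Return sorted (actor, ioc) pairs reachable through the seed's aliases."""
    actors = set(SIMILAR_TO.get(seed, ()))
    if include_seed:
        actors.add(seed)
    return sorted((actor, ioc) for actor in actors for ioc in USES.get(actor, []))


aliases_only = shared_infrastructure("Scattered Spider")
with_seed = shared_infrastructure("Scattered Spider", include_seed=True)
```

With these assumptions the alias-only walk yields just the UNC3944 domain, because the other three indicators hang off the seed node. Including the seed returns all four, which is usually what an analyst wants from an infrastructure view.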
What's happening
- The /v2/graph/extract call turns advisory prose into actor-to-IOC and actor-to-victim edges in one request. Re-running on duplicate advisories is idempotent (the extractor reports duplicates as discarded).
- Actor profiles from different feeds are written in different prose, but their embeddings cluster tightly. On the seed data, SIMILAR_TO > 0.92 links the three aliases and leaves APT41 out; on real feeds the threshold should be validated against known aliases.
- Once aliases are linked, normal Cypher patterns walk through :USES and :TARGETED edges to return cross-feed infrastructure overlap and victim history.
- This is the workflow a SOC analyst does manually today: read three advisories, mentally match the aliases, look up the IOC overlap. Here it is done as one query, with provenance on every edge.
- tlp: amber on the edge provenance lets you filter sharing-restricted intel out of automated pushes.
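The TLP filter mentioned above could look roughly like this on the client side. The Python sketch assumes edges arrive as dictionaries with a provenance map shaped like the edge_provenance field in Step 1; that result shape and the shareable helper are hypothetical. The ordering follows the TLP 2.0 labels (clear, green, amber, amber+strict, red).

```python
# TLP 2.0 labels ordered from least to most restricted.
TLP_RANK = {"clear": 0, "green": 1, "amber": 2, "amber+strict": 3, "red": 4}


def shareable(edges, max_tlp="green"):
    """Keep edges whose TLP label is at or below max_tlp.

    Edges with a missing or unrecognized label are treated as red,
    so they are never pushed by accident.
    """
    limit = TLP_RANK[max_tlp]
    kept = []
    for edge in edges:
        label = edge.get("provenance", {}).get("tlp", "red")
        if TLP_RANK.get(label, TLP_RANK["red"]) <= limit:
            kept.append(edge)
    return kept


edges = [
    {"type": "USES", "target": "login-acme-bank.net",
     "provenance": {"feed": "crowdstrike-advisory-2026-04-22", "tlp": "amber"}},
    {"type": "USES", "target": "secure-acmebank-login.com",
     "provenance": {"feed": "example-open-feed", "tlp": "green"}},
    {"type": "TARGETED", "target": "Acme Bank", "provenance": {}},
]
public = shareable(edges)
internal = shareable(edges, max_tlp="amber")
```

Defaulting unlabeled edges to the most restrictive level is a conservative design choice: a missing label blocks sharing instead of permitting it.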
Try this next
// "What victims has any alias of this actor hit?"
MATCH (seed:Actor {name: "Octo Tempest"})-[:SIMILAR_TO > 0.9]->(alias:Actor)-[:TARGETED]->(v:Victim)
RETURN DISTINCT v.name AS victim, v.sector AS sector;
// IOC overlap between Scattered Spider and any other actor, regardless of feed.
MATCH (a1:Actor {name: "Scattered Spider"})-[:USES]->(i:IOC)<-[:USES]-(a2:Actor)
WHERE a2.name <> a1.name
RETURN i.kind, i.value, a1.name AS actor_a, a2.name AS actor_b;
// Unattributed IOCs: indicators that no actor has a USES edge to yet.
MATCH (i:IOC)
OPTIONAL MATCH (a:Actor)-[:USES]->(i)
WITH i, count(a) AS attributions
WHERE attributions = 0
RETURN i.kind, i.value, i.first_seen
ORDER BY i.first_seen DESC;