Image Similarity Matching

Find similar images using vector cosine similarity for visual content discovery and deduplication

All recipes· image-management· 15 minutesintermediate

Image Similarity Matching

Objective

Use vector embeddings and cosine similarity to find visually similar images. This enables content discovery, duplicate detection, and recommendation systems.

Step 1: Create Image Catalog Table

Create a table for storing images with embeddings.

CREATE TABLE image_catalog (
    id INTEGER PRIMARY KEY,
    image_id VARCHAR(50) UNIQUE NOT NULL,
    filename VARCHAR(255) NOT NULL,
    image IMAGE(JPEG),
    description TEXT,
    category VARCHAR(50),
    embedding VECTOR(384),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Step 2: Insert Sample Images

Add sample images with descriptions.

INSERT INTO image_catalog (id, image_id, filename, description, category) VALUES
    (1, 'IMG001', 'beach_sunset_1.jpg', 'Beautiful sunset over ocean beach with orange and pink sky', 'nature'),
    (2, 'IMG002', 'beach_sunset_2.jpg', 'Tropical beach at sunset with palm trees silhouette', 'nature'),
    (3, 'IMG003', 'mountain_snow.jpg', 'Snow covered mountain peaks under blue sky', 'nature'),
    (4, 'IMG004', 'city_night_1.jpg', 'City skyline at night with lights reflecting on water', 'urban'),
    (5, 'IMG005', 'city_night_2.jpg', 'Downtown buildings illuminated at night', 'urban'),
    (6, 'IMG006', 'forest_path.jpg', 'Winding path through autumn forest with fallen leaves', 'nature'),
    (7, 'IMG007', 'dog_portrait.jpg', 'Golden retriever dog portrait in natural light', 'animals'),
    (8, 'IMG008', 'cat_sleeping.jpg', 'Orange tabby cat sleeping on soft blanket', 'animals'),
    (9, 'IMG009', 'food_pasta.jpg', 'Italian pasta dish with tomato sauce and basil', 'food'),
    (10, 'IMG010', 'food_pizza.jpg', 'Wood-fired pizza with fresh mozzarella', 'food');

Step 3: Generate Image Embeddings

Create embeddings from image descriptions.

UPDATE image_catalog
SET embedding = EMBED(description)
WHERE embedding IS NULL;

Step 4: Find Similar Images by Query

Search for images similar to a text query.

SELECT
    image_id,
    filename,
    description,
    category,
    COSINE_SIMILARITY(embedding, EMBED('sunset at the beach with waves')) as similarity
FROM image_catalog
WHERE embedding IS NOT NULL
ORDER BY similarity DESC
LIMIT 5;

Step 5: Find Similar Images to an Existing Image

Find images similar to a specific image in the database.

SELECT
    b.image_id,
    b.filename,
    b.description,
    COSINE_SIMILARITY(a.embedding, b.embedding) as similarity
FROM image_catalog a
CROSS JOIN image_catalog b
WHERE a.image_id = 'IMG001'
  AND b.image_id != 'IMG001'
  AND a.embedding IS NOT NULL
  AND b.embedding IS NOT NULL
ORDER BY similarity DESC
LIMIT 5;

Step 6: Create Similarity Pairs Table

Store pre-computed similarity scores.

CREATE TABLE image_similarity_pairs (
    id INTEGER PRIMARY KEY,
    image_id_1 VARCHAR(50) NOT NULL,
    image_id_2 VARCHAR(50) NOT NULL,
    similarity_score DECIMAL(6, 5),
    computed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Compute similarities for all pairs
INSERT INTO image_similarity_pairs (id, image_id_1, image_id_2, similarity_score)
SELECT
    ROW_NUMBER() OVER (),
    a.image_id,
    b.image_id,
    COSINE_SIMILARITY(a.embedding, b.embedding)
FROM image_catalog a
CROSS JOIN image_catalog b
WHERE a.id < b.id
  AND a.embedding IS NOT NULL
  AND b.embedding IS NOT NULL;

Step 7: Find Potential Duplicates

Identify images that may be duplicates.

SELECT
    isp.image_id_1,
    isp.image_id_2,
    isp.similarity_score,
    ic1.filename as file_1,
    ic2.filename as file_2
FROM image_similarity_pairs isp
INNER JOIN image_catalog ic1 ON isp.image_id_1 = ic1.image_id
INNER JOIN image_catalog ic2 ON isp.image_id_2 = ic2.image_id
WHERE isp.similarity_score > 0.85
ORDER BY isp.similarity_score DESC;

Step 8: Category-Based Similarity

Find similar images within the same category.

SELECT
    a.image_id as source_image,
    b.image_id as similar_image,
    a.category,
    COSINE_SIMILARITY(a.embedding, b.embedding) as similarity
FROM image_catalog a
INNER JOIN image_catalog b ON a.category = b.category AND a.id < b.id
WHERE a.embedding IS NOT NULL
  AND b.embedding IS NOT NULL
ORDER BY a.category, similarity DESC;

Step 9: Similarity Threshold Filtering

Find images above a similarity threshold.

SELECT
    image_id,
    filename,
    description,
    COSINE_SIMILARITY(embedding, EMBED('animals pets dogs cats')) as similarity
FROM image_catalog
WHERE embedding IS NOT NULL
  AND COSINE_SIMILARITY(embedding, EMBED('animals pets dogs cats')) > 0.5
ORDER BY similarity DESC;

Step 10: Image Clusters by Similarity

Identify natural groupings based on similarity.

SELECT
    ic.category,
    COUNT(*) as image_count,
    AVG(isp.similarity_score) as avg_internal_similarity
FROM image_catalog ic
INNER JOIN image_similarity_pairs isp ON
    (ic.image_id = isp.image_id_1 OR ic.image_id = isp.image_id_2)
INNER JOIN image_catalog ic2 ON
    (isp.image_id_1 = ic2.image_id OR isp.image_id_2 = ic2.image_id)
    AND ic.category = ic2.category
    AND ic.id != ic2.id
GROUP BY ic.category
ORDER BY avg_internal_similarity DESC;

Step 11: Similarity Distribution

Analyze the distribution of similarity scores.

SELECT
    CASE
        WHEN similarity_score >= 0.9 THEN '0.9-1.0 (Very Similar)'
        WHEN similarity_score >= 0.7 THEN '0.7-0.9 (Similar)'
        WHEN similarity_score >= 0.5 THEN '0.5-0.7 (Related)'
        WHEN similarity_score >= 0.3 THEN '0.3-0.5 (Somewhat Related)'
        ELSE '0.0-0.3 (Different)'
    END as similarity_range,
    COUNT(*) as pair_count
FROM image_similarity_pairs
GROUP BY
    CASE
        WHEN similarity_score >= 0.9 THEN '0.9-1.0 (Very Similar)'
        WHEN similarity_score >= 0.7 THEN '0.7-0.9 (Similar)'
        WHEN similarity_score >= 0.5 THEN '0.5-0.7 (Related)'
        WHEN similarity_score >= 0.3 THEN '0.3-0.5 (Somewhat Related)'
        ELSE '0.0-0.3 (Different)'
    END
ORDER BY similarity_range DESC;

Step 12: Recommend Similar Images

Build a simple recommendation query.

SELECT
    'Recommendations for ' || a.filename as recommendation_for,
    b.filename as recommended_image,
    b.category,
    COSINE_SIMILARITY(a.embedding, b.embedding) as relevance
FROM image_catalog a
CROSS JOIN image_catalog b
WHERE a.image_id = 'IMG001'
  AND b.image_id != a.image_id
  AND a.embedding IS NOT NULL
  AND b.embedding IS NOT NULL
ORDER BY relevance DESC
LIMIT 3;

Cleanup (Optional)

DROP TABLE IF EXISTS image_similarity_pairs;
DROP TABLE IF EXISTS image_catalog;

Expected Outcomes

  • Embeddings generated from descriptions
  • Cosine similarity finds related images
  • Duplicate detection identifies copies
  • Category-aware similarity works
  • Recommendation queries return relevant results

Similarity Score Guide

Score Interpretation Action
> 0.95 Near duplicate Review for removal
0.8-0.95 Very similar Strong recommendation
0.6-0.8 Related content Good recommendation
0.4-0.6 Some relation Weak recommendation
< 0.4 Different content No recommendation

Key Concepts Learned

  • COSINE_SIMILARITY function usage
  • Embedding-based image search
  • Duplicate detection patterns
  • Pre-computed similarity pairs
  • Threshold-based filtering

Tags

sqlintermediateimagessimilarityvectorscosineai

Run this on your own machine

Install SynapCores Community Edition free, paste the SQL or Cypher above into the bundled web UI, and watch it run.

Download Free CE