Podcast Search by Transcript

Search podcast content via transcribed text for discovering episodes by spoken content

All recipes· audio-podcasts· 12 minutesintermediate

Podcast Search by Transcript

Objective

Enable searching podcast episodes by their transcribed content. This allows listeners to find episodes discussing specific topics, even when the topic isn't in the title.

Step 1: Create Podcast Episodes Table

Create a table for podcast episodes with transcripts.

CREATE TABLE searchable_podcasts (
    id INTEGER PRIMARY KEY,
    show_name VARCHAR(255) NOT NULL,
    episode_number INTEGER,
    title VARCHAR(255) NOT NULL,
    audio_file AUDIO(MP3),
    duration_seconds INTEGER,
    publish_date DATE,
    transcript TEXT,
    transcript_summary TEXT,
    topics TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Step 2: Insert Sample Podcast Data

Add sample podcast episodes with transcripts.

INSERT INTO searchable_podcasts (id, show_name, episode_number, title, duration_seconds, publish_date, transcript, topics) VALUES
(1, 'Tech Today', 101, 'The Rise of AI Assistants',
    2400, '2024-01-15',
    'Today we discuss artificial intelligence and how AI assistants are changing the workplace. Machine learning models are becoming more sophisticated. Companies are investing heavily in natural language processing. The future of work will be transformed by these technologies.',
    'AI, machine learning, workplace, NLP'),

(2, 'Tech Today', 102, 'Cloud Computing Trends',
    2700, '2024-01-22',
    'Cloud infrastructure continues to evolve. Amazon Web Services and Microsoft Azure dominate the market. Kubernetes has become the standard for container orchestration. Serverless computing is gaining popularity for its cost efficiency.',
    'cloud, AWS, Azure, Kubernetes, serverless'),

(3, 'Business Hour', 50, 'Startup Funding Strategies',
    1800, '2024-01-10',
    'Raising capital for your startup requires careful planning. Venture capital firms look for scalable business models. Angel investors often provide early stage funding. Crowdfunding platforms have opened new opportunities for entrepreneurs.',
    'startup, funding, venture capital, angel investors'),

(4, 'Business Hour', 51, 'Remote Work Best Practices',
    2100, '2024-01-17',
    'Managing remote teams effectively requires clear communication. Video conferencing tools like Zoom have become essential. Asynchronous communication helps global teams collaborate. Work life balance is crucial for remote employee productivity.',
    'remote work, teams, communication, productivity'),

(5, 'Health Talk', 25, 'Nutrition and Mental Health',
    1500, '2024-01-12',
    'Diet affects mental well being more than many realize. Omega three fatty acids support brain function. Gut health is linked to mood and cognitive performance. Mediterranean diet has shown positive effects on mental health.',
    'nutrition, mental health, diet, brain'),

(6, 'Tech Today', 103, 'Cybersecurity Essentials',
    2200, '2024-01-29',
    'Protecting your digital assets requires multiple layers of security. Two factor authentication should be mandatory. Phishing attacks remain the most common threat. Regular security audits help identify vulnerabilities before attackers do.',
    'security, authentication, phishing, cyber');

Step 3: Search by Topic Keywords

Find episodes discussing specific topics.

SELECT
    show_name,
    title,
    episode_number,
    publish_date,
    topics
FROM searchable_podcasts
WHERE transcript LIKE '%artificial intelligence%'
   OR transcript LIKE '%AI%'
   OR transcript LIKE '%machine learning%'
ORDER BY publish_date DESC;

Step 4: Full-Text Topic Search

Search transcripts for business topics.

SELECT
    show_name,
    title,
    duration_seconds / 60 as duration_minutes,
    topics
FROM searchable_podcasts
WHERE transcript LIKE '%startup%'
   OR transcript LIKE '%funding%'
   OR transcript LIKE '%venture capital%'
ORDER BY publish_date DESC;

Step 5: Cross-Show Search

Find content across all shows.

SELECT
    show_name,
    title,
    episode_number,
    topics,
    'Match found' as status
FROM searchable_podcasts
WHERE transcript LIKE '%communication%'
ORDER BY show_name, episode_number;

Step 6: Create Episode Segments Table

Track specific segments within episodes.

CREATE TABLE episode_segments (
    id INTEGER PRIMARY KEY,
    episode_id INTEGER NOT NULL,
    segment_title VARCHAR(255),
    start_seconds INTEGER,
    end_seconds INTEGER,
    segment_transcript TEXT,
    keywords TEXT,
    FOREIGN KEY (episode_id) REFERENCES searchable_podcasts(id)
);

INSERT INTO episode_segments (id, episode_id, segment_title, start_seconds, end_seconds, segment_transcript, keywords) VALUES
    (1, 1, 'Introduction to AI', 0, 300, 'Today we discuss artificial intelligence and how AI assistants are changing the workplace.', 'AI, introduction'),
    (2, 1, 'Machine Learning Deep Dive', 300, 1200, 'Machine learning models are becoming more sophisticated. Deep learning has enabled breakthrough applications.', 'machine learning, deep learning'),
    (3, 1, 'Future Predictions', 1200, 2400, 'Companies are investing heavily in natural language processing. The future of work will be transformed.', 'NLP, future, predictions'),
    (4, 2, 'Cloud Market Overview', 0, 600, 'Cloud infrastructure continues to evolve. Amazon Web Services and Microsoft Azure dominate.', 'cloud, AWS, Azure'),
    (5, 2, 'Container Technology', 600, 1500, 'Kubernetes has become the standard for container orchestration.', 'Kubernetes, containers');

Step 7: Search Within Segments

Find specific segments about topics.

SELECT
    sp.show_name,
    sp.title as episode_title,
    es.segment_title,
    es.start_seconds / 60 as start_minute,
    es.keywords
FROM episode_segments es
INNER JOIN searchable_podcasts sp ON es.episode_id = sp.id
WHERE es.segment_transcript LIKE '%Kubernetes%'
   OR es.segment_transcript LIKE '%container%';

Step 8: Topic Frequency Analysis

Analyze which topics are discussed most.

SELECT
    show_name,
    COUNT(*) as episode_count,
    SUM(CASE WHEN transcript LIKE '%AI%' OR transcript LIKE '%artificial intelligence%' THEN 1 ELSE 0 END) as ai_episodes,
    SUM(CASE WHEN transcript LIKE '%cloud%' THEN 1 ELSE 0 END) as cloud_episodes,
    SUM(CASE WHEN transcript LIKE '%security%' THEN 1 ELSE 0 END) as security_episodes
FROM searchable_podcasts
GROUP BY show_name;

Step 9: Related Episodes

Find episodes related to a specific episode.

SELECT
    b.title as related_episode,
    b.show_name,
    b.topics,
    b.publish_date
FROM searchable_podcasts a
CROSS JOIN searchable_podcasts b
WHERE a.id = 1
  AND b.id != a.id
  AND (
    b.topics LIKE '%AI%'
    OR b.topics LIKE '%machine learning%'
    OR b.transcript LIKE '%artificial intelligence%'
  )
ORDER BY b.publish_date DESC;

Step 10: Search Results with Context

Show matching transcript excerpts.

SELECT
    show_name,
    title,
    episode_number,
    transcript as excerpt,
    publish_date
FROM searchable_podcasts
WHERE transcript LIKE '%security%'
ORDER BY publish_date DESC;

Step 11: Episode Discovery Stats

Track which episodes get found via search.

CREATE TABLE search_analytics (
    id INTEGER PRIMARY KEY,
    search_term VARCHAR(255),
    episode_id INTEGER,
    searched_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO search_analytics (id, search_term, episode_id) VALUES
    (1, 'AI', 1),
    (2, 'cloud computing', 2),
    (3, 'startup funding', 3),
    (4, 'remote work', 4),
    (5, 'AI', 1),
    (6, 'machine learning', 1);

SELECT
    sp.title,
    COUNT(sa.id) as search_hits,
    sa.search_term
FROM search_analytics sa
INNER JOIN searchable_podcasts sp ON sa.episode_id = sp.id
GROUP BY sp.id, sp.title, sa.search_term
ORDER BY search_hits DESC;

Step 12: Trending Topics

Find currently popular search topics.

SELECT
    search_term,
    COUNT(*) as search_count
FROM search_analytics
GROUP BY search_term
ORDER BY search_count DESC
LIMIT 10;

Cleanup (Optional)

DROP TABLE IF EXISTS search_analytics;
DROP TABLE IF EXISTS episode_segments;
DROP TABLE IF EXISTS searchable_podcasts;

Expected Outcomes

  • Transcripts enable content search
  • Cross-show search finds all matches
  • Segment-level search provides timestamps
  • Topic analysis reveals trends
  • Related episodes discovered via content

Key Concepts Learned

  • Transcript-based podcast search
  • Episode segmentation
  • Topic frequency analysis
  • Related content discovery
  • Search analytics tracking

Tags

sqlintermediateaudiopodcastssearchtranscriptionai

Run this on your own machine

Install SynapCores Community Edition free, paste the SQL or Cypher above into the bundled web UI, and watch it run.

Download Free CE