Audio Transcription

Objective

Use AIDB's TRANSCRIBE function to convert audio files to searchable text. This enables full-text search across audio content, accessibility features, and content analysis.

Step 1: Create Audio with Transcription Table

Create a table for audio files and their transcriptions.

CREATE TABLE audio_transcriptions (
    id INTEGER PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    audio_file AUDIO(MP3),
    audio_type VARCHAR(50),
    duration_seconds INTEGER,
    transcript TEXT,
    transcript_confidence DECIMAL(5, 4),
    word_count INTEGER,
    language VARCHAR(10) DEFAULT 'en',
    transcribed_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Step 2: Insert Sample Audio Records

Add sample audio records for transcription.

INSERT INTO audio_transcriptions (id, title, audio_type, duration_seconds) VALUES
    (1, 'CEO Quarterly Update', 'speech', 1200),
    (2, 'Product Demo Recording', 'presentation', 900),
    (3, 'Customer Support Call #1234', 'call', 480),
    (4, 'Team Standup Meeting', 'meeting', 600),
    (5, 'Podcast Episode 42', 'podcast', 2700),
    (6, 'Training Video Narration', 'narration', 1800),
    (7, 'Voice Message from Client', 'voicemail', 120),
    (8, 'Interview - John Smith', 'interview', 3600);

Step 3: Transcribe Audio Files

Use TRANSCRIBE function to extract text.

UPDATE audio_transcriptions
SET transcript = TRANSCRIBE(audio_file),
    transcribed_at = CURRENT_TIMESTAMP
WHERE audio_file IS NOT NULL AND transcript IS NULL;

Step 4: Transcribe with Specific Model

Use Whisper model for transcription.

UPDATE audio_transcriptions
SET transcript = TRANSCRIBE(audio_file, 'whisper-base'),
    transcribed_at = CURRENT_TIMESTAMP
WHERE id = 1;

Step 5: Add Sample Transcripts

Insert sample transcript content for demonstration.

UPDATE audio_transcriptions SET transcript = 'Good morning everyone. I want to share our quarterly results. Revenue increased by 15 percent compared to last quarter. Our customer base grew to over 50,000 active users. The engineering team shipped three major features.', word_count = 38, transcript_confidence = 0.9523 WHERE id = 1;

UPDATE audio_transcriptions SET transcript = 'Welcome to our product demo. Today I will show you the new dashboard features. First, lets look at the analytics panel. You can see real-time metrics displayed here. The chart updates automatically every five seconds.', word_count = 42, transcript_confidence = 0.9412 WHERE id = 2;

UPDATE audio_transcriptions SET transcript = 'Hi, I am calling about my order number 12345. It was supposed to arrive yesterday but I have not received it yet. Can you check the shipping status please? I really need it by Friday.', word_count = 38, transcript_confidence = 0.9234 WHERE id = 3;

UPDATE audio_transcriptions SET transcript = 'Okay team, lets go through our updates. Sarah, what did you work on yesterday? I finished the API integration and started testing. Great. John, how about you? I am still debugging the login issue but should have it fixed today.', word_count = 45, transcript_confidence = 0.8956 WHERE id = 4;

Step 6: Search Transcript Content

Search across all transcribed audio.

SELECT
    title,
    audio_type,
    transcript
FROM audio_transcriptions
WHERE transcript LIKE '%customer%'
   OR transcript LIKE '%order%';

Step 7: Full-Text Search Query

Find audio containing specific terms.

SELECT
    title,
    audio_type,
    duration_seconds / 60 as duration_minutes,
    word_count,
    transcript_confidence
FROM audio_transcriptions
WHERE transcript LIKE '%quarterly%'
   OR transcript LIKE '%revenue%'
   OR transcript LIKE '%results%'
ORDER BY transcript_confidence DESC;

Step 8: Transcription Quality Report

Analyze transcription quality.

SELECT
    audio_type,
    COUNT(*) as total_files,
    COUNT(transcript) as transcribed_count,
    AVG(transcript_confidence) as avg_confidence,
    AVG(word_count) as avg_words
FROM audio_transcriptions
GROUP BY audio_type
ORDER BY avg_confidence DESC;

Step 9: Low Confidence Transcripts

Find transcripts needing review.

SELECT
    id,
    title,
    audio_type,
    transcript_confidence,
    'Needs review' as status
FROM audio_transcriptions
WHERE transcript_confidence < 0.90
  AND transcript IS NOT NULL
ORDER BY transcript_confidence;

Step 10: Pending Transcriptions

Find audio files not yet transcribed.

SELECT
    id,
    title,
    audio_type,
    duration_seconds / 60 as duration_minutes,
    created_at
FROM audio_transcriptions
WHERE transcript IS NULL
ORDER BY duration_seconds DESC;

Step 11: Word Count Analysis

Analyze transcript lengths.

SELECT
    audio_type,
    AVG(word_count) as avg_words,
    AVG(duration_seconds) as avg_duration,
    AVG(word_count * 60.0 / duration_seconds) as words_per_minute
FROM audio_transcriptions
WHERE word_count > 0 AND duration_seconds > 0
GROUP BY audio_type
ORDER BY words_per_minute DESC;

Step 12: Create Searchable Index

Build an index for faster searching.

CREATE TABLE transcript_search_index (
    id INTEGER PRIMARY KEY,
    audio_id INTEGER NOT NULL,
    title VARCHAR(255),
    audio_type VARCHAR(50),
    full_text TEXT,
    keywords TEXT,
    indexed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (audio_id) REFERENCES audio_transcriptions(id)
);

INSERT INTO transcript_search_index (id, audio_id, title, audio_type, full_text)
SELECT
    id,
    id,
    title,
    audio_type,
    transcript
FROM audio_transcriptions
WHERE transcript IS NOT NULL;

Cleanup (Optional)

DROP TABLE IF EXISTS transcript_search_index;
DROP TABLE IF EXISTS audio_transcriptions;

Expected Outcomes

TRANSCRIBE extracts text from audio
Full-text search enabled on audio
Confidence scores track quality
Low quality transcripts flagged
Word-per-minute analysis works

Transcription Use Cases

Audio Type	Use Case
Speech	Accessibility, archival
Meeting	Minutes, action items
Call	Quality assurance, search
Podcast	Show notes, SEO
Interview	Documentation

Key Concepts Learned

TRANSCRIBE function usage
Whisper model integration
Confidence score tracking
Full-text search on audio
Quality assurance queries

Audio Transcription

Audio Transcription

Objective

Step 1: Create Audio with Transcription Table

Step 2: Insert Sample Audio Records

Step 3: Transcribe Audio Files

Step 4: Transcribe with Specific Model

Step 5: Add Sample Transcripts

Step 6: Search Transcript Content

Step 7: Full-Text Search Query

Step 8: Transcription Quality Report

Step 9: Low Confidence Transcripts

Step 10: Pending Transcriptions

Step 11: Word Count Analysis

Step 12: Create Searchable Index

Cleanup (Optional)

Expected Outcomes

Transcription Use Cases

Key Concepts Learned

Run this on your own machine