PDF Document Storage

Store and manage PDF documents with metadata for document management systems

All recipes· document-processing· 8 minutesbeginner

PDF Document Storage

Objective

Learn how to store PDF documents with metadata in SynapCores. This foundation enables building document management systems, digital archives, and searchable document repositories.

Step 1: Create Documents Table

Create a table for PDF storage.

CREATE TABLE pdf_documents (
    id INTEGER PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    document PDF,
    description TEXT,
    category VARCHAR(50),
    file_size BIGINT,
    page_count INTEGER,
    author VARCHAR(100),
    created_date DATE,
    uploaded_by VARCHAR(100),
    is_public BOOLEAN DEFAULT FALSE,
    version INTEGER DEFAULT 1,
    status VARCHAR(20) DEFAULT 'active',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Step 2: Insert Sample Documents

Add sample PDF records.

INSERT INTO pdf_documents (id, title, description, category, file_size, page_count, author, created_date, uploaded_by, is_public) VALUES
    (1, 'Annual Report 2024', 'Company annual financial report', 'financial', 2457600, 45, 'Finance Team', '2024-01-15', 'finance_admin', TRUE),
    (2, 'Employee Handbook', 'Company policies and procedures', 'hr', 1536000, 32, 'HR Department', '2023-06-01', 'hr_admin', FALSE),
    (3, 'Product Specification', 'Technical specifications for Product X', 'technical', 768000, 15, 'Engineering', '2024-02-10', 'eng_lead', FALSE),
    (4, 'Marketing Plan Q1', 'Quarterly marketing strategy document', 'marketing', 1024000, 22, 'Marketing Team', '2024-01-05', 'marketing_mgr', FALSE),
    (5, 'Training Manual', 'New employee onboarding guide', 'training', 2048000, 58, 'Training Dept', '2023-09-15', 'training_admin', FALSE),
    (6, 'Legal Contract Template', 'Standard service agreement template', 'legal', 512000, 8, 'Legal Team', '2023-11-20', 'legal_admin', FALSE),
    (7, 'Research Summary', 'Industry research findings summary', 'research', 1280000, 28, 'Research Team', '2024-02-01', 'research_lead', TRUE);

Step 3: Query All Documents

Retrieve all documents with formatted information.

SELECT
    title,
    category,
    page_count,
    file_size / 1024 as size_kb,
    author,
    created_date
FROM pdf_documents
ORDER BY created_date DESC;

Step 4: Filter by Category

Find documents by category.

SELECT
    title,
    description,
    page_count,
    author
FROM pdf_documents
WHERE category = 'financial'
ORDER BY created_date DESC;

Step 5: Search by Title

Search documents by title keyword.

SELECT
    title,
    category,
    description,
    uploaded_by
FROM pdf_documents
WHERE title LIKE '%Report%' OR title LIKE '%Manual%'
ORDER BY title;

Step 6: Document Statistics

Calculate document library statistics.

SELECT
    COUNT(*) as total_documents,
    SUM(page_count) as total_pages,
    SUM(file_size) / 1048576 as total_size_mb,
    AVG(page_count) as avg_pages
FROM pdf_documents
WHERE status = 'active';

Step 7: Documents by Category

Group documents by category.

SELECT
    category,
    COUNT(*) as doc_count,
    SUM(page_count) as total_pages,
    SUM(file_size) / 1048576 as total_mb
FROM pdf_documents
GROUP BY category
ORDER BY doc_count DESC;

Step 8: Recent Documents

Get recently uploaded documents.

SELECT
    title,
    category,
    uploaded_by,
    created_at
FROM pdf_documents
ORDER BY created_at DESC
LIMIT 5;

Step 9: Large Documents

Find documents over a certain size.

SELECT
    title,
    page_count,
    file_size / 1024 as size_kb,
    category
FROM pdf_documents
WHERE file_size > 1048576
ORDER BY file_size DESC;

Step 10: Public Documents

Get documents marked as public.

SELECT
    title,
    description,
    author,
    created_date
FROM pdf_documents
WHERE is_public = TRUE
ORDER BY title;

Step 11: Update Document Metadata

Update document information.

UPDATE pdf_documents
SET description = 'Updated: Company annual financial report with Q4 results',
    version = 2
WHERE id = 1;

SELECT title, description, version FROM pdf_documents WHERE id = 1;

Step 12: Documents by Author

Find documents by specific author.

SELECT
    author,
    COUNT(*) as document_count,
    SUM(page_count) as total_pages
FROM pdf_documents
GROUP BY author
ORDER BY document_count DESC;

Cleanup (Optional)

DROP TABLE IF EXISTS pdf_documents;

Expected Outcomes

  • PDFs stored with metadata
  • Category filtering works
  • Search returns matches
  • Statistics calculated
  • Size queries function

Document Categories

Category Purpose
financial Financial reports
hr Human resources
technical Technical docs
marketing Marketing materials
legal Legal documents
training Training materials

Key Concepts Learned

  • PDF data type usage
  • Document metadata storage
  • Category organization
  • Size and page tracking
  • Basic document queries

Tags

sqlbeginnerpdfdocumentsstoragemetadata

Run this on your own machine

Install SynapCores Community Edition free, paste the SQL or Cypher above into the bundled web UI, and watch it run.

Download Free CE