YouTube Comment Summarizer

Transform YouTube comment sections into structured insights using AI-powered analysis. Extract meaningful patterns, sentiment, and answers from thousands of comments automatically.

Features

Core Analysis

  • Comment Extraction: Fetches all comments and replies via the YouTube Data API
  • AI Summarization: Generates comprehensive summaries of discussion themes
  • Sentiment Analysis: Classifies comments as positive, negative, or neutral
  • Visual Analytics: Renders sentiment charts and word clouds
  • Vector Search: Semantic search across per-video comment databases

Interactive Q&A System

  • Natural Language Queries: Ask questions about the video comments
  • Context-Aware Responses: Answers based on actual comment content
  • Adaptive Sampling: Scales the number of sampled comments to the size of the comment set (see Sample Size Optimization)
  • Multi-video Support: Maintains separate databases per video

Performance

Initial Analysis Time

  • Small videos (< 500 comments): 2-3 minutes
  • Medium videos (500-2000 comments): 3-5 minutes
  • Large videos (2000+ comments): 5-10+ minutes

Q&A Response Time

  • Comment retrieval: Milliseconds (vector search)
  • Answer generation: 30-60 seconds (LLM processing)
  • Total Q&A time: ~1 minute per question

Performance depends heavily on hardware specifications (GPU vs CPU).

Installation

Prerequisites

  • Python 3.8+
  • Ollama installed and running
  • YouTube Data API key (obtain one from the Google Cloud Console)
  • Hardware Recommendations:
    • 8GB+ RAM (16GB for large comment sets)
    • GPU with CUDA support (optional but significantly faster)
    • 5GB+ free storage for models and vector databases

Setup

# Clone the repository
git clone https://github.com/Kyltetran/nlp-comment-summary
cd nlp-comment-summary

# Install dependencies
pip install -r requirements.txt

# Download required models
ollama pull llama3.2
ollama pull mxbai-embed-large

# Set up your API key in youtube_summary_tool_copy.py
# Replace: api_key="YOUR_API_KEY_HERE"

# Launch the application
python app.py
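
If you prefer not to hardcode the key, one common alternative (not part of this repo) is to read it from an environment variable; the variable name YOUTUBE_API_KEY below is illustrative:

# Hypothetical alternative: read the key from the environment
# instead of editing youtube_summary_tool_copy.py directly.
import os

api_key = os.environ.get("YOUTUBE_API_KEY")
if not api_key:
    raise RuntimeError("Set the YOUTUBE_API_KEY environment variable first")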

Usage

Web Interface

  1. Start the server: python app.py
  2. Navigate to http://localhost:5000
  3. Enter a YouTube URL and wait for analysis to finish (typically 2-10 minutes; see Performance)
  4. Explore results in Summary, Sentiment, Word Cloud, and Q&A tabs

Example Questions

  • "What do people think about the music quality?"
  • "Are there any complaints about the video?"
  • "What are the most common suggestions?"
  • "How many people mentioned the graphics?"

Command Line

# Analyze a video
python youtube_summary_tool_copy.py "https://www.youtube.com/watch?v=VIDEO_ID"

# Ask specific questions
python youtube_summary_tool_copy.py "VIDEO_ID" --question "What are the main criticisms?"

# Custom sampling
python youtube_summary_tool_copy.py "VIDEO_ID" --question "Summarize reactions" --k 100

API Endpoints

Analyze Video

POST /api/analyze
Content-Type: application/json

{
  "youtube_url": "https://www.youtube.com/watch?v=VIDEO_ID"
}

Ask Questions

POST /api/ask
Content-Type: application/json

{
  "question": "What do people think about this?",
  "k": 50
}

Check Status

GET /api/status
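
For reference, a minimal Python client for these endpoints, assuming the server from app.py is running on localhost:5000; the exact response fields are not documented here, so treat them as assumptions:

# Minimal sketch of a client for the three endpoints above.
# Assumes the Flask server (app.py) is running on localhost:5000.
import requests

BASE = "http://localhost:5000"

# Kick off analysis of a video (long-running; see Performance above)
requests.post(f"{BASE}/api/analyze",
              json={"youtube_url": "https://www.youtube.com/watch?v=VIDEO_ID"})

# Check whether processing is done (response fields are assumptions)
status = requests.get(f"{BASE}/api/status").json()

# Ask a question against the analyzed comments
answer = requests.post(f"{BASE}/api/ask",
                       json={"question": "What do people think about this?",
                             "k": 50}).json()
print(answer)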

How It Works

Initial Processing

When you submit a YouTube URL (step 1 is sketched in code after this list):

  1. Comment Extraction (30-60s): Fetches all comments via YouTube API
  2. Embedding Generation (1-2 min): Converts comments to vector representations
  3. Database Storage (30s): Saves to ChromaDB for fast retrieval
  4. Sentiment Analysis (1-2 min): AI analyzes emotional tone
  5. Visualization (30s): Generates charts and word clouds
  6. Summary Creation (1-2 min): LLM writes comprehensive summaries
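
A rough sketch of step 1, assuming the google-api-python-client package; the helper name fetch_comments is illustrative, not the repo's actual function:

# Sketch of comment extraction via the YouTube Data API v3
# commentThreads endpoint, which pages through results 100 at a time.
from googleapiclient.discovery import build

def fetch_comments(api_key, video_id):
    youtube = build("youtube", "v3", developerKey=api_key)
    comments, page_token = [], None
    while True:
        resp = youtube.commentThreads().list(
            part="snippet,replies", videoId=video_id,
            maxResults=100, pageToken=page_token, textFormat="plainText",
        ).execute()
        for thread in resp["items"]:
            comments.append(
                thread["snippet"]["topLevelComment"]["snippet"]["textDisplay"])
            # Replies, when present, are nested under the thread
            for reply in thread.get("replies", {}).get("comments", []):
                comments.append(reply["snippet"]["textDisplay"])
        page_token = resp.get("nextPageToken")
        if not page_token:
            return comments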

Q&A Process

When you ask a question (the flow is sketched in code after this list):

  1. Vector Search (milliseconds): Finds comments most similar to your question
  2. Context Building (seconds): Assembles relevant comments
  3. LLM Processing (30-60s): Llama 3.2 synthesizes an answer
  4. Response Delivery: Returns the answer along with the supporting comment context
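
A minimal sketch of this retrieve-then-generate flow, assuming the chromadb and ollama Python packages; the collection name and prompt wording are illustrative, not the repo's exact code:

# Sketch: vector search over stored comments, then LLM synthesis.
import chromadb
import ollama

def answer_question(video_id, question, k=50):
    client = chromadb.PersistentClient(path=f"chroma/{video_id}")
    collection = client.get_collection("comments")  # name is an assumption

    # 1. Vector search: embed the question, pull the k nearest comments
    q_emb = ollama.embeddings(model="mxbai-embed-large",
                              prompt=question)["embedding"]
    results = collection.query(query_embeddings=[q_emb], n_results=k)

    # 2. Context building: assemble the retrieved comments
    context = "\n".join(results["documents"][0])

    # 3. LLM processing: Llama 3.2 synthesizes an answer from the context
    resp = ollama.chat(model="llama3.2", messages=[
        {"role": "user",
         "content": f"Based on these YouTube comments:\n{context}\n\n"
                    f"Answer this question: {question}"},
    ])
    return resp["message"]["content"]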

Architecture

Web Interface ◄──► Flask Backend ◄──► YouTube API
     │                   │
     └───────────────────┼──────────────────┐
                         ▼                  ▼
                 Analysis Engine      ChromaDB
                         │           (Vector DB)
              ┌──────────┼────────┐
              ▼          ▼        ▼
            Ollama  HuggingFace  NLTK
            (LLM)   (Sentiment) (Text)

Configuration

Sentiment Analysis Models

  • Default: AmaanP314/youtube-xlm-roberta-base-sentiment-multilingual
  • Alternative: VADER (available in youtube_summary_tool.py)
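
Loading the default model with the standard Hugging Face transformers pipeline looks roughly like this (a sketch, not the repo's exact code):

# Sketch of sentiment classification with the default model.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="AmaanP314/youtube-xlm-roberta-base-sentiment-multilingual",
)

print(classifier("This video was amazing!"))
# e.g. [{'label': 'positive', 'score': 0.98}]  (label names are assumptions)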

Sample Size Optimization

The system automatically scales the sample size to the total comment count (sketched in code after this list):

  • <50 comments: 60-70% of total
  • 50-200 comments: 30-40% of total
  • 200-1000 comments: 20-30% of total
  • 1000+ comments: 10-20% of total, capped at 600 comments
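
As a sketch, the tiering corresponds to something like the following; the exact fraction chosen within each band is an assumption:

# Illustrative version of the tiered sampling rule above; the exact
# fractions within each band are assumptions, not the repo's code.
def sample_size(total_comments):
    if total_comments < 50:
        return int(total_comments * 0.65)        # ~60-70%
    elif total_comments <= 200:
        return int(total_comments * 0.35)        # ~30-40%
    elif total_comments <= 1000:
        return int(total_comments * 0.25)        # ~20-30%
    else:
        return min(int(total_comments * 0.15), 600)  # 10-20%, capped at 600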

Output Files

Each analysis creates:

chroma/
└── VIDEO_ID/
    ├── sentiment_pie_chart.png
    ├── comment_wordcloud.png
    ├── overall_summary.txt
    ├── sentiment_summary.txt
    └── video_metadata.json

Limitations

  • Processing Time: Initial analysis takes 3-10 minutes depending on hardware
  • API Limits: The YouTube Data API has daily quotas (10,000 quota units/day by default)
  • Hardware Dependency: Performance heavily depends on GPU/CPU specifications
  • Memory Usage: Large videos (5k+ comments) require significant RAM
  • Language: Optimized for English, supports multilingual content
  • Dependencies: Requires local Ollama installation and model downloads

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit changes: git commit -m 'Add amazing feature'
  4. Push to branch: git push origin feature/amazing-feature
  5. Open a Pull Request
