Transform YouTube comment sections into structured insights using AI-powered analysis. Extract meaningful patterns, sentiment, and answers from thousands of comments automatically.
- Comment Extraction: Fetches all comments and replies via the YouTube Data API
- AI Summarization: Generates comprehensive summaries of discussion themes
- Sentiment Analysis: Classifies comments as positive, negative, or neutral
- Visual Analytics: Produces sentiment charts and word clouds
- Vector Search: Enables semantic search through comment databases
- Natural Language Queries: Lets you ask questions about a video's comments
- Context-Aware Responses: Grounds answers in actual comment content
- Adaptive Sampling: Intelligently selects relevant comments for accuracy
- Multi-video Support: Maintains a separate database per video
Typical times for the initial analysis:
- Small videos (< 500 comments): 2-3 minutes
- Medium videos (500-2000 comments): 3-5 minutes
- Large videos (2000+ comments): 5-10+ minutes

Typical times for each Q&A query:
- Comment retrieval: milliseconds (vector search)
- Answer generation: 30-60 seconds (LLM processing)
- Total Q&A time: ~1 minute per question
Performance depends heavily on hardware specifications (GPU vs CPU).
- Python 3.8+
- Ollama installed and running
- YouTube Data API key (available from the Google Cloud Console)
- Hardware Recommendations:
- 8GB+ RAM (16GB for large comment sets)
- GPU with CUDA support (optional but significantly faster)
- 5GB+ free storage for models and vector databases
```bash
# Clone the repository
git clone https://github.com/yourusername/youtube-comment-summarizer
cd youtube-comment-summarizer

# Install dependencies
pip install -r requirements.txt

# Download required models
ollama pull llama3.2
ollama pull mxbai-embed-large

# Set up your API key in youtube_summary_tool_copy.py
# Replace: api_key="YOUR_API_KEY_HERE"

# Launch the application
python app.py
```
- Start the server: `python app.py`
- Navigate to `http://localhost:5000`
- Enter a YouTube URL and wait 3-5 minutes for the analysis to complete
- Explore the results in the Summary, Sentiment, Word Cloud, and Q&A tabs
- "What do people think about the music quality?"
- "Are there any complaints about the video?"
- "What are the most common suggestions?"
- "How many people mentioned the graphics?"
```bash
# Analyze a video
python youtube_summary_tool_copy.py "https://www.youtube.com/watch?v=VIDEO_ID"

# Ask specific questions
python youtube_summary_tool_copy.py "VIDEO_ID" --question "What are the main criticisms?"

# Custom sampling
python youtube_summary_tool_copy.py "VIDEO_ID" --question "Summarize reactions" --k 100
```
```
POST /api/analyze
Content-Type: application/json

{
  "youtube_url": "https://www.youtube.com/watch?v=VIDEO_ID"
}
```

```
POST /api/ask
Content-Type: application/json

{
  "question": "What do people think about this?",
  "k": 50
}
```

```
GET /api/status
```
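As a quick illustration, here is a minimal sketch of calling these endpoints with Python's `requests` library, assuming the server started by `python app.py` is listening on `http://localhost:5000`. Whether `/api/analyze` blocks until processing finishes or returns immediately depends on the backend, so the status call is shown only for completeness.

```python
import requests

BASE_URL = "http://localhost:5000"  # default address used in the usage steps above

# Kick off analysis of a video (the heavy processing happens server-side)
resp = requests.post(
    f"{BASE_URL}/api/analyze",
    json={"youtube_url": "https://www.youtube.com/watch?v=VIDEO_ID"},
)
resp.raise_for_status()
print(resp.json())

# Check the current processing status
print(requests.get(f"{BASE_URL}/api/status").json())

# Ask a question once the analysis has completed
answer = requests.post(
    f"{BASE_URL}/api/ask",
    json={"question": "What do people think about this?", "k": 50},
)
print(answer.json())
```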
When you submit a YouTube URL:
- Comment Extraction (30-60s): Fetches all comments via the YouTube API (see the sketch after these steps)
- Embedding Generation (1-2 min): Converts comments to vector representations
- Database Storage (30s): Saves to ChromaDB for fast retrieval
- Sentiment Analysis (1-2 min): AI analyzes emotional tone
- Visualization (30s): Generates charts and word clouds
- Summary Creation (1-2 min): LLM writes comprehensive summaries
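The first step boils down to paginating the YouTube Data API's `commentThreads` endpoint. The sketch below, which assumes the `google-api-python-client` package, collects only top-level comments for brevity; the tool itself also walks each thread's replies.

```python
from googleapiclient.discovery import build

def fetch_top_level_comments(video_id, api_key):
    """Page through commentThreads and collect the text of every top-level comment."""
    youtube = build("youtube", "v3", developerKey=api_key)
    comments, page_token = [], None
    while True:
        params = {
            "part": "snippet",
            "videoId": video_id,
            "maxResults": 100,          # API maximum per page
            "textFormat": "plainText",
        }
        if page_token:
            params["pageToken"] = page_token
        response = youtube.commentThreads().list(**params).execute()
        for item in response["items"]:
            comments.append(item["snippet"]["topLevelComment"]["snippet"]["textDisplay"])
        page_token = response.get("nextPageToken")
        if not page_token:
            return comments
```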
When you ask a question:
- Vector Search (milliseconds): Finds comments most similar to your question
- Context Building (seconds): Assembles relevant comments
- LLM Processing (30-60s): Llama 3.2 synthesizes an answer
- Response Delivery: The answer is returned along with the supporting context and insights
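A simplified sketch of that retrieval-augmented flow is shown below, assuming the `chromadb` and `ollama` Python packages and a collection persisted under `chroma/VIDEO_ID`. The collection name and prompt wording are illustrative rather than the tool's actual internals.

```python
import chromadb
import ollama

def answer_question(question, video_id, k=50):
    # 1. Vector search: embed the question and retrieve the k most similar comments
    client = chromadb.PersistentClient(path=f"chroma/{video_id}")
    collection = client.get_collection("comments")  # illustrative collection name
    query_emb = ollama.embeddings(model="mxbai-embed-large", prompt=question)["embedding"]
    results = collection.query(query_embeddings=[query_emb], n_results=k)

    # 2. Context building: assemble the retrieved comments into one prompt
    context = "\n".join(results["documents"][0])

    # 3. LLM processing: let Llama 3.2 synthesize an answer grounded in the comments
    response = ollama.chat(
        model="llama3.2",
        messages=[{
            "role": "user",
            "content": f"Based on these YouTube comments:\n{context}\n\nAnswer this question: {question}",
        }],
    )
    return response["message"]["content"]
```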
```
Web Interface ◄──► Flask Backend ◄──► YouTube API
      │                  │
      └──────────────────┼─────────────────┐
                         ▼                 ▼
                  Analysis Engine       ChromaDB
                         │             (Vector DB)
               ┌─────────┼─────────┐
               ▼         ▼         ▼
            Ollama  HuggingFace  NLTK
            (LLM)   (Sentiment)  (Text)
```
- Default: `AmaanP314/youtube-xlm-roberta-base-sentiment-multilingual`
- Alternative: VADER (available in `youtube_summary_tool.py`)
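For reference, both options can be loaded in a few lines. This is a minimal sketch assuming the `transformers` and `nltk` packages are installed; the labels returned by the default model are whatever its model card defines.

```python
from transformers import pipeline
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# Default: multilingual XLM-RoBERTa model fine-tuned for YouTube comment sentiment
classifier = pipeline(
    "text-classification",
    model="AmaanP314/youtube-xlm-roberta-base-sentiment-multilingual",
)
print(classifier("This video was incredibly helpful, thank you!"))

# Alternative: VADER (rule-based, English-oriented)
nltk.download("vader_lexicon")
vader = SentimentIntensityAnalyzer()
print(vader.polarity_scores("This video was incredibly helpful, thank you!"))
```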
The system automatically calculates an optimal sample size from the total comment count (a sketch of the heuristic follows this list):
- < 50 comments: 60-70% of the total
- 50-200 comments: 30-40% of the total
- 200-1000 comments: 20-30% of the total
- 1000+ comments: 10-20% of the total, capped at 600
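Since each band is a range, the sketch below is only an illustrative implementation that uses the midpoint of each band; the tool's actual function may choose differently within those ranges.

```python
def adaptive_sample_size(total_comments):
    """Scale the sampled fraction down as comment volume grows, capping at 600."""
    if total_comments < 50:
        fraction = 0.65   # 60-70% band
    elif total_comments < 200:
        fraction = 0.35   # 30-40% band
    elif total_comments < 1000:
        fraction = 0.25   # 20-30% band
    else:
        fraction = 0.15   # 10-20% band
    return min(600, max(1, int(total_comments * fraction)))

print(adaptive_sample_size(120))    # 42
print(adaptive_sample_size(8000))   # 600 (hits the cap)
```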
Each analysis creates the following files:

```
chroma/
└── VIDEO_ID/
    ├── sentiment_pie_chart.png
    ├── comment_wordcloud.png
    ├── overall_summary.txt
    ├── sentiment_summary.txt
    └── video_metadata.json
```
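These outputs can be read back programmatically once a run has finished. A small sketch follows, where `VIDEO_ID` stands in for the real video ID and the metadata fields are whatever the tool chose to store.

```python
import json
from pathlib import Path

video_dir = Path("chroma") / "VIDEO_ID"   # substitute the actual video ID

summary = (video_dir / "overall_summary.txt").read_text(encoding="utf-8")
sentiment = (video_dir / "sentiment_summary.txt").read_text(encoding="utf-8")
metadata = json.loads((video_dir / "video_metadata.json").read_text(encoding="utf-8"))

print(summary[:500])       # beginning of the overall summary
print(metadata.keys())     # whichever metadata fields were saved
```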
- Processing Time: Initial analysis takes 3-10 minutes depending on hardware
- API Limits: YouTube API has daily quotas (10,000 requests/day by default)
- Hardware Dependency: Performance heavily depends on GPU/CPU specifications
- Memory Usage: Large videos (5k+ comments) require significant RAM
- Language: Optimized for English, supports multilingual content
- Dependencies: Requires local Ollama installation and model downloads
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit changes: `git commit -m 'Add amazing feature'`
- Push to branch: `git push origin feature/amazing-feature`
- Open a Pull Request