Comment Toxicity Detection Model

A machine learning system that automatically detects toxic comments on YouTube using Natural Language Processing (NLP) and multiple classification algorithms.

Acknowledgements

Dataset: Jigsaw Toxic Comment Classification (Kaggle).
The dataset consists of 21,825+ comments categorized under 6 labels: Toxic, Severe Toxic, Obscene, Threat, Insult, Identity Hate.

Authors

@rishi shandilya

🔧 Technologies Used

Python 3.x
Pandas & NumPy - Data manipulation
Matplotlib & Seaborn - Data visualization
NLTK - Text preprocessing
Scikit-learn - Machine Learning
Google Colaboratory - Development environment

🚀 Project Pipeline

Data Loading & Exploration
- Load dataset and analyze distribution
- Visualize toxicity patterns
Data Preprocessing
- Text cleaning (remove URLs, special characters)
- Stopword removal
- Tokenization
Feature Engineering
- TF-IDF vectorization
- Convert text to numerical features
Model Training
- Logistic Regression
- Random Forest Classifier
- Naive Bayes
Model Evaluation
- Accuracy comparison
- Confusion matrices
- Performance metrics
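
The preprocessing stage of the pipeline above (text cleaning, tokenization, stopword removal) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names `clean_text` and `preprocess`, the regular expressions, and the tiny hardcoded stopword set are all assumptions. The project lists NLTK, so its English stopword corpus would be the natural replacement for the small set used here.

```python
import re

# Tiny illustrative stopword set (assumption). The project lists NLTK, whose
# English stopword corpus would be the fuller choice in practice.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you", "this", "at"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # http(s) links and bare www links
NON_LETTER_RE = re.compile(r"[^a-z\s]")        # anything that is not a-z or whitespace


def clean_text(text: str) -> str:
    """Lowercase, then drop URLs and every non-letter character."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    # Collapse the runs of whitespace left behind by the substitutions.
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text: str) -> list[str]:
    """Clean the text, tokenize on whitespace, and remove stopwords."""
    tokens = clean_text(text).split()
    return [tok for tok in tokens if tok not in STOPWORDS]


if __name__ == "__main__":
    print(preprocess("You are the WORST!!! See https://example.com/x?id=1 now"))
```

Whitespace tokenization is used here only to keep the sketch dependency-free; a tokenizer from NLTK could be swapped in without changing the overall shape.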

📈 Results

Best Model: Logistic Regression
Accuracy: 95%+
Training Time: <10 seconds
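
A comparison like the one summarized above could be produced along the following lines. This is a hedged sketch with scikit-learn on a handful of made-up comments, so its numbers say nothing about the reported results. The toy `texts` and `labels`, the single binary label, the hyperparameters, and the choice of `MultinomialNB` as the Naive Bayes variant are all assumptions and are not taken from the notebook.

```python
import time

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the real comments (1 = toxic, 0 = clean).
texts = [
    "you are an idiot", "what a stupid comment", "shut up you moron",
    "i hate you so much", "you are pathetic and dumb", "nobody likes you loser",
    "thanks for the helpful video", "great explanation, well done",
    "i learned a lot from this", "nice work, keep it up",
    "interesting point, i agree", "have a wonderful day",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hold out 25% of the comments, keeping the class balance in both splits.
X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# Fit TF-IDF on the training text only, then reuse the vocabulary for the test text.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_txt)
X_test = vectorizer.transform(X_test_txt)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Naive Bayes": MultinomialNB(),  # assumed variant; the README only says "Naive Bayes"
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start

    preds = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, preds),
        "fit_seconds": fit_seconds,
        "confusion_matrix": confusion_matrix(y_test, preds, labels=[0, 1]),
    }

for name, r in results.items():
    print(f"{name}: accuracy={r['accuracy']:.2f}, fit={r['fit_seconds']:.3f}s")
```

Accuracy alone can be misleading when toxic comments are a small minority of the data, which is why the sketch also keeps a confusion matrix per model.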

Repository: Shandilya-Rishi/Comment-Toxicity-Detection-System
