Comment Toxicity Detection Model

A machine learning system that automatically detects toxic comments on YouTube using Natural Language Processing (NLP) and multiple classification algorithms.

Acknowledgements

  • The dataset used is the Jigsaw Toxic Comment Classification dataset (Kaggle)
  • It consists of 21,825+ comments, each annotated with six labels: Toxic, Severe Toxic, Obscene, Threat, Insult, Identity Hate
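The six label columns make this a multi-label problem: one comment can be, say, both Toxic and Insult at once. A minimal sketch of the label structure, using a tiny in-memory sample in place of the Kaggle `train.csv` (the column names below match the public Jigsaw dataset; the example comments are made up):

```python
import pandas as pd

# Tiny in-memory sample standing in for the Jigsaw train.csv;
# the six label columns match the public Kaggle dataset schema.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

df = pd.DataFrame({
    "comment_text": [
        "Thanks for the upload, really helpful!",
        "You are an idiot and everyone hates you.",
        "I will find you.",
    ],
    "toxic":         [0, 1, 1],
    "severe_toxic":  [0, 0, 0],
    "obscene":       [0, 1, 0],
    "threat":        [0, 0, 1],
    "insult":        [0, 1, 0],
    "identity_hate": [0, 0, 0],
})

# Label distribution: how many comments carry each toxicity label.
label_counts = df[LABELS].sum()
print(label_counts.to_dict())

# A comment can carry several labels at once (multi-label problem).
print((df[LABELS].sum(axis=1) > 1).sum(), "comments have 2+ labels")
```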

🔧 Technologies Used

  • Python 3.x
  • Pandas & NumPy - Data manipulation
  • Matplotlib & Seaborn - Data visualization
  • NLTK - Text preprocessing
  • Scikit-learn - Machine Learning
  • Google Colaboratory - Development environment

🚀 Project Pipeline

  1. Data Loading & Exploration

    • Load dataset and analyze distribution
    • Visualize toxicity patterns
  2. Data Preprocessing

    • Text cleaning (remove URLs, special characters)
    • Stopword removal
    • Tokenization
  3. Feature Engineering

    • TF-IDF vectorization
    • Convert text to numerical features
  4. Model Training

    • Logistic Regression
    • Random Forest Classifier
    • Naive Bayes
  5. Model Evaluation

    • Accuracy comparison
    • Confusion matrices
    • Performance metrics
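Steps 2 and 3 above can be sketched as follows. This is a minimal illustration, not the repository's code: the regexes and the tiny inlined stopword list (standing in for NLTK's English stopwords, so nothing needs downloading) are assumptions, and the sample comments are made up:

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer

# Minimal stopword list standing in for nltk.corpus.stopwords.words("english"),
# so this sketch runs without downloading NLTK data.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "you", "to", "this"}

def clean_text(text: str) -> str:
    """Lowercase, strip URLs and special characters, drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # remove special characters
    tokens = text.split()                      # simple whitespace tokenization
    return " ".join(t for t in tokens if t not in STOPWORDS)

comments = [
    "Check this out: https://example.com NOW!!!",
    "You are a complete idiot.",
]
cleaned = [clean_text(c) for c in comments]
print(cleaned)

# TF-IDF converts the cleaned strings into a sparse numeric feature matrix.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(cleaned)
print(X.shape)  # (number of comments, vocabulary size)
```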
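Steps 4 and 5 (training the three classifiers and comparing them) follow the same pattern for each model. A sketch on a toy corpus, assuming scikit-learn's standard estimators; the example comments and labels are made up, and a real run would score on a held-out test split rather than the training data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy corpus (hypothetical examples) with binary toxic (1) / non-toxic (0) labels.
texts = [
    "great video thanks for sharing",
    "love this channel so much",
    "what a helpful tutorial",
    "you are a worthless idiot",
    "shut up nobody likes you",
    "this is garbage and so are you",
]
y = [0, 0, 0, 1, 1, 1]

X = TfidfVectorizer().fit_transform(texts)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Naive Bayes": MultinomialNB(),
}

# Train each model, then compare accuracy and confusion matrices.
for name, model in models.items():
    model.fit(X, y)
    pred = model.predict(X)
    print(f"{name}: accuracy={accuracy_score(y, pred):.2f}")
    print(confusion_matrix(y, pred))
```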

📈 Results

  • Best Model: Logistic Regression
  • Accuracy: 95%+
  • Training Time: <10 seconds
