Machine Learning Project Report

This document summarizes a machine learning project that classified tweets by gender. It addressed two questions: 1) The most common emotions/words used by each gender and 2) Which gender makes more typos. For question 1, word clouds showed males commonly used words like "make" and "know" while females used words like "need" and "best". For question 2, a bar graph showed females made slightly more typos than males, with about 2,862 typos for females and 2,702 for males. The project tested three algorithms on tweet classifications and found Multinomial Naive Bayes had the best accuracy at 60.1%.

Uploaded by

Ashish

MACHINE LEARNING PROJECT REPORT

PROJECT TITLE

CLASSIFICATION OF TWEETS ACCORDING TO GENDER

The data set provided contained tweets labelled by the gender of the author (male or
female), and it came with two questions to be answered.
Questions based on the data set:

1. What are the most common emotions/words used by males and females?

Solution: After cleaning, analysis and visualization, the most common emotions/words
used by males were:

• Make    • Know
• Go      • See
• Day     • Time
• Good    • Want
• Amp     • People
• Love    • Need
• Back    • Think
• New     • Best
• One     • Got

We displayed these words in the form of a word cloud.

[Figure: word cloud of the most common words in tweets by males]
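
The counting behind such a word list can be sketched as follows. This is a minimal illustration, not the code used in the project: the sample tweets, the `STOP_WORDS` set and the `top_words` helper are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy data: (gender label, tweet text) pairs, not from the real data set.
tweets = [
    ("male", "Good day to make something new"),
    ("male", "I know you want to go and see it"),
    ("female", "I need the best day, one good day"),
    ("female", "Love the new thing, need more time"),
]

# Small illustrative stop-word list (an assumption; a real list would be much longer).
STOP_WORDS = {"i", "to", "the", "you", "and", "it", "a", "more"}


def top_words(rows, gender, n=3):
    """Return the n most frequent non-stop-words in tweets with the given label."""
    counts = Counter()
    for label, text in rows:
        if label != gender:
            continue
        # Lowercase the tweet and keep alphabetic tokens only.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


male_top = top_words(tweets, "male")
female_top = top_words(tweets, "female")
print(male_top)
print(female_top)
```

The resulting (word, count) pairs are the kind of input a word cloud is drawn from, with larger counts shown as larger words.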

The most common words used by females in their tweets were the following:

• Make    • One
• Need    • Best
• Amp     • Got
• Time    • Go
• Good    • People
• Last    • Love
• New     • Thing
• Day     • Want
• Know    • Back

These can be seen in the corresponding word cloud.

[Figure: word cloud of the most common words in tweets by females]

2. Which gender makes more typos in their tweets?

Solution: Using the spellchecker package, we counted the number of typos made by each
gender in this data set and presented the results as a bar graph.

[Figure: bar graph of typo counts by gender]

By a slight margin, females made more typos in their tweets. To be precise, the males
in this data set made about 2702 typos, whereas the females made about 2862, a
difference of about 160.
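
The idea of counting typos per gender can be sketched as below. This is only an illustration under stated assumptions: a tiny hand-written `KNOWN_WORDS` set stands in for the dictionary of a spellchecker package, and the sample tweets and the `count_typos` helper are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stand-in for a spell-checking dictionary; a real spellchecker
# package ships a far larger word list.
KNOWN_WORDS = {"i", "love", "this", "new", "day", "good", "time", "need", "the", "best"}

# Hypothetical toy data: (gender label, tweet text) pairs.
tweets = [
    ("male", "I love this new dya"),
    ("male", "good time"),
    ("female", "I need teh best tiem"),
    ("female", "love this day"),
]


def count_typos(rows):
    """Count, per label, the tokens that are absent from KNOWN_WORDS."""
    typos = Counter()
    for label, text in rows:
        tokens = re.findall(r"[a-z]+", text.lower())
        typos[label] += sum(1 for t in tokens if t not in KNOWN_WORDS)
    return typos


typo_counts = count_typos(tweets)
print(typo_counts)
```

A dictionary lookup of this kind also flags names, hashtags and slang as typos, so the counts are best read as "words unknown to the dictionary".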

Detailed summary of the project:

We were asked to choose three classification algorithms, build a machine learning
model with each, compare the accuracy of all three, and suggest which algorithm best
suits the given problem.

To reach the final conclusion, we first did data encoding and exploration.

• In the first approach, we took the ‘Description’ column as the independent variable
and the ‘Gender’ column as the dependent variable (as given).
We then converted the descriptions, which are originally strings, into arrays of
numbers before giving them to the ML model.
Then we split the encoded data into train and test sets.
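
One common way to turn strings into arrays of numbers is a bag-of-words count encoding. The sketch below assumes that encoding, which may differ from the one actually used in the project; the sample descriptions and the helper names are hypothetical.

```python
import re

# Hypothetical toy descriptions; the real input would be the 'Description' column.
descriptions = [
    "love music and coffee",
    "coffee coffee and code",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(vocab)}


def encode(text, vocab):
    """Return a count vector with one slot per vocabulary word."""
    vector = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:  # words not in the vocabulary are skipped
            vector[vocab[tok]] += 1
    return vector


vocab = build_vocabulary(descriptions)
encoded = [encode(d, vocab) for d in descriptions]
print(vocab)
print(encoded)
```

Every description becomes a vector of the same length, which is what lets a classifier treat the text as ordinary numeric features.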

Next comes the machine learning modelling, for which we used classification
algorithms (of the three, only the random forest is an ensemble method).

The classification algorithms we used are:

• RandomForestClassifier
• Logistic Regression
• Multinomial Naïve Bayes
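
To make the third algorithm concrete, here is a minimal from-scratch sketch of multinomial Naive Bayes with add-one smoothing. It is an illustration only and not the library implementation used in the project; the class name `TinyMultinomialNB` and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Minimal multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        # Number of documents per class, used for the class priors.
        self.class_counts = Counter(labels)
        # Per-class word counts, used for the word likelihoods.
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for counts in self.word_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                # Smoothed log likelihood of the word given the class.
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set of tokenized texts and labels.
train_docs = [["love", "coffee"], ["coffee", "time"], ["game", "code"], ["code", "game", "time"]]
train_labels = ["female", "female", "male", "male"]

model = TinyMultinomialNB().fit(train_docs, train_labels)
print(model.predict(["coffee", "love"]))
print(model.predict(["game"]))
```

The model scores each class by its prior plus the smoothed log-probabilities of the words in the text, and predicts the class with the highest score.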

After training and testing, the accuracies of the three models were:
• RandomForestClassifier - 57.2% (approx.)
• Logistic Regression - 57.8% (approx.)
• Multinomial Naïve Bayes - 60.1% (approx.)

So, after comparing the three models, Multinomial Naïve Bayes gives a better accuracy
than the other two when the description is the independent variable and gender is the
dependent variable.
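
Accuracy here means the fraction of test examples whose predicted label matches the true label. A minimal sketch, with a hypothetical `accuracy` helper and made-up labels:

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty test set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 3 of the 5 predictions are correct.
y_true = ["male", "female", "female", "male", "female"]
y_pred = ["male", "female", "male", "female", "female"]
score = accuracy(y_true, y_pred)
print(f"{score:.1%}")
```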

• In the second approach, we took the ‘Tweets’ column as the independent variable and
the ‘Gender’ column as the dependent variable (as given).
We then converted the tweets, which are originally strings, into arrays of numbers
before giving them to the ML model.
Then we split the encoded data into train and test sets.
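
The train/test split can be sketched as a reproducible shuffle followed by a cut. The split ratio used in the project is not stated, so the 25% test fraction below is an assumption, and `split_train_test` and the toy rows are hypothetical.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and return (train, test) lists."""
    shuffled = list(rows)  # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical encoded rows: (row id, feature vector, label).
rows = [(i, [i, i + 1], "male" if i % 2 else "female") for i in range(8)]
train, test = split_train_test(rows)
print(len(train), len(test))
```

Fixing the seed keeps the split identical across runs, so the three models are compared on the same test set.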

After training and testing, the accuracies of the three models were:

• RandomForestClassifier - 50.2% (approx.)
• Logistic Regression - 50.6% (approx.)
• Multinomial Naïve Bayes - 52.0% (approx.)

So, after comparing the three models, Multinomial Naïve Bayes gives a better accuracy
than the other two when the tweets are the independent variable and gender is the
dependent variable.

CONCLUSION:
In both cases, i.e., with Descriptions in one and Tweets in the other as the
independent variable and Gender as the fixed dependent variable, the Multinomial
Naïve Bayes classification algorithm gave the highest accuracy of the three (about
60.1% on descriptions and 52.0% on tweets), making it the best suited of the
algorithms tested in terms of accuracy.
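
The comparison can be reproduced from the reported figures alone. The sketch below only tabulates the approximate accuracies given above and picks the best model per approach; the `best_model` helper is hypothetical.

```python
# Accuracies in percent (approximate), as reported for each approach.
reported = {
    "description": {
        "RandomForestClassifier": 57.2,
        "Logistic Regression": 57.8,
        "Multinomial Naive Bayes": 60.1,
    },
    "tweets": {
        "RandomForestClassifier": 50.2,
        "Logistic Regression": 50.6,
        "Multinomial Naive Bayes": 52.0,
    },
}


def best_model(scores):
    """Return (name, accuracy, margin over the runner-up) for one approach."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (name, acc), (_, runner_up) = ranked[0], ranked[1]
    return name, acc, round(acc - runner_up, 1)


summary = {feature: best_model(scores) for feature, scores in reported.items()}
for feature, (name, acc, margin) in summary.items():
    print(f"{feature}: {name} at {acc}% (+{margin} points over the runner-up)")
```

In both approaches the margin over the runner-up is small, a few percentage points at most.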
