Predicting Mode of Transport (ML)

Akalya KS

Table of Contents
1 Project Objective
2 Exploratory Data Analysis
2.1 EDA - Basic data summary, Univariate, Bivariate analysis, graphs, Check for Outliers and missing values and check the summary of the dataset
Exploratory Data Analysis:
Univariate analysis:
Bivariate analysis:
Missing values and outliers:
Multicollinearity:
3.Data preparation and SMOTE:
SMOTE:
4.Building models:
4.1.Logistic regression models:
4.2.KNN Model:
Interpretation of KNN Model:
4.3. Applying Naive Bayes Model:
Interpretation of Naïve Bayes model:
4.4.Confusion matrix interpretation:
Boosting and Bagging models:
Applying bagging model:
Applying Boosting:
Actionable insights and recommendations:

1 Project Objective

In this project, we study which mode of transport employees prefer for commuting to their office.
We need to predict whether or not an employee will use a car as the mode of transport. The
objective is to build several machine learning models to identify this preference.

2 Exploratory Data Analysis

2.1 EDA - Basic data summary, Univariate, Bivariate analysis, graphs, Check for Outliers
and missing values and check the summary of the dataset

Exploratory Data Analysis:

There are 9 variables in the dataset with 444 records.

Data summary:

The summary of the data shows that the target variable Transport is a 3-class variable with the levels
2-wheeler, car and public transport.

The percentage distribution of the Transport variable is as below:

Univariate analysis:

Insights:

Analysis shows that the columns Engineer, MBA and license behave like categorical variables and hence
can be converted to factors.

Bivariate analysis:

Insights:

• Age & Transport: The plot shows that the higher the age, the more likely the mode of transport is a car.
• Gender: Females prefer 2-wheelers more than cars and public transport. Very few females
prefer cars over public transport. Females mainly use 2-wheelers and public transport.
• Engineer: There is no significant difference between engineers and non-engineers.
• MBA: Public transport is preferred more by non-MBA employees than by MBA employees.
• License: People without a license use 2-wheelers more than people with a license. Cars are
preferred more by people with a license, although people without a license also use both
cars and 2-wheelers. Public transport is dominated by people without a license.
• Work experience: People with more than 15 years of work experience use cars. More
experience leads to more car usage.
• Salary: The higher the salary, the more people prefer cars. 2-wheelers and public transport are
preferred by people with low salaries.
• Distance: Cars are preferred for longer distances.

Bivariate analysis shows that age, salary, work experience and distance contribute to the usage
of cars. These are the factors that will help in prediction.

Missing values and outliers:

There was only one NA value, in the MBA column, which was treated using KNN imputation.

The outliers are real collected data, so we will not treat them, since they will help the
predictive models.

Multicollinearity:

Insights:

The multicollinearity plot shows that work experience, age and salary are highly correlated.

3.Data preparation and SMOTE:

Since we are going to build models to understand the factors that influence car usage, we need to
understand the proportion of cars being used in the data. Hence, we will convert the 3-class Transport
variable to a 2-class variable where car takes the value 1 and 2-wheeler and public transport take 0. We will
store this in a new column, ‘Transport usage’.
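
The recoding described above can be sketched in base R as follows. This is only an illustration: the level names ("Car", "2Wheeler", "Public Transport") and the five sample rows are assumptions, not values taken from the project data; the column name TransportUsage follows the name used later in the xgboost code.

```r
# Hypothetical sample of the 3-class Transport column (level names are assumed).
cars <- data.frame(
  Transport = c("Car", "2Wheeler", "Public Transport", "Car", "Public Transport")
)

# Car -> 1, every other mode -> 0, stored in a new column.
cars$TransportUsage <- ifelse(cars$Transport == "Car", 1, 0)

# Share of each class; on the real data this is where the imbalance shows up.
class_share <- prop.table(table(cars$TransportUsage))
print(class_share)
```

On the full dataset the same two lines would reveal the class imbalance that motivates SMOTE.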

Public transport and 2-wheeler together account for 86.2% and car for 13.7% of the given dataset.

The proportion of car versus other transport is imbalanced, so we will apply SMOTE to balance the data
before building models.

We will split the data into train and test datasets, and SMOTE is applied only to the train dataset.
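
The order of operations (split first, oversample the training part only) can be sketched in base R. The function smote_sketch below is a simplified, hypothetical stand-in for SMOTE that interpolates between a minority point and one of its nearest minority neighbours; it is not the package implementation used in the project, and the toy data, the 70/30 split and k = 5 are assumptions.

```r
set.seed(42)

# Toy imbalanced data (hypothetical values, numeric predictors only).
n <- 200
toy <- data.frame(
  Age = round(rnorm(n, 28, 4)),
  Salary = round(runif(n, 6, 40), 1),
  TransportUsage = rbinom(n, 1, 0.15)
)

# 1. Split first, so the test set keeps the original class proportions.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- toy[train_idx, ]
test <- toy[-train_idx, ]

# 2. Simplified SMOTE-style oversampling of the minority class.
smote_sketch <- function(minority, n_new, k = 5) {
  x <- as.matrix(minority)
  d <- as.matrix(dist(x))                      # pairwise Euclidean distances
  k <- min(k, nrow(x) - 1)
  synth <- matrix(NA_real_, nrow = n_new, ncol = ncol(x),
                  dimnames = list(NULL, colnames(x)))
  for (i in seq_len(n_new)) {
    a <- sample(nrow(x), 1)                    # random minority point
    nn <- order(d[a, ])[2:(k + 1)]             # its nearest minority neighbours
    b <- nn[sample(length(nn), 1)]
    gap <- runif(1)                            # position along the segment a -> b
    synth[i, ] <- x[a, ] + gap * (x[b, ] - x[a, ])
  }
  as.data.frame(synth)
}

# 3. Apply it to the training data only; the test data stays untouched.
minority <- train[train$TransportUsage == 1, c("Age", "Salary")]
n_new <- sum(train$TransportUsage == 0) - nrow(minority)
synthetic <- smote_sketch(minority, n_new)
synthetic$TransportUsage <- 1
train_bal <- rbind(train, synthetic)

print(table(train_bal$TransportUsage))
```

Keeping the test set free of synthetic rows means the reported test metrics reflect the real class distribution.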

SMOTE:

After balancing the data using SMOTE, we can see more than a 10% increase in data, which we will use for
building models such as logistic regression, KNN and Naïve Bayes.

4.Building models:

4.1.Logistic regression models:

Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 260  29
         1  14 100

Accuracy : 0.8933
95% CI : (0.859, 0.9217)
No Information Rate : 0.6799
P-Value [Acc > NIR] : < 2e-16

Kappa : 0.7471

Mcnemar's Test P-Value : 0.03276

Sensitivity : 0.7752
Specificity : 0.9489
Pos Pred Value : 0.8772
Neg Pred Value : 0.8997
Prevalence : 0.3201
Detection Rate : 0.2481
Detection Prevalence : 0.2829
Balanced Accuracy : 0.8620

'Positive' Class : 1

Applying logistic regression shows that age and work experience are initially highly significant;
after removing them and checking VIF, the remaining values are within range.
Interpretation:
The results show the distribution of the deviance residuals for the individual components
used in the dataset. We can summarize them as below:
1. Since the maximum deviance residual is 2.29, it is a good model. The lower the deviance, the
better the model.
2. The variables Age, work experience, salary, distance, engineer and license are
significant.
3. Again, the difference between the residual and null deviance signifies that the model
is a good one, since the difference is high.
4. For Age and work experience the VIF value is greater than 5, which indicates multicollinearity
and means the model has problems estimating their coefficients.
5. The positive predictive value is only 87.7% and the sensitivity is 77.5%. The overall
model has an accuracy of 89%, which is acceptable for prediction using the
balanced data.
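
As a cross-check, the headline statistics can be recomputed by hand from the four cells of the logistic regression confusion matrix above, with class 1 as the positive class. The helper confusion_metrics is a hypothetical name introduced for this sketch; the cell counts are the ones printed in the output.

```r
# Metrics from the four cells of a 2x2 confusion matrix (positive class = 1).
confusion_metrics <- function(tn, fn, fp, tp) {
  c(
    accuracy    = (tp + tn) / (tp + tn + fp + fn),
    sensitivity = tp / (tp + fn),   # share of actual 1s predicted as 1
    specificity = tn / (tn + fp),   # share of actual 0s predicted as 0
    ppv         = tp / (tp + fp),   # share of predicted 1s that are correct
    npv         = tn / (tn + fn)    # share of predicted 0s that are correct
  )
}

# Cells read from the matrix: rows are predictions, columns are the reference.
lr <- confusion_metrics(tn = 260, fn = 29, fp = 14, tp = 100)
print(round(lr, 4))
```

The same helper can be applied to the KNN and Naïve Bayes matrices to compare the models on identical definitions.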

Using the balanced data, we obtained the AUC, ROC curve, KS and Gini values.
AUC value:
> AUC
[1] 0.960039

ROC curve:

KS:
> train.ks
[1] 0.7982456

GINI value:
> train.gini
[1] 0.920078
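
The three statistics are closely related, which a small base R sketch can illustrate. The labels and scores below are hypothetical; the AUC is computed with the rank-sum formula, the Gini value as 2 * AUC - 1, and KS as the largest gap between the score distributions of the two classes. The last line checks that the Gini value reported above is consistent with the reported AUC.

```r
# Hypothetical true labels and predicted probabilities.
labels <- c(0, 0, 1, 0, 1, 1, 0, 1)
scores <- c(0.10, 0.35, 0.40, 0.45, 0.60, 0.80, 0.20, 0.90)

pos <- scores[labels == 1]
neg <- scores[labels == 0]

# AUC via the rank-sum (Mann-Whitney) formula.
r <- rank(scores)
auc <- (sum(r[labels == 1]) - length(pos) * (length(pos) + 1) / 2) /
  (length(pos) * length(neg))

# Gini coefficient derived from the AUC.
gini <- 2 * auc - 1

# KS: maximum distance between the empirical CDFs of the two classes.
ks <- max(abs(ecdf(neg)(scores) - ecdf(pos)(scores)))

print(c(auc = auc, gini = gini, ks = ks))

# Consistency check on the values reported in this section.
reported_gini <- 2 * 0.960039 - 1
print(reported_gini)
```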

4.2.KNN Model:

k-Nearest Neighbors

403 samples
8 predictor
2 classes: '0', '1'

Pre-processing: centered (8), scaled (8)

Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 363, 362, 362, 363, 363, 363, ...
Resampling results across tuning parameters:

k Accuracy Kappa
5 0.9206697 0.8153524
7 0.9198364 0.8118234
9 0.9116046 0.7898279
11 0.9099776 0.7853529
13 0.9091646 0.7815184
15 0.9074573 0.7770144
17 0.9049359 0.7688212
19 0.9033109 0.7645682
21 0.8916839 0.7347938
23 0.8958312 0.7460219

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was k = 5.

> knn.CM_train
Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 264  13
         1  10 116

Accuracy : 0.9429
95% CI : (0.9156, 0.9635)
No Information Rate : 0.6799
P-Value [Acc > NIR] : <2e-16

Kappa : 0.8681

Mcnemar's Test P-Value : 0.6767

Sensitivity : 0.8992
Specificity : 0.9635
Pos Pred Value : 0.9206
Neg Pred Value : 0.9531
Prevalence : 0.3201
Detection Rate : 0.2878
Detection Prevalence : 0.3127
Balanced Accuracy : 0.9314

'Positive' Class : 1
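
To make the mechanics behind the output above concrete (centring, scaling, then a vote among the k nearest neighbours), here is a minimal base R sketch. It is not the caret implementation used in the project; knn_predict and the six toy training rows are hypothetical.

```r
# Minimal k-NN classifier for a binary 0/1 target (k should be odd).
knn_predict <- function(train_x, train_y, new_x, k = 5) {
  # Centre and scale with statistics from the training data only.
  mu <- colMeans(train_x)
  sdv <- apply(train_x, 2, sd)
  tr <- scale(train_x, center = mu, scale = sdv)
  nw <- scale(new_x, center = mu, scale = sdv)

  apply(nw, 1, function(p) {
    d <- sqrt(colSums((t(tr) - p)^2))        # Euclidean distance to every training row
    votes <- train_y[order(d)[seq_len(k)]]   # labels of the k closest rows
    as.integer(mean(votes) > 0.5)            # majority vote
  })
}

# Hypothetical training data: younger, lower-paid employees do not use a car.
train_x <- matrix(c(25, 10, 27, 12, 26, 11, 38, 35, 40, 40, 36, 30),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(NULL, c("Age", "Salary")))
train_y <- c(0, 0, 0, 1, 1, 1)
new_x <- matrix(c(26, 11.5, 39, 36), ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("Age", "Salary")))

pred <- knn_predict(train_x, train_y, new_x, k = 3)
print(pred)
```

Scaling matters here because Salary and Age are on different ranges, and an unscaled distance would be dominated by the variable with the larger spread.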

Interpretation of KNN Model:

• The tuned k-NN model selects k = 5 as the optimal value.

• The KNN model has an accuracy of 94.29%, which is higher than that of the logistic regression
model.
• The specificity is 96.35% and the positive predictive value is 92.06%.

4.3. Applying Naive Bayes Model:

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace)

A-priori probabilities:
Y
0 1
0.6799007 0.3200993

Conditional probabilities:
Age
Y [,1] [,2]
0 26.40146 3.122551
1 34.73942 3.028973

Gender
Y Female Male
0 0.3211679 0.6788321
1 0.3488372 0.6511628

Engineer
Y 0 1
0 0.2153285 0.7846715
1 0.1395349 0.8604651

MBA
Y 0 1
0 0.6861314 0.3138686
1 0.7519380 0.2480620

Work.Exp
Y [,1] [,2]
0 5.014599 3.089591
1 14.338148 4.443303

Salary
Y [,1] [,2]
0 12.88102 4.556363
1 32.11274 11.267554

Distance
Y [,1] [,2]
0 10.17518 3.050342
1 15.35171 3.133974

license
Y 0 1
0 0.8686131 0.1313869
1 0.4263566 0.5736434

• For continuous variables, the output lists the mean ([,1]) and the standard deviation ([,2])
of the predictor within each class. Naïve Bayes uses these as the parameters of a Gaussian
density to score an observed value under each class, rather than as hard cut-off thresholds.
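
For numeric predictors, the naiveBayes function in e1071 assumes a Gaussian distribution within each class. The sketch below shows how the printed mean and standard deviation enter the calculation, using the Age row and the a-priori probabilities from the output above. It uses Age alone for simplicity, whereas the fitted model multiplies the likelihoods of all eight predictors; the query value of 32 years is an arbitrary example.

```r
# Values copied from the model output above.
prior    <- c("0" = 0.6799007, "1" = 0.3200993)
age_mean <- c("0" = 26.40146,  "1" = 34.73942)
age_sd   <- c("0" = 3.122551,  "1" = 3.028973)

age <- 32  # arbitrary example value

# Gaussian likelihood of the observed age under each class.
lik <- dnorm(age, mean = age_mean, sd = age_sd)

# Posterior by Bayes' rule, using this single predictor only.
post <- prior * lik / sum(prior * lik)
print(round(post, 3))
```

Although the prior favours class 0, an age of 32 is much closer to the class 1 mean, so the posterior shifts towards car usage.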

Interpretation of Naïve Bayes model:

Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 266  16
         1   8 113

Accuracy : 0.9404
95% CI : (0.9127, 0.9615)
No Information Rate : 0.6799
P-Value [Acc > NIR] : <2e-16

Kappa : 0.8609

Mcnemar's Test P-Value : 0.153

Sensitivity : 0.8760
Specificity : 0.9708
Pos Pred Value : 0.9339
Neg Pred Value : 0.9433
Prevalence : 0.3201

Detection Rate : 0.2804
Detection Prevalence : 0.3002
Balanced Accuracy : 0.9234

'Positive' Class : 1

• The accuracy of the NB model is 94.04%, which is higher than the LR model (89.33%) and close to the KNN model (94.29%).
• The positive predictive value is 93.39% and the specificity is 97.08%.

4.4.Confusion matrix interpretation:

From a business point of view, the decision is based on the positive rates for predicting car usage.

Hence, we will compare model performance using accuracy and sensitivity on the test data.

Metrics       Logistic Regression   Naïve Bayes   KNN
Accuracy      93.94%                97.73%        95.45%
Specificity   75.00%                98.25%        97.37%
Sensitivity   97.32%                94.40%        83.30%

Interpretation:

Accuracy is higher for the Naïve Bayes model than for the LR and KNN models, but sensitivity is higher
for the LR model, which suggests that our models are not stable.

Boosting and Bagging models:

Bagging and boosting are ensemble methods. Bagging trains multiple models with the same algorithm on
resampled versions of the training data and combines them into a stronger model; random forests are one such method.
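
The bagging idea can be sketched in base R. The example below is an illustration under stated assumptions, not the bagging model fitted in this project: it uses logistic regression (glm) as the base learner to stay within base R, 25 bootstrap rounds, and simulated Salary and Distance values.

```r
set.seed(1)

# Simulated data (hypothetical): car usage becomes more likely with salary and distance.
n <- 300
x <- data.frame(Salary = runif(n, 5, 50), Distance = runif(n, 3, 25))
y <- rbinom(n, 1, plogis(-6 + 0.15 * x$Salary + 0.1 * x$Distance))
train <- cbind(x, y = y)

# Fit the same algorithm on many bootstrap samples of the training data.
n_models <- 25
models <- lapply(seq_len(n_models), function(i) {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  glm(y ~ Salary + Distance, data = boot, family = binomial)
})

# Aggregate: average the predicted probabilities across all models.
probs <- sapply(models, predict, newdata = train, type = "response")
bagged_prob <- rowMeans(probs)
bagged_class <- as.integer(bagged_prob > 0.5)

print(table(actual = train$y, predicted = bagged_class))
```

Averaging over bootstrap fits mainly reduces variance, which is why bagging is usually paired with unstable learners such as decision trees.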

Applying bagging model:

Interpretation:

Our bagging model follows the baseline approach of calling everything true, hence it is at the extreme.

Applying Boosting:

For the boosting model we use xgboost, which expects all variables to be
numeric. Hence, we will convert the variables to numeric.

> library(xgboost)
> # xgboost expects numeric matrices: columns 1-8 are predictors, column 9 is the label
> features_train = as.matrix(data_train[,1:8])
> label_train = as.matrix(data_train[,9])
> features_test = as.matrix(data_test[,1:8])
> XGBmodel = xgboost(
+ data = features_train,
+ label = label_train,
+ eta = .001,
+ max_depth = 5,
+ min_child_weight = 3,
+ nrounds = 10,
+ #nfold = 5,
+ objective = "binary:logistic", # binary classification, outputs probabilities
+ verbose = 0, # silent
+ early_stopping_rounds = 10 # stop if no improvement for 10 consecutive trees
+ )
> XGBpredTest = predict(XGBmodel, features_test)
> # rows: actual class, columns: predicted probability > 0.5
> tabXGB = table(data_test$TransportUsage, XGBpredTest>0.5)
> tabXGB

    FALSE TRUE
  0   111    3
  1     3   15

Our xgboost model provides the accuracy rate of 95.45%.

> # Accuracy: 95.45%
> sum(diag(tabXGB))/sum(tabXGB)
[1] 0.9545455
> # Specificity: 97.37% (tn/n)
> 111/114
[1] 0.9736842
> # Sensitivity: 83.33% (tp/p)
> 15/18
[1] 0.8333333

Model comparison:

Using the SMOTE training data, we built logistic regression, NB and KNN models, and the accuracy on the test data
shows that the NB model performed best. The bagging model shows complete accuracy and the boosting model
shows 95.45%; the bagging model predicted 100% of the car users.

Actionable insights and recommendations:

• Variables such as Age, Work.Exp, Distance and License are the important predictors for
identifying transport preference.
• Age and Work.Exp are correlated, hence we could use either one (preferably Work.Exp).
• Employees with work experience of 10 years and above are predicted to use a car.
• Employees who must commute a distance greater than 12 are more likely to prefer a car.
• With license, we see that 74% of those who commute by car have a license and 89% of those who
commute by bus do not. But surprisingly, 72% without a license use a 2-wheeler.
• Again, people with higher salaries (>20) are likely to use cars.
