
ML Pgms_24Mar2025

The document outlines a series of machine learning lab programs for BCA 6th semester students, focusing on Python and essential libraries such as NumPy, pandas, and scikit-learn. It includes instructions for setting up the environment, loading and visualizing datasets, handling missing data, and implementing various machine learning algorithms like k-NN, linear regression, decision trees, and K-Means clustering. Each section provides code examples and explanations for practical implementation.

Uploaded by

Bharath D.S


Machine Learning Final Lab Programs

BCA 6th Sem

LIST OF PROGRAMS

1. Install and set up Python and essential libraries like NumPy and pandas.

2. Introduce scikit-learn as a machine learning library.

3. Install and set up scikit-learn and other necessary tools.

4. Write a program to load and explore a dataset from CSV and Excel files using pandas.

5. Write a program to visualize the dataset to gain insights using Matplotlib or Seaborn by plotting scatter plots and bar charts.

6. Write a program to handle missing data, encode categorical variables, and perform feature scaling.

7. Write a program to implement a k-Nearest Neighbours (k-NN) classifier using scikit-learn, train the classifier on the dataset, and evaluate its performance.

8. Write a program to implement a linear regression model for regression tasks and train the model on a dataset with continuous target variables.

9. Write a program to implement a decision tree classifier using scikit-learn, visualize the decision tree, and understand its splits.

10. Write a program to implement K-Means clustering and visualize the clusters.

1. Install and set up Python and essential libraries like NumPy and pandas.

Install Python: If you have not already installed Python, download it from the official Python website. To verify the installation, run the following command in your terminal:

python --version

Install pip: pip is a package manager for Python that allows you to easily install and manage libraries. Most recent versions of Python come with pip pre-installed. You can verify that pip is installed by running the following command in your terminal or command prompt:

pip --version

Install NumPy and pandas: Once you have Python and pip installed, you can use pip to install NumPy and pandas by running the following commands in your terminal or command prompt:

pip install numpy
pip install pandas

This will download and install NumPy and pandas along with any dependencies they require.

Verify installation: After installing NumPy and pandas, you can verify that they were installed correctly by running the following commands in Python's interactive mode or a Python script:

import numpy
import pandas

print(numpy.__version__)
print(pandas.__version__)

These commands should print the versions of NumPy and pandas that were installed.
Output:
2. Introduce scikit-learn as a machine learning library.

Scikit-learn (sklearn) is a widely used and robust library for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction, via a consistent interface in Python. This library, which is largely written in Python, is built upon NumPy, SciPy and Matplotlib.

Installation

If you have already installed NumPy and SciPy, one of the easiest ways to install scikit-learn is with pip.

Using pip

The following command can be used to install scikit-learn via pip:

pip install -U scikit-learn

Features

Rather than focusing on loading, manipulating and summarising data, the scikit-learn library is focused on modeling the data. Some of the most popular groups of models provided by sklearn are as follows −

Supervised Learning algorithms − Almost all the popular supervised learning algorithms, such as Linear Regression, Support Vector Machine (SVM) and Decision Tree, are part of scikit-learn.

Unsupervised Learning algorithms − It also has the popular unsupervised learning algorithms, from clustering, factor analysis and PCA (Principal Component Analysis) to unsupervised neural networks.

Clustering − This group of models is used for grouping unlabeled data.

Cross Validation − It is used to estimate the accuracy of supervised models on unseen data.
3. Install and set up scikit-learn and other necessary tools.

scikit-learn is a powerful Python library for machine learning. Here are the steps to set it up:

Install Python: If you haven’t already installed Python, download and install the latest version of Python 3 from the official Python website.

Install scikit-learn using pip: Open your terminal or command prompt and run the following command:

pip install -U scikit-learn

To verify your installation, you can use the following commands in the terminal:

# To see which version of scikit-learn is installed and where
python -m pip show scikit-learn

# To see all packages installed in the active virtual environment
python -m pip freeze

Then run the following in Python's interactive mode or a Python script:

import sklearn
import numpy
import pandas
import matplotlib

print(sklearn.__version__)
print(numpy.__version__)
print(pandas.__version__)
print(matplotlib.__version__)

These commands should print the versions of scikit-learn and the other libraries that were installed.
Output:
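As a complement to the manual checks above, the following is a minimal sketch that reports which of the required libraries are installed without importing them. It assumes Python 3.8 or later (for importlib.metadata); the helper name report_versions and the package list are illustrative and not part of the original lab.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used by pip (illustrative list for this lab).
REQUIRED_PACKAGES = ["numpy", "pandas", "scikit-learn", "matplotlib"]


def report_versions(packages):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = report_versions(REQUIRED_PACKAGES)
for name, ver in versions.items():
    status = ver if ver is not None else f"not installed (try: pip install {name})"
    print(f"{name}: {status}")
```

A missing package is reported instead of raising an ImportError, which can make it easier to see everything that still needs installing in one run.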
4. Write a program to load and explore a dataset from CSV and Excel files using pandas.

import pandas as pd

def explore_dataset(file_path):
    # Check if the file is a CSV or Excel file
    if file_path.endswith('.csv'):
        # Load CSV file into a pandas DataFrame
        df = pd.read_csv(file_path)
    elif file_path.endswith('.xlsx'):
        # Load Excel file into a pandas DataFrame
        df = pd.read_excel(file_path)
    else:
        print("Unsupported file format. Please provide a CSV or Excel file.")
        return

    # Display basic information about the DataFrame
    # (df.info() prints its report itself and returns None, so it is not wrapped in print)
    print("Dataset information:")
    df.info()

    # Display the first few rows of the DataFrame
    print("\nFirst few rows of the dataset:")
    print(df.head())

    # Display summary statistics for numerical columns
    print("\nSummary statistics:")
    print(df.describe())

    # Display unique values for categorical columns
    print("\nUnique values for categorical columns:")
    for column in df.select_dtypes(include='object').columns:
        print(f"{column}: {df[column].unique()}")

# Example usage
# Change this to the path of your CSV or Excel file
file_path = 'IRIS.csv'
explore_dataset(file_path)

Output:
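The exploration steps can also be tried without a file on disk. The following is a minimal sketch that reads a small, made-up CSV from an in-memory string; the column names and values are hypothetical and only loosely modelled on the Iris data, and one value is left blank on purpose so that the missing-value count is non-zero.

```python
import io

import pandas as pd

# Hypothetical CSV content; one sepal_width value is deliberately missing.
csv_text = """sepal_length,sepal_width,species
5.1,3.5,setosa
4.9,,setosa
6.3,3.3,virginica
5.8,2.7,virginica
"""

# pd.read_csv accepts any file-like object, not only a path.
df = pd.read_csv(io.StringIO(csv_text))

print("Shape (rows, columns):", df.shape)

print("\nColumn data types:")
print(df.dtypes)

# Count missing values per column.
missing_counts = df.isnull().sum()
print("\nMissing values per column:")
print(missing_counts)

# Frequency of each category in the 'species' column.
species_counts = df["species"].value_counts()
print("\nRows per species:")
print(species_counts)
```

Checking the shape, the data types and the missing-value counts early usually shows whether the file was parsed as intended before any modelling starts.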
5. Write a program to Visualize the dataset to gain insights using Matplotlib or
Seaborn by plotting scatter plots, and bar charts.

Scatter Plot

import matplotlib.pyplot as plt

Products = ['Pen', 'Pencil', 'Eraser', 'Sharpener', 'Ruler']
Prices = [20, 7, 2, 4, 10]
plt.scatter(Products, Prices)
plt.xlabel('Products')
plt.ylabel('Prices')
plt.title('Sales Data Scatter Plot')
plt.show()

Output:-

Bar Chart

import matplotlib.pyplot as plt

Subjects = ['ML', 'DA', 'DM', 'AI']
Marks = [55, 40, 36, 45]
plt.bar(Subjects, Marks)
plt.xlabel('Subjects')
plt.ylabel('Marks')
plt.title('Semester Performance Bar Chart')
plt.show()
Output:-
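Both charts can also be drawn side by side in one figure and saved to a file. The following is a minimal sketch using Matplotlib's object-oriented interface with the same sample data as above; the Agg backend and the output file name lab5_plots.png are assumptions made so that the script also runs on a machine without a display.

```python
import os

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; assumed here so no display is needed
import matplotlib.pyplot as plt

products = ['Pen', 'Pencil', 'Eraser', 'Sharpener', 'Ruler']
prices = [20, 7, 2, 4, 10]
subjects = ['ML', 'DA', 'DM', 'AI']
marks = [55, 40, 36, 45]

# One figure with two axes: scatter plot on the left, bar chart on the right.
fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

ax_scatter.scatter(products, prices)
ax_scatter.set_xlabel('Products')
ax_scatter.set_ylabel('Prices')
ax_scatter.set_title('Sales Data Scatter Plot')

ax_bar.bar(subjects, marks)
ax_bar.set_xlabel('Subjects')
ax_bar.set_ylabel('Marks')
ax_bar.set_title('Semester Performance Bar Chart')

fig.tight_layout()

output_path = "lab5_plots.png"  # hypothetical file name
fig.savefig(output_path)
plt.close(fig)

print("Saved figure to", output_path, "-", os.path.getsize(output_path), "bytes")
```

Saving with savefig instead of calling plt.show() is convenient when the plots have to be attached to a lab record.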
6. Write a program to Handle missing data, encode categorical variables, and perform
feature scaling.

To Handle Missing Data

i) Removal of Rows with Missing Data

# import modules
import pandas as pd
from numpy import nan

# Load dataset
df = pd.read_csv(r"C:\Users\PRIYA VINESH\Desktop\train.csv", header=None)

# count the number of missing (NaN) values in each column

print('count of the number of missing (NaN) values in each column')
print(df.isnull().sum())

# summarize the shape of the raw data

print('Summarised Shape of the raw data')
print(df.shape)

# replace '0' values with 'nan'

df[[0,1,2,3,4,5,6,7,8,9,10,11]] = df[[0,1,2,3,4,5,6,7,8,9,10,11]].replace(0, nan)
print('Data set with 0 replaced with NaN')
print(df)

# drop rows with missing values

df.dropna(inplace=True)

# summarize the shape of the data with missing rows removed

print('Summarised Shape of the raw data with missing rows removed')
print(df.shape)
Output

ii) Impute Missing values with the Mean

import pandas as pd
import numpy as np
# create a sample dataframe with missing values
print('Sample Dataframe with Missing Values')
df = pd.DataFrame({'A': [1, 2, np.nan, 4, 5],
'B': [6, np.nan, 8, 9, 10],
'C': [11, 12, 13, np.nan, 15]})
print(df)
# impute missing values with the mean
print('Dataframe after imputing missing values with the Mean')
df.fillna(df.mean(), inplace=True)
print(df)
Output
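The same mean imputation can be done with scikit-learn's SimpleImputer, which is an alternative to fillna rather than the method used in the program above. The following is a minimal sketch on the same sample DataFrame; the variable names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Same sample data as in the fillna example.
df = pd.DataFrame({'A': [1, 2, np.nan, 4, 5],
                   'B': [6, np.nan, 8, 9, 10],
                   'C': [11, 12, 13, np.nan, 15]})

# strategy="mean" replaces each NaN with the mean of its column.
imputer = SimpleImputer(strategy="mean")
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print("Learned column means:", imputer.statistics_)
print(imputed)
```

One reason to prefer an imputer object is that the means learned with fit on the training data can be reused with transform on test data, so the test set does not influence the fill values.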
To Encode Categorical Variables

Encoding categorical variables is a common preprocessing step in machine learning. One simple way to encode categorical variables is to use one-hot encoding. Here is a simple Python program using the pandas library for one-hot encoding:

i) One-hot Encoding

import pandas as pd

# Sample data with categorical variable

data = {'Category': ['A', 'B', 'A', 'C', 'B', 'C']}
df = pd.DataFrame(data)

# One-hot encoding
df_encoded = pd.get_dummies(df, columns=['Category'], prefix='Category')

# Display the encoded DataFrame

print("Original DataFrame:")
print(df)
print("\nEncoded DataFrame:")
print(df_encoded)

Note:- This program creates a DataFrame with a categorical variable ('Category') and then uses the pd.get_dummies function to perform one-hot encoding.

Output

ii) Label Encoding

One alternative to one-hot encoding is Label Encoding. In Label Encoding, each unique category is assigned an integer label. Here is a simple Python program using the LabelEncoder from the scikit-learn library:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

def encode_categorical_variable(data, column_name):
    # Fit a LabelEncoder on the column and replace it with integer labels
    # (this modifies the DataFrame that is passed in)
    le = LabelEncoder()
    data[column_name] = le.fit_transform(data[column_name])
    return data

# Example usage:
# Replace the sample DataFrame and column name with your actual data

# Sample DataFrame
data = {'ID': [1, 2, 3, 4, 5],
        'Category': ['A', 'B', 'A', 'C', 'B']}
df = pd.DataFrame(data)

# Encode the 'Category' column using Label Encoding
encoded_df = encode_categorical_variable(df, 'Category')

# Display the result
print(encoded_df)

Note:- This program uses the LabelEncoder to transform the categorical values in the
specified column ('Category' in this case) into numerical labels. The transformed
DataFrame is then printed.

It is possible to replace the sample DataFrame and column names with our actual data.

Output
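To see which integer was assigned to which category, the fitted encoder can be inspected and its mapping reversed. The following is a minimal sketch on the same category values; it uses the LabelEncoder attribute classes_ and the method inverse_transform, and the variable names are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

categories = ['A', 'B', 'A', 'C', 'B']

le = LabelEncoder()
codes = le.fit_transform(categories)

# classes_ holds the sorted unique categories; the position is the integer label.
print("Classes:", list(le.classes_))
print("Codes:", [int(code) for code in codes])

# inverse_transform maps the integer labels back to the original categories.
decoded = le.inverse_transform(codes)
print("Decoded:", list(decoded))
```

Because the labels are integers, a model may treat them as ordered (A < B < C). The scikit-learn documentation describes LabelEncoder as intended for target labels, so for unordered input features one-hot encoding is often the safer choice.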
To perform Feature Scaling

Feature scaling is important in machine learning to standardize or normalize the range of independent variables or features of the data. One common technique is Min-Max Scaling, where the values are scaled to a specific range, typically between 0 and 1.

Here is a simple Python program using scikit-learn's MinMaxScaler for feature scaling:

i) Using MinMaxScaler

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def perform_feature_scaling(data, columns_to_scale):
    # Scale the selected columns to the range [0, 1]
    # (this modifies the DataFrame that is passed in)
    scaler = MinMaxScaler()
    data[columns_to_scale] = scaler.fit_transform(data[columns_to_scale])
    return data

# Example usage:
# Replace the sample DataFrame and column names with your actual data

# Sample DataFrame
data = {'Feature1': [10, 20, 30, 40, 50],
        'Feature2': [5, 15, 25, 35, 45]}
df = pd.DataFrame(data)

# Columns to scale
columns_to_scale = ['Feature1', 'Feature2']

# Perform feature scaling
scaled_df = perform_feature_scaling(df, columns_to_scale)

# Display the result
print(scaled_df)

This program uses the MinMaxScaler to scale the specified columns ('Feature1' and
'Feature2' in this case) to the range [0, 1]. It is possible to replace the sample DataFrame
and column names with our actual data.

Note: Feature scaling is generally applied to numerical features. If the DataFrame contains non-numerical columns, we need to handle them separately or exclude them from the scaling process.
Output
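Min-Max Scaling with the default range [0, 1] computes, for each column, x_scaled = (x - min) / (max - min). The following is a minimal sketch that applies this formula directly with NumPy to the same sample values, as a way to check the result by hand; it is an illustration of the formula, not a replacement for MinMaxScaler.

```python
import numpy as np

# Same values as Feature1 and Feature2 in the sample DataFrame.
features = np.array([[10, 5],
                     [20, 15],
                     [30, 25],
                     [40, 35],
                     [50, 45]], dtype=float)

# Column-wise minimum and maximum.
col_min = features.min(axis=0)
col_max = features.max(axis=0)

# x_scaled = (x - min) / (max - min), applied per column.
# A constant column (max == min) would divide by zero and needs separate handling.
scaled = (features - col_min) / (col_max - col_min)

print(scaled)
```

Each column maps its smallest value to 0, its largest to 1, and the evenly spaced values in between to 0.25, 0.5 and 0.75.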
7. Write a program to implement a k-Nearest Neighbours (k-NN) classifier using scikit-learn, train the classifier on the dataset, and evaluate its performance.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the k-NN classifier
k = 3  # Number of neighbors
knn_classifier = KNeighborsClassifier(n_neighbors=k)

# Train the classifier
knn_classifier.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = knn_classifier.predict(X_test)

# Evaluate the classifier's performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Display classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

Output:
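The program above fixes k = 3. One common way to choose k is to compare several values with cross-validation. The following is a minimal sketch using cross_val_score on the Iris data; the candidate values and the 5-fold setting are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

candidate_ks = [1, 3, 5, 7, 9]  # illustrative choices
mean_scores = {}

for k in candidate_ks:
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation; each score is the accuracy on one held-out fold.
    scores = cross_val_score(knn, X, y, cv=5)
    mean_scores[k] = scores.mean()
    print(f"k={k}: mean accuracy = {mean_scores[k]:.3f}")

# Pick the k with the highest mean cross-validated accuracy.
best_k = max(mean_scores, key=mean_scores.get)
print("Best k:", best_k)
```

Averaging over several folds gives a steadier estimate than a single train/test split, which matters on a dataset as small as Iris.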
8. Write a program to implement a linear regression model for regression tasks and train the model on a dataset with continuous target variables.
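No listing for this program appears in this copy of the manual. The following is a minimal sketch of what such a program could look like, assuming scikit-learn's LinearRegression and a synthetic dataset generated from y = 3x + 2 plus random noise; the data, the noise level and the variable names are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): y = 3x + 2 with Gaussian noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=200)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = model.predict(X_test)

# Evaluate the model with mean squared error and R^2
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)
print("Mean squared error:", mse)
print("R^2 score:", r2)
```

The fitted slope and intercept should land close to the values used to generate the data, which is a quick sanity check that the model was trained correctly. With a real dataset, X and y would be replaced by the feature columns and the continuous target column.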
9. Write a program to implement a decision tree classifier using scikit-learn, visualize the decision tree, and understand its splits.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Decision Tree classifier
decision_tree = DecisionTreeClassifier()

# Train the classifier
decision_tree.fit(X_train, y_train)

# Visualize the decision tree
plt.figure(figsize=(12, 8))
plot_tree(decision_tree, feature_names=iris.feature_names, class_names=iris.target_names,
          filled=True)
plt.show()
Output:
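To read the splits as text instead of a picture, scikit-learn provides export_text. The following is a minimal sketch on the same Iris split; limiting the tree with max_depth=3 and fixing random_state are assumptions made here to keep the printed rules short and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# A shallow tree (assumed depth limit) keeps the rule listing readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

# Each line of the output is one split condition or one leaf with its class.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)

test_accuracy = tree.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```

Each indented level in the printed rules corresponds to one level of the tree, so a path from the top to a leaf reads as a chain of threshold tests on the features.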
10. Write a program to implement K-Means clustering and visualize the clusters.

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate sample data
X, y = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=42)

# Create a K-Means clusterer with 4 clusters
kmeans = KMeans(n_clusters=4, random_state=42)

# Fit the data
kmeans.fit(X)

# Get cluster labels
labels = kmeans.labels_

# Plot the data with cluster labels
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=100, c='red',
            label='Centroids')
plt.title('K-Means Clustering')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()
Output:
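The program above sets the number of clusters to 4 because the sample data was generated with 4 centers. When the number is not known in advance, one common heuristic is the elbow method: fit K-Means for several values of k and look at where the inertia (within-cluster sum of squared distances) stops dropping sharply. The following is a minimal sketch on the same generated data; the range of k values and n_init=10 are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same sample data as in the clustering program.
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=42)

inertias = {}
for k in range(1, 8):
    # n_init=10 runs K-Means from 10 random starts and keeps the best result.
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(X)
    inertias[k] = kmeans.inertia_
    print(f"k={k}: inertia = {inertias[k]:.1f}")
```

Plotting these inertia values against k typically shows a bend near the most suitable number of clusters; beyond that point, adding clusters reduces the inertia only slightly.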

End
