
Machine Learning Final Lab Programs

BCA 6th Sem

LIST OF PROGRAMS

1. Install and set up Python and essential libraries like NumPy and pandas.

2. Introduce scikit-learn as a machine learning library.

3. Install and set up scikit-learn and other necessary tools.

4. Write a program to load and explore datasets from .CSV and Excel files using pandas.

5. Write a program to visualize the dataset to gain insights using Matplotlib or Seaborn by plotting scatter plots and bar charts.

6. Write a program to handle missing data, encode categorical variables, and perform feature scaling.

7. Write a program to implement a k-Nearest Neighbours (k-NN) classifier using scikit-learn, train the classifier on the dataset, and evaluate its performance.

8. Write a program to implement a linear regression model for regression tasks and train the model on a dataset with continuous target variables.

9. Write a program to implement a decision tree classifier using scikit-learn, visualize the decision tree, and understand its splits.

10. Write a program to implement K-Means clustering and visualize clusters.

1. Install and set up Python and essential libraries like NumPy and pandas.

Install Python: If you have not already installed Python, you can download it from the official website. To verify the installation, run the following command in a terminal:

python --version

Install pip: pip is a package manager for Python that allows you to easily install and manage libraries. Most recent versions of Python come with pip pre-installed. You can verify whether pip is installed by running the following command in your terminal or command prompt:

pip --version

Install NumPy and pandas: Once you have Python and pip installed, you can use pip to install NumPy and pandas by running the following commands in your terminal or command prompt:

# In terminal
pip install numpy
pip install pandas

This will download and install NumPy and pandas along with any dependencies they require.

Verify installation: After installing NumPy and pandas, you can verify that they were installed correctly by running the following commands in Python's interactive mode or a Python script:

import numpy
import pandas

print(numpy.__version__)
print(pandas.__version__)

These commands should print the versions of NumPy and pandas that were installed.
Output:
2. Introduce scikit-learn as a machine learning library.

Scikit-learn (sklearn) is the most useful and robust library for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction, via a consistent interface in Python. This library, which is largely written in Python, is built upon NumPy, SciPy and Matplotlib.

Installation

If you have already installed NumPy and SciPy, one of the easiest ways to install scikit-learn is via pip.

Using pip

The following command can be used to install scikit-learn via pip:

pip install -U scikit-learn

Features

Rather than focusing on loading, manipulating and summarising data, the scikit-learn library is focused on modeling the data. Some of the most popular groups of models provided by sklearn are as follows −

Supervised Learning algorithms − Almost all the popular supervised learning algorithms, like Linear Regression, Support Vector Machine (SVM), Decision Tree etc., are part of scikit-learn.

Unsupervised Learning algorithms − On the other hand, it also has all the popular
unsupervised learning algorithms from clustering, factor analysis, PCA (Principal Component
Analysis) to unsupervised neural networks.

Clustering − This model is used for grouping unlabeled data.

Cross Validation − It is used to check the accuracy of supervised models on unseen data.
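
As an illustration of this consistent interface, here is a minimal sketch of the typical fit/predict workflow shared by scikit-learn estimators. The dataset (the built-in Iris data) and the model (logistic regression) are chosen only as examples and are not prescribed by this program:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Load a small built-in dataset
X, y = load_iris(return_X_y=True)

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Every estimator follows the same pattern: create, fit, predict, score
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
print(model.predict(X_test[:5]))
print("Test accuracy:", model.score(X_test, y_test))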
3. Install and set up scikit-learn and other necessary tools.

Scikit-learn is a powerful Python library for machine learning. Here are the steps to set it up:
Install Python: If you haven’t already installed Python, download and install the latest version of
Python 3 from the official Python website.
Install scikit-learn using pip: Open your terminal or command prompt and run the following
command:

pip install -U scikit-learn

To verify your installation, you can use the following commands:

python -m pip show scikit-learn   # To see which version and where scikit-learn is installed
python -m pip freeze              # To see all packages installed in the active virtual environment

import sklearn
import numpy
import pandas
import matplotlib

print(sklearn.__version__)
print(numpy.__version__)
print(pandas.__version__)
print(matplotlib.__version__)

These commands should print the versions of scikit-learn and other libraries that were installed.
Output:
4. Write a program to load and explore datasets from .CSV and Excel files using pandas.

import pandas as pd

def explore_dataset(file_path):
    # Check if the file is a CSV or Excel file
    if file_path.endswith('.csv'):
        # Load CSV file into a pandas DataFrame
        df = pd.read_csv(file_path)
    elif file_path.endswith('.xlsx'):
        # Load Excel file into a pandas DataFrame
        df = pd.read_excel(file_path)
    else:
        print("Unsupported file format. Please provide a CSV or Excel file.")
        return

    # Display basic information about the DataFrame
    print("Dataset information:")
    print(df.info())

    # Display the first few rows of the DataFrame
    print("\nFirst few rows of the dataset:")
    print(df.head())

    # Display summary statistics for numerical columns
    print("\nSummary statistics:")
    print(df.describe())

    # Display unique values for categorical columns
    print("\nUnique values for categorical columns:")
    for column in df.select_dtypes(include='object').columns:
        print(f"{column}: {df[column].unique()}")

# Example usage
file_path = 'IRIS.csv'  # Change this to the path of your CSV or Excel file
explore_dataset(file_path)


Output:
5. Write a program to Visualize the dataset to gain insights using Matplotlib or
Seaborn by plotting scatter plots, and bar charts.

Scatter Plot

import matplotlib.pyplot as plt


Products = ['Pen', 'Pencil', 'Eraser', 'Sharpener', 'Ruler']
Prices = [20, 7, 2, 4, 10]
plt.scatter(Products, Prices)
plt.xlabel('Products')
plt.ylabel('Prices')
plt.title('Sales Data Scatter Plot')
plt.show()

Output:-

Bar Chart

import matplotlib.pyplot as plt


Subjects = ['ML', 'DA', 'DM', 'AI']
Marks = [55, 40, 36, 45]
plt.bar(Subjects, Marks)
plt.xlabel('Subjects')
plt.ylabel('Marks')
plt.title('Semester Performance Bar Chart')
plt.show()
Output:-
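
The same plots can also be produced with Seaborn, which the program statement mentions as an alternative. The following is a minimal sketch using Seaborn's 'iris' sample dataset; the dataset and its column names are assumptions for illustration, not part of the original programs:

import seaborn as sns
import matplotlib.pyplot as plt

# Load the 'iris' sample dataset (fetched from Seaborn's online data repository on first use)
iris = sns.load_dataset('iris')

# Scatter plot of two numerical features, coloured by species
sns.scatterplot(data=iris, x='sepal_length', y='petal_length', hue='species')
plt.title('Iris Scatter Plot (Seaborn)')
plt.show()

# Bar chart of the average petal length per species
sns.barplot(data=iris, x='species', y='petal_length')
plt.title('Average Petal Length per Species (Seaborn)')
plt.show()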
6. Write a program to Handle missing data, encode categorical variables, and perform
feature scaling.

To Handle Missing Data

i) Removal of Rows with Missing Data


# import modules
import pandas as pd
from numpy import nan

# Load dataset
df = pd.read_csv(r"C:\Users\PRIYA VINESH\Desktop\train.csv", header=None)

# count the number of missing (NaN) values in each column


print('count of the number of missing (NaN) values in each column')
print(df.isnull().sum())

# summarize the shape of the raw data


print('Summarised Shape of the raw data')
print(df.shape)

# replace '0' values with 'nan'


df[[0,1,2,3,4,5,6,7,8,9,10,11]] = df[[0,1,2,3,4,5,6,7,8,9,10,11]].replace(0, nan)
print('Data set with 0 replaced with NaN')
print(df)

# drop rows with missing values


df.dropna(inplace=True)

# summarize the shape of the data with missing rows removed


print('Summarised Shape of the raw data with missing rows removed')
print(df.shape)
Output

ii) Impute Missing values with the Mean

import pandas as pd
import numpy as np
# create a sample dataframe with missing values
print('Sample Dataframe with Missing Values')
df = pd.DataFrame({'A': [1, 2, np.nan, 4, 5],
'B': [6, np.nan, 8, 9, 10],
'C': [11, 12, 13, np.nan, 15]})
print(df)
# impute missing values with the mean
print('Dataframe after imputing missing values with the Mean')
df.fillna(df.mean(), inplace=True)
print(df)
Output
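
Note:- scikit-learn also offers SimpleImputer for the same task. The following is a small sketch, added here only as a hedged alternative to the pandas-based approach above, imputing missing values with the per-column mean:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Same sample data as above, with missing values
df = pd.DataFrame({'A': [1, 2, np.nan, 4, 5],
                   'B': [6, np.nan, 8, 9, 10],
                   'C': [11, 12, 13, np.nan, 15]})

# Impute missing values with the mean of each column
imputer = SimpleImputer(strategy='mean')
df_imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(df_imputed)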
To Encode Categorical Variables

Encoding categorical variables is a common preprocessing step in machine learning. One simple way to encode categorical variables is to use one-hot encoding. Here is a simple Python program using the pandas library for one-hot encoding:

i) One-hot Encoding

import pandas as pd

# Sample data with categorical variable


data = {'Category': ['A', 'B', 'A', 'C', 'B', 'C']}
df = pd.DataFrame(data)

# One-hot encoding
df_encoded = pd.get_dummies(df, columns=['Category'], prefix='Category')

# Display the encoded DataFrame


print("Original DataFrame:")
print(df)
print("\nEncoded DataFrame:")
print(df_encoded)

Note:- This program creates a DataFrame with a categorical variable ('Category') and then uses the pd.get_dummies function to perform one-hot encoding.

Output

ii) Label Encoding

One alternative to one-hot encoding is Label Encoding. In Label Encoding, each unique category
is assigned an integer label. Here is a simple Python program using the LabelEncoder from the
scikit-learn library:
from sklearn.preprocessing import LabelEncoder

def encode_categorical_variable(data, column_name):
    le = LabelEncoder()
    data[column_name] = le.fit_transform(data[column_name])
    return data

# Example usage:
# Assume you have a DataFrame 'df' with a categorical column 'category'
# Replace this with your actual DataFrame and column names

# Sample DataFrame
import pandas as pd

data = {'ID': [1, 2, 3, 4, 5],
        'Category': ['A', 'B', 'A', 'C', 'B']}
df = pd.DataFrame(data)

# Encode the 'Category' column using Label Encoding


encoded_df = encode_categorical_variable(df, 'Category')

# Display the result


print(encoded_df)

Note:- This program uses the LabelEncoder to transform the categorical values in the
specified column ('Category' in this case) into numerical labels. The transformed
DataFrame is then printed.

It is possible to replace the sample DataFrame and column names with your actual data.

Output
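
Note:- LabelEncoder assigns integer codes in the sorted order of the unique categories. The following short sketch is an added illustration (not part of the original program) that keeps a reference to the encoder so the learned mapping can be inspected and reversed:

from sklearn.preprocessing import LabelEncoder
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B']})

le = LabelEncoder()
df['Category_encoded'] = le.fit_transform(df['Category'])

# The classes, in the order of their integer codes: ['A' 'B' 'C']
print(le.classes_)

# Recover the original categories from the integer codes
print(le.inverse_transform(df['Category_encoded']))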
To perform Feature Scaling

Feature scaling is important in machine learning to standardize or normalize the range of independent variables or features of the data. One common technique is Min-Max Scaling, where the values are scaled to a specific range, typically between 0 and 1.

Here is a simple Python program using scikit-learn's MinMaxScaler for feature scaling:

i) Using MinMaxScaler

from sklearn.preprocessing import MinMaxScaler


import pandas as pd

def perform_feature_scaling(data, columns_to_scale):
    scaler = MinMaxScaler()
    data[columns_to_scale] = scaler.fit_transform(data[columns_to_scale])
    return data

# Example usage:
# Assume you have a DataFrame 'df' with columns to be scaled
# Replace this with your actual DataFrame and column names

# Sample DataFrame
data = {'Feature1': [10, 20, 30, 40, 50],
'Feature2': [5, 15, 25, 35, 45]}
df = pd.DataFrame(data)

# Columns to scale
columns_to_scale = ['Feature1', 'Feature2']

# Perform feature scaling


scaled_df = perform_feature_scaling(df, columns_to_scale)

# Display the result


print(scaled_df)

This program uses the MinMaxScaler to scale the specified columns ('Feature1' and 'Feature2' in this case) to the range [0, 1]. It is possible to replace the sample DataFrame and column names with your actual data.

Note: Feature scaling is generally applied to numerical features. If the DataFrame contains non-numerical columns, we need to handle them separately or exclude them from the scaling process.
Output
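
ii) Using StandardScaler

The explanation above also mentions standardization. As a hedged alternative sketch (not included in the original program), scikit-learn's StandardScaler rescales each feature to zero mean and unit variance:

from sklearn.preprocessing import StandardScaler
import pandas as pd

# Same sample data as above
df = pd.DataFrame({'Feature1': [10, 20, 30, 40, 50],
                   'Feature2': [5, 15, 25, 35, 45]})

# Standardize each column to mean 0 and standard deviation 1
scaler = StandardScaler()
df[['Feature1', 'Feature2']] = scaler.fit_transform(df[['Feature1', 'Feature2']])
print(df)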
7. Write a program to implement a k-Nearest Neighbours (k-NN) classifier using
scikit-learn and Train the classifier on the dataset and evaluate its performance.

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the k-NN classifier
k = 3  # Number of neighbors
knn_classifier = KNeighborsClassifier(n_neighbors=k)

# Train the classifier
knn_classifier.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = knn_classifier.predict(X_test)

# Evaluate the classifier's performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Display classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

Output:
8. Write a program to implement a linear regression model for regression tasks and
Train the model on a dataset with continuous target variables.
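
A minimal sketch of one possible implementation is given below; the choice of scikit-learn's built-in Diabetes dataset (which has a continuous target variable) and of the evaluation metrics is an assumption for illustration, not a prescribed solution:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load a dataset with a continuous target variable
X, y = load_diabetes(return_X_y=True)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = model.predict(X_test)

# Evaluate the model's performance
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R^2 Score:", r2_score(y_test, y_pred))

# Inspect the learned coefficients and intercept
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)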
9. Write a program to implement a decision tree classifier using scikit-learn and

visualize the decision tree and understand its splits.

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Decision Tree classifier
decision_tree = DecisionTreeClassifier()

# Train the classifier
decision_tree.fit(X_train, y_train)

# Visualize the decision tree
plt.figure(figsize=(12, 8))
plot_tree(decision_tree, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()
Output:
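
To understand the splits without the plot, the fitted tree can also be printed as text. This is a small added sketch using scikit-learn's export_text (a hedged addition, not part of the original program); it retrains a shallow tree on the full Iris data purely to keep the printed output readable:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Train a shallow decision tree on the full Iris dataset
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(iris.data, iris.target)

# Print the split thresholds and leaf classes as indented text
print(export_text(tree, feature_names=list(iris.feature_names)))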
10. Write a program to Implement K-Means clustering and Visualize clusters.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate sample data
X, y = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=42)

# Create a K-Means clusterer with 4 clusters
kmeans = KMeans(n_clusters=4, random_state=42)

# Fit the data
kmeans.fit(X)

# Get cluster labels
labels = kmeans.labels_

# Plot the data with cluster labels

plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=100, c='red',
label='Centroids')
plt.title('K-Means Clustering')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()
Output:

____________ End ____________
