ML Pgms_24Mar2025
LIST OF PROGRAMS
1. Install and set up Python and essential libraries like NumPy and pandas.
2. Introduce scikit-learn as a machine learning library.
3. Install and set up scikit-learn and other necessary tools.
4. Write a program to Load and explore the dataset of .CSV and Excel files using pandas.
5. Write a program to Visualize the dataset to gain insights using Matplotlib or Seaborn.
6. Write a program to handle missing values, encode categorical variables, and perform
feature scaling.
7. Write a program to implement a k-Nearest Neighbours (k-NN) classifier
using scikit-learn, train the classifier on the dataset, and evaluate its
performance.
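1. Install and set up Python and essential libraries like NumPy and pandas
Install Python: If you have not already installed Python, you can download it from the official
Python website.
To verify the installation, run the following command in your terminal or command prompt:
python --version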
Install pip: pip is a package manager for Python that allows you to easily install and manage
libraries. Most recent versions of Python come with pip pre-installed. You can verify if pip is
installed by running the following command in your terminal or command prompt:
pip --version
Install NumPy and pandas: Once you have Python and pip installed, you can use pip to install
NumPy and pandas by running the following commands in your terminal or command prompt:
pip install numpy
pip install pandas
This will download and install NumPy and pandas along with any dependencies they require.
Verify installation: After installing NumPy and pandas, you can verify that they were installed
correctly by running the following commands in Python's interactive mode or a Python script:
import numpy as np
import pandas as pd
print(np.__version__)
print(pd.__version__)
These commands should print the versions of NumPy and pandas that were installed.
2. Introduce scikit-learn as a machine learning library.
scikit-learn (imported as sklearn) is a widely used, robust library for machine learning in Python. It
provides a selection of efficient tools for machine learning and statistical modeling, including
classification, regression, clustering and dimensionality reduction, through a consistent interface in
Python. The library, which is largely written in Python, is built upon NumPy, SciPy and Matplotlib.
Installation
If you have already installed NumPy and SciPy, one of the easiest ways to install scikit-learn is
using pip:
pip install -U scikit-learn
Rather than focusing on loading, manipulating and summarising data, the scikit-learn library
focuses on modeling the data. Some of the most popular groups of models it provides are
as follows:
Supervised Learning algorithms: Most of the popular supervised learning algorithms, such as
Linear Regression, Support Vector Machine (SVM) and Decision Tree, are part of scikit-learn.
Unsupervised Learning algorithms: It also covers many popular unsupervised learning
algorithms, from clustering, factor analysis and PCA (Principal Component Analysis) to
unsupervised neural networks.
Cross Validation: It is used to estimate the accuracy of supervised models on unseen data.
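The consistent interface and cross-validation described above can be illustrated with a short sketch. It assumes the built-in Iris dataset and two arbitrarily chosen classifiers (a decision tree and an SVM); these choices are for illustration only. Both models are scored with the same call, which is the point of the shared interface.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Iris is used only as a convenient built-in dataset (an assumption for this sketch).
X, y = load_iris(return_X_y=True)

# Every scikit-learn estimator exposes the same fit/predict interface,
# so different models can be swapped in without changing the surrounding code.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
}

mean_scores = {}
for name, model in models.items():
    # 5-fold cross-validation: train on 4 folds, score on the held-out fold, repeat.
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[name] = scores.mean()
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```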
3. Install and set up scikit-learn and other necessary tools.
scikit-learn is a powerful Python library for machine learning. Here are the steps to set it up:
Install Python: If you haven’t already installed Python, download and install the latest version of
Python 3 from the official Python website.
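Install scikit-learn using pip: Open your terminal or command prompt and run the following
command:
pip install -U scikit-learn
To verify your installation, you can use the following commands:
python -m pip show scikit-learn  # to see which version and where scikit-learn is installed
python -m pip freeze  # to see all packages installed in the active environment
Then run the following in Python's interactive mode or a Python script (Matplotlib can be
installed with pip install matplotlib if it is missing):
import sklearn
import numpy
import pandas
import matplotlib
print(sklearn.__version__)
print(numpy.__version__)
print(pandas.__version__)
print(matplotlib.__version__)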
These commands should print the versions of scikit-learn and other libraries that were installed.
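4. Write a program to Load and explore the dataset of .CSV and Excel files using pandas.
import pandas as pd
def explore_dataset(file_path):
    # Choose the pandas reader based on the file extension
    if file_path.endswith('.csv'):
        df = pd.read_csv(file_path)
    elif file_path.endswith('.xlsx'):
        df = pd.read_excel(file_path)  # needs the openpyxl package installed
    else:
        print("Unsupported file format. Please provide a .csv or .xlsx file.")
        return
    # Display basic information about the DataFrame (df.info() prints directly)
    print("Dataset information:")
    df.info()
    # Display the first few rows of the DataFrame
    print("\nFirst few rows of the dataset:")
    print(df.head())
    # Display summary statistics for numerical columns
    print("\nSummary statistics:")
    print(df.describe())
    # Display unique values for categorical (non-numeric) columns
    print("\nUnique values for categorical columns:")
    for column in df.select_dtypes(exclude='number').columns:
        print(f"{column}: {df[column].unique()}")
# Example call (hypothetical file name): explore_dataset('data.csv')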
5. Write a program to Visualize the dataset to gain insights using Matplotlib or Seaborn.
Scatter Plot
Bar Chart
# Load dataset
df = pd.read_csv(r"C:\Users\PRIYA VINESH\Desktop\train.csv", header=None)
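As a minimal sketch of the two chart types named above, the following draws a scatter plot and a bar chart with Matplotlib. The small DataFrame and its column names (height_cm, weight_kg, group) are made up for illustration and stand in for whatever columns your own dataset has.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; replace with a DataFrame loaded from your own file.
df = pd.DataFrame({
    "height_cm": [150, 160, 165, 170, 175, 180],
    "weight_kg": [50, 56, 61, 68, 72, 80],
    "group": ["A", "B", "A", "C", "B", "A"],
})

fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: relationship between two numerical columns.
ax_scatter.scatter(df["height_cm"], df["weight_kg"])
ax_scatter.set_xlabel("height_cm")
ax_scatter.set_ylabel("weight_kg")
ax_scatter.set_title("Scatter Plot")

# Bar chart: number of rows in each category.
group_counts = df["group"].value_counts().sort_index()
ax_bar.bar(list(group_counts.index), list(group_counts.values))
ax_bar.set_xlabel("group")
ax_bar.set_ylabel("count")
ax_bar.set_title("Bar Chart")

fig.tight_layout()
fig.savefig("dataset_plots.png")  # use plt.show() instead for an interactive window
```

Saving to a file keeps the sketch usable on machines without a display; in an interactive session plt.show() gives the same figure in a window.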
To Handle Missing Values
import pandas as pd
import numpy as np
# create a sample dataframe with missing values
print('Sample Dataframe with Missing Values')
df = pd.DataFrame({'A': [1, 2, np.nan, 4, 5],
                   'B': [6, np.nan, 8, 9, 10],
                   'C': [11, 12, 13, np.nan, 15]})
print(df)
# impute missing values with the mean
print('Dataframe after imputing missing values with the Mean')
df.fillna(df.mean(), inplace=True)
print(df)
To Encode Categorical Variables
i) One-hot Encoding
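import pandas as pd
# Sample DataFrame with a categorical column (illustrative values; replace with your own data)
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B']})
# One-hot encoding
df_encoded = pd.get_dummies(df, columns=['Category'], prefix='Category')
print(df_encoded)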
Note:- This program creates a DataFrame with a categorical variable ('Category') and then uses the
pd.get_dummies function to perform one-hot encoding.
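ii) Label Encoding
One alternative to one-hot encoding is Label Encoding. In Label Encoding, each unique category
is assigned an integer label. Here is a simple Python program using the LabelEncoder from the
scikit-learn library:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# Sample DataFrame with a categorical column 'Category'
# (illustrative values; replace with your actual DataFrame and column names)
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B']})
# Fit the encoder on the column and replace each category with its integer label
label_encoder = LabelEncoder()
df['Category'] = label_encoder.fit_transform(df['Category'])
print(df)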
Note:- This program uses the LabelEncoder to transform the categorical values in the
specified column ('Category' in this case) into numerical labels. The transformed
DataFrame is then printed.
It is possible to replace the sample DataFrame and column names with our actual data.
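To see which integer each category received, the fitted encoder can be inspected. The sketch below uses hypothetical category values ('Low', 'Medium', 'High') chosen only for illustration. LabelEncoder numbers the classes in sorted order, so the integers carry no meaningful ranking.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical category values, chosen only for illustration.
categories = ["Low", "High", "Medium", "Low", "High"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(categories)

# classes_ holds the unique categories in sorted order; position = integer label.
mapping = {str(label): index for index, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform recovers the original category names from the integer labels.
decoded = encoder.inverse_transform(encoded)
print(list(decoded))
```

Because 'High' sorts before 'Low', it receives the smaller label; where the order of categories matters, an explicit mapping is usually a safer choice than label encoding.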
To perform Feature Scaling
i) Using MinMaxScaler
Here is a simple Python program using scikit-learn's MinMaxScaler for feature scaling:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
# Sample DataFrame (replace this with your actual DataFrame and column names)
data = {'Feature1': [10, 20, 30, 40, 50],
        'Feature2': [5, 15, 25, 35, 45]}
df = pd.DataFrame(data)
# Columns to scale
columns_to_scale = ['Feature1', 'Feature2']
# Fit the scaler and replace the selected columns with their scaled values
scaler = MinMaxScaler()
df[columns_to_scale] = scaler.fit_transform(df[columns_to_scale])
print(df)
This program uses the MinMaxScaler to scale the specified columns ('Feature1' and
'Feature2' in this case) to the range [0, 1]. It is possible to replace the sample DataFrame
and column names with our actual data.
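With its default range of [0, 1], MinMaxScaler rescales each column as x' = (x - min) / (max - min). As a quick check of that formula, the sketch below recomputes the scaling by hand with NumPy on the same Feature1 and Feature2 values and compares the result with the scaler's output.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same sample values as above: first column is Feature1, second is Feature2.
values = np.array([[10, 5], [20, 15], [30, 25], [40, 35], [50, 45]], dtype=float)

# Manual min-max scaling, applied column by column.
col_min = values.min(axis=0)
col_max = values.max(axis=0)
manual = (values - col_min) / (col_max - col_min)

# The same transformation using scikit-learn.
scaled = MinMaxScaler().fit_transform(values)

print(manual[:, 0])  # Feature1 after scaling
print(np.allclose(manual, scaled))
```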
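For program 7 in the list, a k-Nearest Neighbours classifier trained and evaluated with scikit-learn, a minimal sketch might look like the following. It assumes the built-in Iris dataset, an 80/20 train/test split and k = 3; these are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed dataset: Iris (150 samples, 4 features, 3 classes).
X, y = load_iris(return_X_y=True)

# Hold out 20% of the data for evaluation; stratify keeps class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# k = 3 is an arbitrary starting value; tune it for your own data.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(classification_report(y_test, y_pred))
```

Because k-NN relies on distances between samples, features on very different scales are usually scaled first, for example with the MinMaxScaler shown earlier.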
8. Write a program to implement a linear regression model for regression tasks and
Train the model on a dataset with continuous target variables.
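A minimal sketch of such a program is given below. It assumes a synthetic dataset generated with make_regression (sample count, feature count and noise level are arbitrary) and reports the mean squared error and R^2 score on a held-out test set.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data with a continuous target (an assumption; use your own dataset here).
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=42)

# Keep 20% of the samples aside for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit ordinary least squares linear regression on the training data.
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate the predictions on the test data.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
print(f"Mean squared error: {mse:.2f}")
print(f"R^2 score: {r2:.3f}")
```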
9. Write a program to implement a decision tree classifier using scikit-learn and
X = iris.data
y = iris.target
decision_tree = DecisionTreeClassifier()
decision_tree.fit(X_train, y_train)
plt.show()
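The fragment above relies on names (iris, X_train, y_train, plt) that are defined elsewhere. A self-contained version might look like the sketch below; the Iris loader, the 80/20 split, the fixed random_state and the use of plot_tree for drawing are assumptions made so the example runs on its own.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Load the data (assumed to be the built-in Iris dataset).
iris = load_iris()
X = iris.data
y = iris.target

# Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the decision tree classifier.
decision_tree = DecisionTreeClassifier(random_state=42)
decision_tree.fit(X_train, y_train)

# Report accuracy on the held-out test set.
accuracy = accuracy_score(y_test, decision_tree.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")

# Draw the fitted tree with feature and class names on each node.
fig, ax = plt.subplots(figsize=(12, 8))
plot_tree(
    decision_tree,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
    ax=ax,
)
fig.savefig("decision_tree.png")  # use plt.show() for an interactive window
```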
10. Write a program to Implement K-Means clustering and Visualize clusters.
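import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
# Generate synthetic data (these parameter values are illustrative)
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
# Fit K-Means and get the cluster label of each point
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
# Plot the points coloured by cluster, with the centroids in red
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=100, c='red',
            label='Centroids')
plt.title('K-Means Clustering')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()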