AI_UNIT_3

The document outlines the syllabus for CS 3491 - Artificial Intelligence and Machine Learning, focusing on supervised learning. It covers definitions and classifications of machine learning, including supervised, unsupervised, reinforcement, and semi-supervised learning, as well as techniques like classification, regression, and clustering. Additionally, it delves into linear regression models, including least squares regression, and provides examples of implementing these concepts in Python.

Uploaded by

ROHISIVAM

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING APEC

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


II YEAR / IV SEM
CS3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
UNIT III SUPERVISED LEARNING

SYLLABUS:
Introduction to machine learning – Linear Regression Models: Least
squares, single & multiple variables, Bayesian linear regression,
gradient descent, Linear Classification Models: Discriminant function –
Probabilistic discriminative model - Logistic regression, Probabilistic
generative model – Naive Bayes, Maximum margin classifier – Support
vector machine, Decision Tree, Random forests
PART A
1. Define Machine Learning.
• Arthur Samuel, an early American leader in the field of computer gaming
and artificial intelligence, coined the term “Machine Learning” in 1959
while at IBM.
• He defined machine learning as “the field of study that gives computers
the ability to learn without being explicitly programmed”.
• Machine learning is programming computers to optimize a performance
criterion using example data or past experience. The model may be
predictive to make predictions in the future, or descriptive to gain
knowledge from data.

2. Mention the various classifications of Machine Learning.


• Machine learning implementations are classified into four major
categories, depending on the nature of the learning “signal” or “response”
available to a learning system which are as follows:
▪ Supervised learning
▪ Unsupervised learning
▪ Reinforcement learning
▪ Semi-supervised learning

3. Define Supervised learning


• Supervised learning is the machine learning task of learning a function
that maps an input to an output based on example input-output pairs.
• The given data is labeled.

• Both classification and regression problems are supervised learning
problems.

5. Define Unsupervised learning


• Unsupervised learning is a type of machine learning algorithm used to draw
inferences from datasets consisting of input data without labeled responses.
• In unsupervised learning algorithms, classification or categorization is not
included in the observations.
• In unsupervised learning the agent learns patterns in the input without any
explicit feedback.
• The most common unsupervised learning task is clustering: detecting
potentially useful clusters of input examples.

6. What is Reinforcement learning?


• In reinforcement learning the agent learns from a series of reinforcements:
rewards and punishments.
• Reinforcement learning is the problem of getting an agent to act in the world
so as to maximize its rewards.
• A learner is not told what actions to take as in most forms of machine
learning but instead must discover which actions yield the most reward by
trying them.

7. What is Semi-supervised learning?


• Semi-Supervised learning is a type of Machine Learning algorithm that
represents the intermediate ground between Supervised and Unsupervised
learning algorithms.
• It uses the combination of labeled and unlabeled datasets during the training
period, where an incomplete training signal is given: a training set with some
of the target outputs missing.

8. How are algorithms categorized based on the required output?


• Classification
• Regression
• Clustering

9. Define Classification.
• The Classification algorithm is a Supervised Learning technique that is
used to identify the category of new observations on the basis of training
data.
• In Classification, a program learns from the given dataset or observations
and then classifies new observation into a number of classes or groups.

• Examples of classes are Yes or No, 0 or 1, Spam or Not Spam, cat or dog,
etc. Classes are also called targets, labels or categories.

10. Define Regression.


• Regression is a supervised learning technique which helps in finding the
correlation between variables and enables us to predict a continuous
output variable based on one or more predictor variables.
• It is mainly used for prediction, forecasting, time series modeling, and
determining the cause-effect relationship between variables.

11. Define Clustering.


• Clustering or cluster analysis is a machine learning technique which
groups an unlabeled dataset.
• It can be defined as "a way of grouping the data points into different
clusters, consisting of similar data points. The objects with the possible
similarities remain in a group that has less or no similarities with
another group."


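12. Define Linear Regression.

• Linear regression is a linear approach to modeling the relationship
between a dependent variable Y and one or more independent variables X.
• For a single independent variable, the relationship is

$$Y = mX + c$$

Where,
m: Slope
c: y-intercept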

13. What is Least Squares Regression Line?


• Least squares is a commonly used method in regression analysis
for estimating the unknown parameters by fitting a model that
minimizes the sum of squared errors between the observed data
and the predicted data.

14. Narrate Least Squares Regression Equation


• The least squares regression equation is the equation that minimizes
the total of all squared prediction errors for known Y scores in the
original correlation analysis:

$$Y' = bX + a$$

where
Y' represents the predicted value;
X represents the known value;
b and a represent numbers calculated from the original correlation
analysis

15. List and define the types of Linear Regression.


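It is of two types: Simple and Multiple.
• Simple Linear Regression is where only one independent variable is
present and the model has to find the linear relationship of it with the
dependent variable.
• Equation of Simple Linear Regression, where b0 is the intercept, b1 is
the coefficient or slope, x is the independent variable and y is the
dependent variable:

$$y = b_0 + b_1 x$$

• Multiple Linear Regression is where more than one independent variable
is present, with one coefficient per independent variable:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$$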

16. Define Linear Regression Model.


• A Linear Regression model’s main aim is to find the best fit linear line and
the optimal values of intercept and coefficients such that the error is
minimized.

17. What is error or residual?


• Error is the difference between the actual value and Predicted value and the
goal is to reduce this difference.
• The vertical distance between the data point and the regression line is known
as error or residual.
• Each data point has one residual and the sum of all the differences is known
as the Sum of Residuals/Errors.

PART B

1. Define Machine Learning. Give an introduction to Machine Learning.

INTRODUCTION TO MACHINE LEARNING


1.1 Machine Learning
1.1.1 Definition of Machine Learning
1.1.2 Definition of learning
1.1.3 Examples
1.1.3.1 Handwriting recognition learning problem
1.1.3.2 A robot driving learning problem
1.2 Classification of Machine Learning
1.2.1 Supervised learning
1.2.2 Unsupervised learning
1.2.3 Reinforcement learning
1.2.4 Semi-supervised learning
1.3 Categorizing based on required Output
1.3.1 Classification
1.3.2 Regression
1.3.3 Clustering

1.1 Machine Learning:


1.1.1 Definition of Machine Learning:
• Arthur Samuel, an early American leader in the field of
computer gaming and artificial intelligence, coined the term
“Machine Learning” in 1959 while at IBM.
• He defined machine learning as “the field of study that gives
computers the ability to learn without being explicitly
programmed”.
• Machine learning is programming computers to optimize a
performance criterion using example data or past experience.
The model may be predictive to make predictions in the future,
or descriptive to gain knowledge from data.

1.1.2 Definition of learning:


• A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P, if
its performance at tasks T, as measured by P, improves with
experience E.

1.1.3 Examples
1.1.3.1 Handwriting recognition learning problem
• Task T : Recognizing and classifying handwritten words
within images
• Performance P : Percent of words correctly classified
• Training experience E : A dataset of handwritten words with
given classifications
1.1.3.2 A robot driving learning problem
• Task T : Driving on highways using vision sensors
• Performance P : Average distance traveled before an error
• Training experience E : A sequence of images and steering
commands recorded while observing a human driver

1.2 Classification of Machine Learning


• Machine learning implementations are classified into four major
categories, depending on the nature of the learning “signal” or
“response” available to a learning system which are as follows:
1.2.1 Supervised learning:
• Supervised learning is the machine learning task of learning a
function that maps an input to an output based on example
input-output pairs.
• The given data is labeled.
• Both classification and regression problems are supervised
learning problems.
• For example, the inputs could be camera images, each one
accompanied by an output saying “bus” or “pedestrian,” etc.
• An output like this is called a label.
• The agent learns a function that, when given a new image,
predicts the appropriate label.

1.2.2 Unsupervised learning:


• Unsupervised learning is a type of machine learning algorithm
used to draw inferences from datasets consisting of input data
without labeled responses.
• In unsupervised learning algorithms, classification or
categorization is not included in the observations.
• In unsupervised learning the agent learns patterns in the input
without any explicit feedback.
• The most common unsupervised learning task is clustering:
detecting potentially useful clusters of input examples.

• For example, when shown millions of images taken from the
Internet, a computer vision system can identify a large cluster
of similar images which an English speaker would call “cats.”

1.2.3 Reinforcement learning:


▪ In reinforcement learning the agent learns from a series of
reinforcements: rewards and punishments.
▪ Reinforcement learning is the problem of getting an agent to act
in the world so as to maximize its rewards.
▪ A learner is not told what actions to take as in most forms of
machine learning but instead must discover which actions yield
the most reward by trying them.
▪ For example, consider teaching a dog a new trick: we cannot
tell it what to do or what not to do, but we can reward or punish
it when it does the right or wrong thing.

1.2.4 Semi-supervised learning:


• Semi-Supervised learning is a type of Machine Learning
algorithm that represents the intermediate ground between
Supervised and Unsupervised learning algorithms.
• It uses the combination of labeled and unlabeled datasets
during the training period, where an incomplete training signal
is given: a training set with some of the target outputs missing.

1.3 Categorizing based on required Output

1.3.1 Classification:
• The Classification algorithm is a Supervised Learning technique
that is used to identify the category of new observations on the
basis of training data.
• In Classification, a program learns from the given dataset or
observations and then classifies new observation into a number
of classes or groups.
• Such as, Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.
Classes can be called as targets/labels or categories.

1.3.2 Regression:
• Regression is a supervised learning technique which helps in
finding the correlation between variables and enables us to
predict the continuous output variable based on the one or more
predictor variables.

• It is mainly used for prediction, forecasting, time series
modeling, and determining the cause-effect relationship
between variables.

1.3.3 Clustering:
• Clustering or cluster analysis is a machine learning technique
which groups an unlabeled dataset.
• It can be defined as "a way of grouping the data points into
different clusters, consisting of similar data points. The objects
with the possible similarities remain in a group that has less or
no similarities with another group."

2. Explain in detail about Linear Regression Models. Or Explain Linear
Regression Models: Least squares, single & multiple variables, Bayesian
linear regression, gradient descent.

2.1 Linear Regression


• In statistics, linear regression is a linear approach to modeling the
relationship between a dependent variable and one or more
independent variables.
• Let X be the independent variable and Y be the dependent variable.
• A linear relationship between these two variables is as follows:

$$Y = mX + c$$

Where,
m: Slope
c: y-intercept

• The linear regression algorithm shows a linear relationship between a
dependent variable (y) and one or more independent variables (x), hence
the name linear regression.
• Linear regression finds how the value of the dependent variable is
changing according to the value of the independent variable.
• The linear regression model provides a sloped straight line
representing the relationship between the variables.
• Figure 3.1 represents the relationship between the independent and
dependent variables.

Figure 3.1 – Relationship between independent and dependent variables

2.2 Least Squares Regression Line


• Least squares is a commonly used method in regression analysis
for estimating the unknown parameters by fitting a model that
minimizes the sum of squared errors between the observed data
and the predicted data.
2.2.1 Least Squares Regression Equation
▪ The least squares regression equation is the equation that minimizes
the total of all squared prediction errors for known Y scores in the
original correlation analysis:

$$Y' = bX + a$$

where
Y' represents the predicted value;
X represents the known value;
b and a represent numbers calculated from the original correlation
analysis

2.2.2 Least Squares Regression in Python


Scenario: A rocket motor is manufactured by combining an igniter
propellant and a sustainer propellant inside a strong metal housing. It
was noticed that the shear strength of the bond between the two
propellants is strongly dependent on the age of the sustainer propellant.
Problem Statement: Implement a simple linear regression algorithm
using Python to build a machine learning model that studies the
relationship between the shear strength of the bond between the two
propellants and the age of the sustainer propellant.
Step 1: Import the required Python libraries.
# Importing Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Step 2: Next step is to read and load the dataset.


# Loading dataset
data = pd.read_csv('PropallantAge.csv')
data.head()
data.info()

Step 3: Create a scatter plot just to check the relationship between
these two variables.
# Plotting the data
plt.scatter(data['Age of Propellant'], data['Shear Strength'])

Step 4: Next step is to assign X and Y as independent and dependent
variables respectively.
# Computing X and Y
X = data['Age of Propellant'].values
Y = data['Shear Strength'].values

Step 5: Compute the mean of variables X and Y to determine the values
of slope (m) and y-intercept (c).
Also, let n be the total number of data points.
# Mean of variables X and Y
mean_x = np.mean(X)
mean_y = np.mean(Y)
# Total number of data values
n = len(X)

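Step 6: Calculate the slope and the y-intercept using the formulas

$$m = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad c = \bar{y} - m\,\bar{x}$$

# Calculating 'm' and 'c'
num = 0
denom = 0
for i in range(n):
    num += (X[i] - mean_x) * (Y[i] - mean_y)
    denom += (X[i] - mean_x) ** 2
m = num / denom
c = mean_y - (m * mean_x)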

# Printing coefficients
print("Coefficients")
print(m, c)

The above step has given the values of m and c.
Substituting them,
Shear Strength = 2627.822359001296 + (-37.15359094490524) * Age of Propellant
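
The loop in Step 6 can also be written with NumPy array operations. The sketch below is one possible vectorized version. The data values are hypothetical and are not taken from PropallantAge.csv. The call to np.polyfit is used only as a cross-check of the closed-form formulas.

```python
import numpy as np

# Hypothetical data: propellant age and bond shear strength.
# These values are illustrative only, not the contents of PropallantAge.csv.
X = np.array([2.0, 5.0, 8.0, 12.0, 15.0, 20.0, 24.0])
Y = np.array([2560.0, 2440.0, 2330.0, 2180.0, 2070.0, 1890.0, 1730.0])

# Closed-form least squares estimates, vectorized instead of a Python loop
mean_x, mean_y = X.mean(), Y.mean()
m = np.sum((X - mean_x) * (Y - mean_y)) / np.sum((X - mean_x) ** 2)
c = mean_y - m * mean_x

# Cross-check against NumPy's degree-1 polynomial fit (slope first, then intercept)
m_ref, c_ref = np.polyfit(X, Y, 1)
print(m, c)
```

Both approaches should agree up to floating-point error, and the slope is negative because strength falls as age increases in this made-up data.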

Step 7: The above equation represents the linear regression model.
Let’s plot this graphically. Refer to Figure 3.2.
# Plotting Values and Regression Line
maxx_x = np.max(X) + 10
minn_x = np.min(X) - 10

# line values for x and y
x = np.linspace(minn_x, maxx_x, 1000)
y = c + m * x
# Plotting Regression Line
plt.plot(x, y, color='#58b970', label='Regression Line')
# Plotting Scatter Points
plt.scatter(X, Y, c='#ef5423', label='Scatter Plot')
plt.xlabel('Age of Propellant (in years)')
plt.ylabel('Shear Strength')
plt.legend()
plt.show()

Output:

Figure 3.2 – Example for Regression Line


2.3 Types of Linear Regression
It is of two types: Simple and Multiple.
o Simple Linear Regression is where only one independent variable is
present and the model has to find the linear relationship of it with the
dependent variable.
Equation of Simple Linear Regression, where b0 is the intercept, b1 is the
coefficient or slope, x is the independent variable and y is the dependent
variable:

$$y = b_0 + b_1 x$$

o In Multiple Linear Regression there is more than one independent
variable for the model to find the relationship.
Equation of Multiple Linear Regression, where b0 is the intercept,
b1, b2, b3, b4, …, bn are coefficients or slopes of the independent variables
x1, x2, x3, x4, …, xn and y is the dependent variable:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$$
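
As an illustration of the multiple-variable case, the following sketch estimates b0, b1 and b2 by least squares with np.linalg.lstsq. The data is hypothetical and generated without noise from known coefficients, so the recovered values can be compared with the true ones.

```python
import numpy as np

# Hypothetical data generated from y = 4 + 2*x1 - 3*x2 (no noise)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = 4.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so that b0 (the intercept) is estimated too
A = np.column_stack([np.ones(len(X)), X])

# Solve A @ b = y in the least squares sense; b = [b0, b1, b2]
b, *_ = np.linalg.lstsq(A, y, rcond=None)
print(b)
```

Because the data is noise-free, the estimated coefficients should be very close to 4, 2 and -3.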

2.4 Linear Regression Model


• A Linear Regression model’s main aim is to find the best fit linear line and
the optimal values of intercept and coefficients such that the error is
minimized.
• Error is the difference between the actual value and Predicted value and
the goal is to reduce this difference.

Figure 3.3 – Example for Linear Regression Model

In the above figure 3.3:
• x is the independent variable, which is plotted on the x-axis, and y is the
dependent variable, which is plotted on the y-axis.
• Black dots are the data points, i.e. the actual values.
• b0 is the intercept, which is 10, and b1 is the slope of the x variable.
• The blue line is the best fit line predicted by the model, i.e. the predicted
values lie on the blue line.
• The vertical distance between the data point and the regression line is
known as error or residual.
• Each data point has one residual and the sum of all the differences is known
as the Sum of Residuals/Errors.

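Mathematical Approach:
Residual/Error = Actual Value – Predicted Value
Sum of Residuals/Errors = Sum(Actual – Predicted Values)
Sum of Squared Residuals/Errors = Sum((Actual – Predicted Values)^2)
Each residual is squared before summing so that positive and negative
errors do not cancel each other.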
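
A small numerical sketch of these quantities is given below. The data points and the slope of 2 are hypothetical. Only the intercept of 10 is borrowed from the description of figure 3.3.

```python
import numpy as np

# Hypothetical actual values and predictions from the line y = 10 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
actual = np.array([13.0, 13.0, 17.0, 17.0])
predicted = 10.0 + 2.0 * x

residuals = actual - predicted            # one residual per data point
sum_of_residuals = residuals.sum()        # positive and negative errors cancel
sum_of_squares = np.sum(residuals ** 2)   # the quantity least squares minimizes
print(residuals, sum_of_residuals, sum_of_squares)
```

Here the residuals are 1, -1, 1, -1, so their plain sum is 0 even though the line does not pass through any point, while the sum of squared residuals is 4.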

2.5 Bayesian Regression


• Bayesian Regression is used when the data is insufficient in the dataset or
the data is poorly distributed.
• The output of a Bayesian Regression model is obtained from a probability
distribution.

• The aim of Bayesian Linear Regression is to find the ‘posterior’
distribution for the model parameters.
• The expression for Posterior is:

$$\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Normalization}}, \qquad P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

where
o Posterior: the probability that an event H occurs given that another
event E has already occurred, i.e., P(H | E).
o Prior: the probability that event H occurs before E is observed,
i.e., P(H).
o Likelihood: the probability of observing E given H, i.e., P(E | H),
viewed as a function of H.
o Normalization: the marginal likelihood P(E), in which the parameter
variable is marginalized.

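• In Bayesian Ridge Regression, the prior over the coefficient vector w is
a zero-mean spherical Gaussian:

$$p(w \mid \lambda) = \mathcal{N}(w \mid 0, \lambda^{-1} I_p)$$

where
▪ the target y is modeled as Gaussian distributed around the prediction Xw,
▪ lambda is the precision (inverse variance) of the prior over the weights,
and I_p is the p × p identity matrix,
▪ the vector "w" is made up of the elements w0, w1,....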
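
To make the idea of a posterior over the parameters concrete, here is a minimal NumPy sketch of Bayesian linear regression with a Gaussian prior and Gaussian noise. It assumes that the prior precision lam and the noise precision beta are known and fixed, which is a simplification: Bayesian Ridge Regression as implemented in libraries also estimates these from the data. The data, the variable names and the chosen values are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y = 1 + 2x plus Gaussian noise with standard deviation 0.5
x = rng.uniform(-1, 1, size=20)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=20)
Phi = np.column_stack([np.ones_like(x), x])   # design matrix with bias column

lam = 1.0             # assumed precision of the prior p(w | lam) = N(0, lam^-1 I)
beta = 1.0 / 0.5**2   # assumed known noise precision (1 / variance)

# Gaussian posterior over w: covariance S_N and mean m_N
S_N = np.linalg.inv(lam * np.eye(2) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ y

# Predictive mean and variance for a new input x* = 0.5
phi_new = np.array([1.0, 0.5])
pred_mean = phi_new @ m_N
pred_var = 1.0 / beta + phi_new @ S_N @ phi_new
print(m_N, pred_mean, pred_var)
```

The posterior mean m_N coincides with the ridge regression solution with penalty lam / beta, and the predictive variance is always larger than the noise variance because it also accounts for uncertainty in w.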

2.5.1 Implementation Of Bayesian Regression Using Python


• This example uses the Boston Housing dataset, which includes details on
the average price of homes in various Boston neighborhoods.
• The r2 score will be used for evaluation.
• The program shows the crucial components of a Bayesian Ridge Regression
model:
Program
# Note: load_boston was removed in scikit-learn 1.2, so this program
# requires an older scikit-learn version.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.linear_model import BayesianRidge

# Loading the dataset
dataset = load_boston()
X, y = dataset.data, dataset.target

# Splitting the dataset into testing and training sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42)

# Creating and training the model
model = BayesianRidge()
model.fit(X_train, y_train)

# Predicting the test data
prediction = model.predict(X_test)

# Evaluating the r2 score of the model against the test dataset
print(f"Test Set r2 score : {r2_score(y_test, prediction)}")

Output
Test Set r2 score : 0.7943355984883815

Advantages of Bayesian Regression:


• Very effective when the size of the dataset is small.
• Particularly well-suited for on-line based learning (data is received in
real-time), as compared to batch based learning, where we have the
entire dataset on our hands before we start training the model. This is
because Bayesian Regression doesn’t need to store data.
• The Bayesian approach is a tried and tested approach and is very
robust, mathematically. So, one can use this without having any extra
prior knowledge about the dataset.

Disadvantages of Bayesian Regression:


• The inference of the model can be time-consuming.
• If there is a large amount of data available for our dataset, the Bayesian
approach is not worth it.

2.6 Gradient descent


2.6.1 Cost Function
• The cost is the error in our predicted value.
• It is calculated using the Mean Squared Error function as shown in figure
3.4.

Figure 3.4 – Example for Cost function

• The goal is to minimize the cost as much as possible in order to find the
best fit line.
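
The Mean Squared Error cost for a line y = m*x + c can be sketched as below. The helper name mse and the data are hypothetical and chosen so that the values are easy to check by hand.

```python
import numpy as np

def mse(x, y, m, c):
    """Mean squared error of the line y = m*x + c on the data (x, y)."""
    return np.mean((y - (m * x + c)) ** 2)

# Hypothetical data lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

print(mse(x, y, 2.0, 1.0))   # perfect fit
print(mse(x, y, 0.0, 0.0))   # a poor fit has a larger cost
```

The perfect fit gives a cost of 0, while the line m = 0, c = 0 gives (1 + 9 + 25 + 49) / 4 = 21.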

2.6.2 Gradient Descent Algorithm.


• Gradient descent is an iterative optimization algorithm that finds the
best-fit line for a given training dataset by repeatedly adjusting m and c
in the direction that reduces the cost.
• If the MSE is plotted against m and c, the surface acquires a bowl shape,
as shown in figure 3.4a and figure 3.4b.

Figure 3.4a – Process of gradient descent algorithm

Figure 3.4b – Gradient Descent Shape

Learning Rate
• The learning rate is a scalar factor applied at every update. It controls
how far the coefficients are moved in the direction that minimizes the
error.
• The process is repeated until a minimum sum squared error is achieved or
no further improvement is possible.

Step by Step Algorithm:


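1. Initially, let m = 0 and c = 0. Let L be the learning rate, which
controls how much the values of m and c change with each step. A smaller
L gives more stable updates but needs more iterations. L = 0.001 is a
reasonable starting value.
2. Calculate the partial derivative of the loss function with respect to m
to get the derivative Dm, where the loss is the mean squared error
$E = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ with $\hat{y}_i = m x_i + c$:

$$D_m = \frac{-2}{n}\sum_{i=1}^{n} x_i\,(y_i - \hat{y}_i)$$

3. Similarly, find the partial derivative with respect to c, Dc:

$$D_c = \frac{-2}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)$$

4. Update the current values of m and c using the following equations:

$$m = m - L \cdot D_m, \qquad c = c - L \cdot D_c$$

5. Repeat this process until the cost function is very small (ideally 0).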
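
The steps above can be sketched in Python as follows. The function name gradient_descent, the learning rate of 0.01, the number of epochs and the data are assumptions chosen for illustration, and the gradients are the standard partial derivatives of the mean squared error.

```python
import numpy as np

def gradient_descent(x, y, L=0.01, epochs=5000):
    """Fit y = m*x + c by gradient descent on the mean squared error."""
    m, c = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        y_pred = m * x + c
        D_m = (-2.0 / n) * np.sum(x * (y - y_pred))  # partial derivative w.r.t. m
        D_c = (-2.0 / n) * np.sum(y - y_pred)        # partial derivative w.r.t. c
        m -= L * D_m
        c -= L * D_c
    return m, c

# Hypothetical data lying exactly on y = 3x + 2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 2.0
m, c = gradient_descent(x, y)
print(m, c)
```

With this data the estimates should approach m = 3 and c = 2. A learning rate that is too large would make the updates diverge instead of converge.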

3. Explain in detail about Linear Classification Models – Discriminant
function.

LINEAR CLASSIFICATION MODELS – DISCRIMINANT FUNCTION.


3.1 Linear Classification Models
3.2 Types of ML Classification Algorithms
3.3 Discriminant function

3.1 Linear Classification Models


• The Classification algorithm is a Supervised Learning technique that
is used to identify the category of new observations on the basis of
training data.
• In Classification, a program learns from the given dataset or
observations and then classifies new observation into a number of
classes or groups. Such as, Yes or No, 0 or 1, Spam or Not Spam,
cat or dog, etc.

• Classes are also called targets, labels or categories.
• The output variable of Classification is a category, not a value, such
as "Green or Blue", "fruit or animal", etc.
• Since the Classification algorithm is a supervised learning
technique, it takes labeled input data, which means it
contains input with the corresponding output.
• In a classification algorithm, the input variable (x) is mapped to a
discrete output (y):
y=f(x), where y = categorical output

• The best example of an ML classification algorithm is an Email Spam
Detector.
• The goal of the classification algorithm is
o Take a D-dimensional input vector x
o Assign it to one of K discrete classes Ck , k = 1, . . . , K
• In the most common scenario, the classes are taken to be disjoint
and each input is assigned to one and only one class
• The input space is divided into decision regions.
• The boundaries of the decision regions are called decision boundaries
or decision surfaces.
• With linear models for classification, the decision surfaces are linear
functions of the input x. Classes that can be separated exactly by linear
decision surfaces are said to be linearly separable.
• In the figure 3.5, there are two classes, class A and Class B.
• These classes have features that are similar to each other and
dissimilar to other classes.

Figure 3.5 – Example of Classification

• The algorithm which implements the classification on a dataset is
known as a classifier.
• There are two types of classification:
o Two-class problems (Binary Classifier):
▪ If the classification problem has only two possible outcomes,
then it is called a Binary Classifier.
▪ There is a single target variable t ∈ {0, 1}, where t = 1 represents
class C1 and t = 0 represents class C2.
▪ Examples: YES or NO, MALE or FEMALE, SPAM or NOT
SPAM, CAT or DOG, etc.
o Multi-class problems (Multi-class Classifier):
▪ If a classification problem has more than two outcomes,
then it is called a Multi-class Classifier.
▪ Examples: classification of types of crops, classification of
types of music.
▪ The 1-of-K coding scheme is used: there is a K-long target vector t
such that, if the class is Cj, all elements tk of t are zero for k ≠ j
and one for k = j. tk can be interpreted as the probability that the
class is Ck. For example, if K = 6 and the class is C4, then
t = (0, 0, 0, 1, 0, 0)^T.
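
The 1-of-K coding scheme can be illustrated with a short sketch. The helper name one_of_k is hypothetical, and classes are counted from 1 to match the notation C1, ..., CK.

```python
import numpy as np

def one_of_k(j, K):
    """Return the K-long target vector for class C_j (j counts from 1)."""
    t = np.zeros(K, dtype=int)
    t[j - 1] = 1
    return t

print(one_of_k(4, 6))   # class C4 out of K = 6 classes -> [0 0 0 1 0 0]
```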

• The simplest approach to classification problems is through
construction of a discriminant function that directly assigns each
vector x to a specific class.

3.2 Types of ML Classification Algorithms:


• Logistic Regression
• K-Nearest Neighbors
• Support Vector Machines
• Kernel SVM
• Naïve Bayes
• Decision Tree Classification
• Random Forest Classification

3.3 Discriminant function


• A function of a set of variables that is evaluated for samples of events
or objects and used as an aid in discriminating between or classifying
them.

• A discriminant function (DF) maps independent (discriminating)
variables into a latent variable D.
• DF is usually postulated to be a linear function:
D = a0 + a1 x1 + a2 x2 + ... + aN xN

• The goal of discriminant analysis is to find values of the
coefficients {ai, i=0,...,N} such that the distance between the mean
values of DF for the two groups is maximal.
• Whenever there is a requirement to separate two or more classes
having multiple features efficiently, the Linear Discriminant Analysis
model is considered the most common technique to solve such
classification problems.
• For example, if classes with multiple features need to be separated
efficiently and they are classified using only a single feature, the
classes may overlap, as shown in figure 3.6.

Figure 3.6 – Example for Classification using single feature


• To overcome the overlapping issue in the classification process, the
number of features must be increased.

4. Explain in detail about Linear Discriminant Functions and its types. Also
elaborate about logistic regression in detail.

LINEAR DISCRIMINANT FUNCTIONS AND LOGISTIC REGRESSION


4.1 Linear Discriminant Functions
4.2 The Two-Category Case
4.3 The Multi-category Case
4.4 Generalized Linear Discriminant Functions
4.5 Probabilistic discriminative models
4.6 Logistic Regression
4.6.1 Logistic Function (Sigmoid Function)
4.6.2 Assumptions for Logistic Regression
4.6.3 Logistic Regression Equation
4.6.4 Type of Logistic Regression
4.6.5 Steps in Logistic Regression
4.6.6 Advantages of Logistic Regression Algorithm

4.1 Linear Discriminant Functions


A discriminant function that is a linear combination of the
components of x can be written as

$$g(x) = w^T x + w_0 \tag{3.1}$$
where w is the weight vector and w0 the bias or threshold weight.

4.2 The Two-Category Case


• For a discriminant function of the form of eq. 3.1, a two-category
classifier implements the following decision rule:
• Decide ω1 if g(x)>0 and ω2 if g(x)<0.
• Thus, x is assigned to ω1 if the inner product w^T x exceeds the
threshold –w0 and to ω2 otherwise.
• If g(x)=0, x can ordinarily be assigned to either class, or can be left
undefined.
• The equation g(x)=0 defines the decision surface that separates
points assigned to ω1 from points assigned to ω2.
• When g(x) is linear, this decision surface is a hyperplane.
• If x1 and x2 are both on the decision surface, then

$$w^T x_1 + w_0 = w^T x_2 + w_0 \tag{3.2}$$
or
$$w^T (x_1 - x_2) = 0 \tag{3.3}$$
which shows that w is normal to any vector lying in the hyperplane.

Figure 3.7: The linear decision boundary H separates the feature
space into two half-spaces.

• In figure 3.7, the hyperplane H divides the feature space into two
half-spaces:
o Decision region R1 for ω1
o Decision region R2 for ω2.
• The discriminant function g(x) gives an algebraic measure of the
distance from x to the hyperplane.
• To see this, express x as

$$x = x_p + r\,\frac{w}{\lVert w \rVert} \tag{3.4}$$
where xp is the normal projection of x onto H, and r is the desired
algebraic distance which is positive if x is on the positive side and
negative if x is on the negative side. Then, because g(xp)=0,

$$g(x) = w^T x + w_0 = r\,\lVert w \rVert \tag{3.5}$$
or
$$r = \frac{g(x)}{\lVert w \rVert} \tag{3.6}$$

o The distance from the origin to H is given by $w_0 / \lVert w \rVert$.
o If w0>0, the origin is on the positive side of H, and if w0<0, it is on the
negative side.
o If w0=0, then g(x) has the homogeneous form $w^T x$, and the hyperplane
passes through the origin.
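
The two-category decision rule and the signed distance to the hyperplane can be sketched as follows. The weight vector, the bias and the function names are hypothetical values chosen so that the arithmetic is easy to follow.

```python
import numpy as np

w = np.array([3.0, 4.0])   # hypothetical weight vector
w0 = -5.0                  # hypothetical bias (threshold weight)

def g(x):
    """Linear discriminant g(x) = w^T x + w0."""
    return w @ x + w0

def classify(x):
    """Decide class 1 if g(x) > 0, class 2 if g(x) < 0, undefined on the boundary."""
    value = g(x)
    if value > 0:
        return 1
    if value < 0:
        return 2
    return None

def signed_distance(x):
    """Algebraic distance r = g(x) / ||w|| from x to the hyperplane H."""
    return g(x) / np.linalg.norm(w)

x = np.array([3.0, 4.0])
print(classify(x), signed_distance(x))
```

For x = (3, 4), g(x) = 20 and the norm of w is 5, so the point is assigned to class 1 at distance 4 from H. For the origin, the signed distance is w0 / ||w|| = -1, so the origin lies on the negative side because w0 < 0.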

4.3 The Multi-category Case


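• To devise multi-category classifiers employing linear discriminant
functions, the problem can be reduced to c two-class problems.
• Defining c linear discriminant functions

$$g_i(x) = w_i^T x + w_{i0}, \qquad i = 1, \ldots, c \tag{3.7}$$

and assigning x to ωi if $g_i(x) > g_j(x)$ for all j ≠ i; in case of ties, the
classification is left undefined.
• The resulting classifier is called a linear machine.
• The boundary between neighbouring decision regions Ri and Rj is part of
the hyperplane Hij defined by $g_i(x) = g_j(x)$. The vector $w_i - w_j$
is normal to Hij and the signed distance from x to Hij is given by

$$\frac{g_i(x) - g_j(x)}{\lVert w_i - w_j \rVert}$$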
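
A linear machine can be sketched as taking the largest of the c discriminant values. The weights, biases and the function name linear_machine below are hypothetical, and a tie is reported as None to reflect that the classification is left undefined.

```python
import numpy as np

# Hypothetical parameters for c = 3 classes in a 2-dimensional feature space
W = np.array([[1.0, 0.0],     # w_1
              [0.0, 1.0],     # w_2
              [-1.0, -1.0]])  # w_3
w0 = np.array([0.0, 0.0, 0.5])

def linear_machine(x):
    """Assign x to the class with the largest g_i(x); None in case of a tie."""
    scores = W @ x + w0                       # g_i(x) for i = 1..c
    best = np.flatnonzero(scores == scores.max())
    return int(best[0]) + 1 if len(best) == 1 else None

print(linear_machine(np.array([2.0, 1.0])))
print(linear_machine(np.array([1.0, 1.0])))
```

The first point has discriminant values 2, 1 and -2.5 and is assigned to class 1. The second point has values 1, 1 and -1.5, so it lies on the boundary H12 and is left unclassified.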

4.4 Generalized Linear Discriminant Functions


• The linear discriminant function g(x) can be written as

(3.8)

where the coefficients wi are the components of the weight vector w.


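Quadratic Discriminant Function
• Adding terms that involve products of pairs of components of x gives the
quadratic discriminant function
$g(x) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=1}^{d} w_{ij} x_i x_j$    (3.9)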
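
As a small illustration, a quadratic discriminant (a bias, a linear term, and a sum of pairwise products w_ij x_i x_j) can be evaluated as below. The coefficients are assumed: they are chosen so that g(x) = x1^2 + x2^2 - 1, whose decision surface g(x) = 0 is the unit circle, something no linear discriminant can produce.

```python
def quadratic_discriminant(w0, w, W, x):
    """g(x) = w0 + sum_i w_i x_i + sum_i sum_j w_ij x_i x_j."""
    d = len(x)
    linear = sum(w[i] * x[i] for i in range(d))
    quadratic = sum(W[i][j] * x[i] * x[j] for i in range(d) for j in range(d))
    return w0 + linear + quadratic

# Assumed coefficients: g(x) = x1^2 + x2^2 - 1
w0 = -1.0
w = [0.0, 0.0]
W = [[1.0, 0.0],
     [0.0, 1.0]]

print(quadratic_discriminant(w0, w, W, [0.0, 0.0]))  # -1.0: inside the circle
print(quadratic_discriminant(w0, w, W, [2.0, 0.0]))  # 3.0: outside the circle
```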

4.5 Probabilistic discriminative models


• Discriminative models are a class of supervised machine learning
models that make predictions by estimating the conditional
probability P(y|x).
• For the two-class classification problem, the posterior probability of
class C1 can be written as a logistic sigmoid acting on a linear function of
x:
$P(C_1 \mid x) = \sigma(w^T x + w_0)$, where $\sigma(a) = \frac{1}{1 + e^{-a}}$
• For the multi-class case, the posterior probability of class Ck is given by a
softmax transformation of a linear function of x:
$P(C_k \mid x) = \frac{\exp(a_k)}{\sum_j \exp(a_j)}$, where $a_k = w_k^T x + w_{k0}$
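
The sigmoid and softmax posteriors can be sketched as follows. All weights and the feature vector are assumed values for illustration; the subtraction of the maximum inside `softmax` is a common numerical-stability device and does not change the result.

```python
import math

def sigmoid(a):
    """Logistic sigmoid: 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def softmax(activations):
    """Softmax: exp(a_k) / sum_j exp(a_j), shifted by max(a) for stability."""
    m = max(activations)
    exps = [math.exp(a - m) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def linear(w, w0, x):
    """Linear function w^T x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

x = [1.0, 2.0]  # assumed feature vector

# Two-class case: P(C1 | x) = sigmoid(w^T x + w0)
p_c1 = sigmoid(linear([0.5, -0.25], 0.0, x))  # activation is 0.5 - 0.5 = 0
print(p_c1)  # 0.5: x lies exactly on the decision boundary

# Multi-class case: one (w_k, w_k0) pair per class (assumed values)
classes = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([0.0, 0.0], 0.0)]
activations = [linear(w, w0, x) for w, w0 in classes]  # [1.0, 2.0, 0.0]
posteriors = softmax(activations)
print(posteriors)  # three probabilities that sum to 1; the second class is largest
```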

4.6 Logistic Regression


o Logistic regression is a machine learning algorithm that belongs to the
classification algorithms of the supervised learning technique.
o Logistic regression is used to describe data and the relationship between one
dependent variable and one or more independent variables.
o The independent variables can be nominal, ordinal, or of interval type.
o Logistic regression predicts the output of a categorical dependent variable.
o Therefore the outcome must be a categorical or discrete value.
o It can be either Yes or No, 0 or 1, True or False, etc., but instead of giving
the exact value 0 or 1, it gives probabilistic values which lie between 0 and 1.
o Linear regression is used for solving regression problems, whereas logistic
regression is used for solving classification problems.
o Figure 3.9 shows the logistic function.


Figure 3.9 – Logistic Function or Sigmoid Function

4.6.1 Logistic Function (Sigmoid Function):


o The logistic function is also known as the sigmoid function.
o The sigmoid function is a mathematical function used to map the
predicted values to probabilities.
o The output of logistic regression must lie between 0 and 1, so the function
forms an S-shaped curve.
o This S-shaped curve is called the sigmoid function or the logistic function.
4.6.2 Assumptions for Logistic Regression:
o The dependent variable must be categorical in nature.
o The independent variables should not exhibit multicollinearity.
4.6.3 Logistic Regression Equation:
• The logistic regression equation can be obtained from the linear
regression equation.
• The mathematical steps are given below:
• The equation of the straight line can be written as:
$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
o In logistic regression y can be between 0 and 1 only, so divide y by (1-y)
to form the odds:
$\frac{y}{1-y}$, which is 0 for y = 0 and tends to infinity as y approaches 1
o To obtain a range from -[infinity] to +[infinity], take the logarithm of the
odds:
$\log\left[\frac{y}{1-y}\right] = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
The above equation is the final equation for Logistic Regression.
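
Solving the final equation for y gives y = 1 / (1 + e^(-z)), where z stands for the linear part b0 + b1x1 + ... + bnxn, so the log-odds and the sigmoid are inverses of each other. The short check below is a sketch of that relationship; the function names and the sample values of z are assumptions.

```python
import math

def sigmoid(z):
    """y = 1 / (1 + exp(-z)): maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(y):
    """Log-odds log(y / (1 - y)), defined for 0 < y < 1."""
    return math.log(y / (1.0 - y))

for z in [-2.0, 0.0, 3.0]:
    y = sigmoid(z)
    # logit(y) recovers z up to floating-point rounding
    print(z, y, logit(y))
```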


4.6.4 Types of Logistic Regression:


• Logistic Regression can be classified into three types:
o Binomial: In binomial logistic regression, the dependent variable can
have only two possible values, such as 0 or 1, Pass or Fail,
etc.
o Multinomial: In multinomial logistic regression, the dependent variable
can have 3 or more possible unordered values, such as
"cat", "dog", or "sheep".
o Ordinal: In ordinal logistic regression, the dependent variable can
have 3 or more possible ordered values, such as "low",
"medium", or "high".

4.6.5 Steps in Logistic Regression:


• To implement the Logistic Regression using Python, the steps are given
below:
• Data Pre-processing step
• Fitting Logistic Regression to the Training set
• Predicting the test result
• Test accuracy of the result
• Visualizing the test set result.
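
The first four steps can be sketched in plain Python without any library. This is a minimal sketch under stated assumptions: the toy data (hours studied versus pass/fail), the learning rate, and the iteration count are invented for illustration, the model is fitted by batch gradient descent on the log-loss, and the visualization step is left out.

```python
import math

# Step 1: data pre-processing. Assumed toy data: hours studied -> pass (1) / fail (0).
train_x = [0.5, 1.0, 1.5, 2.0, 3.5, 4.0, 4.5, 5.0]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_x = [0.75, 1.25, 4.25, 4.75]
test_y = [0, 0, 1, 1]

# Standardize the feature using training-set statistics only
mean = sum(train_x) / len(train_x)
std = math.sqrt(sum((v - mean) ** 2 for v in train_x) / len(train_x))

def scale(v):
    return (v - mean) / std

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Step 2: fit logistic regression to the training set (batch gradient descent).
b0, b1 = 0.0, 0.0
learning_rate = 0.5
for _ in range(1000):
    grad0 = grad1 = 0.0
    for v, y in zip(train_x, train_y):
        error = sigmoid(b0 + b1 * scale(v)) - y  # derivative of log-loss w.r.t. z
        grad0 += error
        grad1 += error * scale(v)
    b0 -= learning_rate * grad0 / len(train_x)
    b1 -= learning_rate * grad1 / len(train_x)

# Step 3: predict the test result with a 0.5 probability threshold.
def predict(v):
    return 1 if sigmoid(b0 + b1 * scale(v)) >= 0.5 else 0

predictions = [predict(v) for v in test_x]

# Step 4: test the accuracy of the result.
accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

In practice a library implementation would usually replace the hand-written training loop; the loop is spelled out here only to make each step visible.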

4.6.6 Advantages of Logistic Regression Algorithm


• Logistic regression performs better when the data is linearly separable
• It does not require too many computational resources
• There is no problem scaling the input features
• It is easy to implement and train a model using logistic regression

5. Elaborate in detail about Probabilistic Generative model and Naïve Bayes.

PROBABILISTIC GENERATIVE MODEL AND NAÏVE BAYES


5.1 Probabilistic Generative model
5.2 Simple example
5.3 Generative models
5.4 Discriminative models

5.1 Probabilistic Generative model


• Given a model of one conditional probability, and estimated probability
distributions for the variables X and Y, denoted P(X) and P(Y), one can
estimate the opposite conditional probability using Bayes' rule:
$P(X \mid Y) \, P(Y) = P(Y \mid X) \, P(X)$


• A generative model is a statistical model of the joint probability
distribution P(X, Y) on a given observable variable X and target variable Y.
Given a generative model for P(X|Y), one can estimate:
$P(Y \mid X) = \frac{P(X \mid Y) \, P(Y)}{P(X)}$
• A discriminative model is a model of the conditional probability of the
target Y, given an observation x. Given a discriminative model for P(Y|X),
one can estimate:
$P(X \mid Y) = \frac{P(Y \mid X) \, P(X)}{P(Y)}$
• A classifier based on a generative model is a generative classifier, while a
classifier based on a discriminative model is a discriminative classifier.
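
A generative classifier applies Bayes' rule to turn a class-conditional model P(X|Y) and a prior P(Y) into the posterior P(Y|X). The sketch below uses invented numbers for a two-class spam example with a single binary feature; none of the values come from the notes.

```python
# Assumed toy model: Y is the class, X is the event "message contains the word 'offer'".
prior = {"spam": 0.4, "ham": 0.6}       # P(Y)
likelihood = {"spam": 0.5, "ham": 0.1}  # P(X = true | Y)

# Evidence P(X = true) by the law of total probability
evidence = sum(likelihood[y] * prior[y] for y in prior)

# Posterior P(Y | X = true) = P(X = true | Y) P(Y) / P(X = true)
posterior = {y: likelihood[y] * prior[y] / evidence for y in prior}

# The generative classifier picks the class with the largest posterior
predicted = max(posterior, key=posterior.get)

print(posterior)
print(predicted)
```

With these numbers the evidence is 0.5 x 0.4 + 0.1 x 0.6 = 0.26, so the posterior for "spam" is 0.20 / 0.26, roughly 0.77.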

5.3 Generative models


Types of generative models are:
• Naive Bayes classifier or Bayesian network
• Linear discriminant analysis

5.4 Discriminative models


• Logistic regression
• Support Vector Machines
• Decision Tree Learning
• Random Forest


6. Elaborate in detail about Support Vector Machine (SVM).

6.1 Support Vector Machine (SVM)


• Support Vector Machine (SVM) is a supervised machine learning
algorithm used for both classification and regression.
• The objective of the SVM algorithm is to find a hyperplane in an
N-dimensional space (N being the number of features) that distinctly
classifies the data points.
• Hyperplanes are decision boundaries that help classify the data points.
• The dimension of the hyperplane depends upon the number of features.
• If the number of input features is 2, then the hyperplane is just a line.
• If the number of input features is 3, then the hyperplane becomes a two-
dimensional plane.
• It becomes difficult to imagine when the number of features exceeds 3.
• The objective is to find the hyperplane that has the maximum margin, i.e., the
maximum distance between the hyperplane and the nearest data points of both
classes.
• Support vectors are the data points closest to the hyperplane; they
influence the position and orientation of the hyperplane.
• Using these support vectors, we maximize the margin of the classifier.
• Deleting the support vectors will change the position of the hyperplane.
• For an example, refer to Figure 3.10.
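
The ideas of margin and support vectors can be illustrated without training an SVM. In the sketch below the separating line is chosen by hand rather than learned, the four data points are assumed, and the margin is taken to be the distance from the hyperplane to its nearest point; the points that attain that distance play the role of support vectors.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Unsigned distance |w^T x + b| / ||w|| from x to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Assumed 2-D, linearly separable toy data: (point, label)
points = [([1.0, 1.0], -1), ([0.0, 1.0], -1), ([3.0, 3.0], +1), ([4.0, 4.0], +1)]

# Candidate separating line x1 + x2 - 4 = 0 (picked by hand, not learned)
w, b = [1.0, 1.0], -4.0

# Every point must fall on the side that matches its label
correct = all((sum(wi * xi for wi, xi in zip(w, x)) + b) * label > 0
              for x, label in points)

distances = [distance_to_hyperplane(w, b, x) for x, _ in points]
margin = min(distances)
support_vectors = [x for (x, _), d in zip(points, distances)
                   if math.isclose(d, margin)]

print(correct)          # True
print(margin)           # sqrt(2), about 1.414
print(support_vectors)  # [[1.0, 1.0], [3.0, 3.0]]
```

Moving either of the two nearest points would force the line to move, while moving the two farther points slightly would not, which is the sense in which only the support vectors determine the hyperplane.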

Figure 3.10 – Example for Support Vectors

• Let’s consider two independent variables x1, x2 and one dependent

Figure 3.11 - Linearly Separable Data points
