B. Tech VI Semester
COMPUTER SCIENCE AND ENGINEERING
VCE-R15 2017 – 2018
DATA MINING AND DATA WAREHOUSING
(A3522)
UNIT – IV
CLASSIFICATION
A. BHANU PRASAD
Associate Professor of CSE
9885990509
[email protected]
REFERENCE BOOKS:
1. Margaret H Dunham (2006), Data Mining Introductory and
Advanced Topics, 2nd edition, Pearson Education, New Delhi, India.
2. Amitesh Sinha (2007), Data Warehousing, Thomson Learning, India.
3. Xindong Wu, Vipin Kumar (2009), The Top Ten Algorithms in Data
Mining, CRC Press, UK.
4. Max Bramer (2007), Principles of Data Mining, Springer, USA.
UNIT – IV CONTENTS
4. CLASSIFICATION
4.1 Basic Concepts
4.2 Decision Tree Induction
4.3 Bayesian Classification Methods
4.4 Rule-Based Classification
4.5 Model Evaluation and Selection
4.6 Techniques to Improve Classification Accuracy
4.7 Classification by Neural Networks
4.8 Support Vector Machines
4.9 Classification Using Frequent Patterns: Pattern-Based
Classification
4.10 Lazy Learners
4. CLASSIFICATION
4.1. Basic Concepts
There are two forms of data analysis that can be used to extract
models describing important data classes or to predict future data trends:
1) Classification
2) Prediction
1) Classification is a form of data analysis that extracts models
describing important data classes. Such models, called classifiers,
predict categorical (discrete, unordered) class labels.
It classifies data (constructs a model) based on the training set
and the values (class labels) in a classifying attribute and uses it in
classifying new data.
2) Prediction models continuous-valued functions, i.e., predicts
unknown or missing values.
(a) Learning (model construction): Training data are analyzed by a
classification algorithm to build a model (classifier), which is then
applied to test data and to new data.
Fig: Training data (attributes name, age, income; class label loan_decision):
  Bello    senior   low    safe
  Sylvia   middle   low    risky
  Anne     middle   high   safe
  ...
New data: (John Henry, middle, low) -> Loan decision? (Prediction)
Contd..
(b) Classification: Test data are used to estimate the accuracy of the
classification rules. If the accuracy is considered acceptable, the rules can
be applied to the classification of new data tuples.
Fig: The classifier is applied to testing data to estimate its accuracy,
and then to unseen data.
Testing data (NAME, RANK, YEARS, TENURED):
  Tom      Assistant Prof   2   no
  Merlisa  Associate Prof   7   no
  George   Professor        5   yes
  Joseph   Assistant Prof   7   yes
Unseen data: (Jeff, Professor, 4) -> Tenured? (Prediction)
Supervised vs. Unsupervised Learning
Supervised learning (classification)
• Supervision: The training data (observations, measurements, etc.)
are accompanied by labels indicating the class of the observations.
• New data is classified based on the training set.
Unsupervised learning (clustering)
• The class labels of training data are unknown.
• The number or set of classes to be learned may not be known in
advance.
• Given a set of measurements, observations, etc. with the aim of
establishing the existence of classes or clusters in the data.
Issues regarding Classification & Prediction
1) Data Preparation
Data cleaning: Preprocess data in order to reduce noise and
handle missing values
Relevance analysis (feature selection): Remove the irrelevant or
redundant attributes
Data transformation: Generalize and/or normalize data
2) Evaluating Classification Methods
Predictive accuracy
Speed and scalability
• time to construct the model
• time to use the model
• efficiency in disk-resident databases
Robustness: handling noise and missing values
Interpretability: understanding and insight provided by the model
Goodness of rules
• decision tree size
• compactness of classification rules
Classification Techniques
1) Decision Tree based Methods
2) Rule-based Methods
3) Memory based reasoning
4) Neural Networks
5) Naïve Bayes and Bayesian Belief Networks
6) Support Vector Machines
4.2. Decision Tree Induction
Decision tree induction is the learning of decision trees from class-
labeled training tuples.
Decision tree: A flowchart-like tree structure where
• Internal (non-leaf) node denoted by rectangle represents a test on
an attribute
• Branch represents an outcome of the test
• Leaf (terminal) nodes denoted by ovals represent class labels or
class distribution
• Root node is the topmost node in a tree.
Decision tree generation consists of two phases:
1) Tree construction
• Partition examples recursively based on selected attributes.
2) Tree pruning
• Identify and remove branches that reflect noise or outliers.
Using the tree: Given a tuple, X, for which the associated class label is
unknown, the attribute values of the tuple are tested against the decision
tree. A path is traced from the root to a leaf node, which holds the class
prediction for that tuple. Decision trees can easily be converted to
classification rules.
Contd..
Use of decision tree:
• Classifying an unknown sample.
• Test the attribute values of the sample against the decision tree.
• Decision tree induction algorithms have been used for
classification in many application areas such as medicine,
manufacturing and production, financial analysis, astronomy, and
molecular biology.
• Decision trees are the basis of several commercial rule induction
systems.
Training Dataset
Ex: A marketing manager at a company needs to predict whether a customer
with a given profile will buy a new computer.
RID  age     income  student  credit_rating  Class: buys_computer
1    youth   high    no       fair           no
2    youth   high    no       excellent      no
3    middle  high    no       fair           yes
4    senior  medium  no       fair           yes
5    senior  low     yes      fair           yes
6    senior  low     yes      excellent      no
7    middle  low     yes      excellent      yes
8    youth   medium  no       fair           no
9    youth   low     yes      fair           yes
10   senior  medium  yes      fair           yes
11   youth   medium  yes      excellent      yes
12   middle  medium  no       excellent      yes
13   middle  high    yes      fair           yes
14   senior  medium  no       excellent      no
Output: A Decision Tree for “buys_computer”
A decision tree for the concept buys computer, indicating whether a
customer is likely to purchase a computer.
Each internal (nonleaf) node represents a test on an attribute.
Each leaf node represents a class (either buys_computer = yes or
buys_computer = no).
Fig: Decision tree for buys_computer. The root node tests age: youth leads
to a further test on student, middle leads to the leaf "yes", and senior
leads to a further test on credit_rating.
Algorithm for Decision Tree Induction
Basic algorithm (a greedy algorithm)
• Tree is constructed in a top-down recursive divide-and-conquer
manner
• At start, all the training examples are at the root
• Attributes are categorical (if continuous-valued, they are
discretized in advance)
• Examples are partitioned recursively based on selected attributes
• Test attributes are selected on the basis of a heuristic or statistical
measure (e.g., information gain)
Conditions for stopping partitioning
• All samples for a given node belong to the same class
• There are no remaining attributes for further partitioning –
majority voting is employed for classifying the leaf
• There are no samples left
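The basic algorithm above can be illustrated with a minimal Python sketch
(an assumption-laden illustration, not the textbook's pseudocode):
categorical attributes only, tuples stored as dictionaries, information
gain via entropy as the selection measure, and illustrative helper names.

import math
from collections import Counter

def entropy(rows, target):
    # Expected information (entropy) of the class distribution in rows.
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def best_attribute(rows, attrs, target):
    # Pick the attribute with the highest information gain.
    base = entropy(rows, target)
    def gain(a):
        value_counts = Counter(r[a] for r in rows)
        remainder = sum((n / len(rows)) * entropy([r for r in rows if r[a] == v], target)
                        for v, n in value_counts.items())
        return base - remainder
    return max(attrs, key=gain)

def build_tree(rows, attrs, target):
    classes = [r[target] for r in rows]
    if len(set(classes)) == 1:              # all tuples in one class -> leaf
        return classes[0]
    if not attrs:                           # no attributes left -> majority voting
        return Counter(classes).most_common(1)[0][0]
    a = best_attribute(rows, attrs, target)
    node = {a: {}}
    for v in set(r[a] for r in rows):       # partition on each value of the chosen attribute
        subset = [r for r in rows if r[a] == v]
        node[a][v] = build_tree(subset, [x for x in attrs if x != a], target)
    return node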
Extracting Classification Rules from Trees
Represent the knowledge in the form of IF-THEN rules
One rule is created for each path from the root to a leaf
Each attribute-value pair along a path forms a conjunction
The leaf node holds the class prediction
Rules are easier for humans to understand
Example
• IF age=“youth” AND student=“no” THEN buys_computer=“no”
• IF age=“youth” AND student=“yes” THEN buys_computer=“yes”
• IF age=“middle” THEN buys_computer=“yes”
• IF age=“senior” AND credit_rating=“excellent” THEN buys_computer=“no”
• IF age=“senior” AND credit_rating=“fair” THEN buys_computer=“yes”
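Rule extraction can be sketched in a few lines of Python, assuming the
nested-dictionary tree representation used in the induction sketch above
(one IF-THEN rule per root-to-leaf path; the function name is illustrative).

def extract_rules(tree, conditions=()):
    # Walk each root-to-leaf path; every attribute-value test along a path is a conjunct.
    if not isinstance(tree, dict):                   # leaf node -> emit one IF-THEN rule
        antecedent = " AND ".join('{}="{}"'.format(a, v) for a, v in conditions)
        return ["IF {} THEN class = \"{}\"".format(antecedent or "TRUE", tree)]
    rules = []
    (attr, branches), = tree.items()                 # each internal node tests one attribute
    for value, subtree in branches.items():
        rules.extend(extract_rules(subtree, conditions + ((attr, value),)))
    return rules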
Attribute Selection Measures
An attribute selection measure is a heuristic for selecting the splitting
criterion that “best” separates a given data partition, D, of class-
labeled training tuples into individual classes. In other words, we are
looking for the probability that tuple X belongs to class C, given that
we know the attribute description of X.
Information Gain
Gain Ratio
Gini Index
The Gini index is used in CART. Using the notation previously
described, the Gini index measures the impurity of D, a data partition
or set of training tuples, as
Gini(D) = 1 − Σ (i=1 to m) pi²
where pi is the probability that a tuple in D belongs to class Ci,
estimated by |Ci,D| / |D|.
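As a worked illustration, the following Python sketch computes Info(D), the
information gain and gain ratio of the age split, and Gini(D) for the
buys_computer training data above (helper names are illustrative).

import math

def info(counts):
    # Info(D) = -sum(p_i * log2(p_i)) over the class counts.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def gini(counts):
    # Gini(D) = 1 - sum(p_i^2)
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# buys_computer data above: 9 "yes" and 5 "no" tuples overall.
D = [9, 5]
# Split on age: youth -> [2 yes, 3 no], middle -> [4 yes, 0 no], senior -> [3 yes, 2 no].
partitions = [[2, 3], [4, 0], [3, 2]]

info_D = info(D)                                               # ~0.940 bits
info_age = sum(sum(p) / sum(D) * info(p) for p in partitions)  # ~0.694 bits
gain_age = info_D - info_age                                   # information gain ~0.246
split_info = info([sum(p) for p in partitions])                # ~1.577
gain_ratio_age = gain_age / split_info                         # gain ratio ~0.156
print(gain_age, gain_ratio_age, gini(D))                       # Gini(D) ~0.459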
4.3. Bayesian Classification Methods
Bayesian classification is based on Bayes’ theorem. Bayesian classifiers
are statistical classifiers. They can predict class membership
probabilities such as the probability that a given tuple belongs to a
particular class.
Probabilistic learning: Calculate explicit probabilities for hypothesis,
among the most practical approaches to certain types of learning
problems
Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is correct. Prior
knowledge can be combined with observed data.
Probabilistic prediction: Predict multiple hypotheses, weighted by
their probabilities
Standard: Even when Bayesian methods are computationally
intractable, they can provide a standard of optimal decision making
against which other methods can be measured.
Bayesian Classification Contd..
Bayes’ Theorem is named after Thomas Bayes.
Let X be a data tuple. In Bayesian terms, X is considered as
“evidence” whose class label is “unknown” and is described by
measurements made on a set of n attributes.
Let H be some Hypothesis such as that the data tuple X belongs to a
specified class C.
For classification problems, we want to determine P(H|X), (posterior
probability of H conditioned on X) the probability that the
hypothesis H holds given the “evidence” or observed data tuple X.
P(H) is the prior probability, or a priori probability, of H which is
independent of X.
Similarly, P(X|H) (likelihood) is the posterior probability of X
conditioned on H.
P(X) is the prior probability of X.
Bayes’ theorem:
P(H|X) = P(X|H) P(H) / P(X)
i.e., posterior probability = (likelihood × class prior probability) /
predictor prior probability.
Bayesian Classification Example
P(H|X) = P(X|H) P(H) / P(X)
(posterior probability = likelihood × class prior probability / predictor
prior probability)
P(H|X) (posterior probability of H conditioned on X): Probability (of
hypothesis H) that the customer will buy a computer given that we know
(“evidence”, X) his age, income and credit rating.
P(H) (prior probability of H): Probability (of H) that the customer will buy a
computer regardless of age, income and credit rating (independent of X).
P(X|H) (posterior probability of X conditioned on H): Probability that the
customer is a 35-year-old customer with an income of $40,000 and a fair
credit rating, given that he has bought our computer.
P(X) (prior probability of X): Probability that a person from our set of
customers is 35 years old, earns $40,000, and has a fair credit rating.
Naive Bayesian Classification
Naive Bayesian classifiers assume that the effect of an attribute value
on a given class is independent of the values of the other attributes.
This assumption is called class conditional independence. It is made
to simplify the computations involved and, in this sense, is
considered “naive.”
Naive Bayesian Classification or simple Bayesian classifier:
D be a training set of tuples
n-dimensional attribute vector, X = (x1, x2, … , xn),
m classes, C1, C2, … , Cm.
Given a tuple, X, the classifier will predict that X belongs to the class
having the highest posterior probability, conditioned on X. That is,
the naive Bayesian classifier predicts that tuple X belongs to the class
Ci if and only if P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i.
Thus, we maximize P(Ci|X). The class Ci for which P(Ci|X) is
maximized is called the maximum a posteriori hypothesis.
Contd..
By Bayes’ theorem: P(Ci|X) = P(X|Ci) P(Ci) / P(X)
As P(X) is constant for all classes, only P(X|Ci)P(Ci) needs to be
maximized.
The class prior probabilities may be estimated by
P(Ci)= |Ci,D| / |D|, where |Ci,D| is the number of training tuples of
class Ci in D.
To reduce computation in evaluating P(X|Ci), the naive assumption
of class-conditional independence is made. Thus,
P(X|Ci) = ∏ (k=1 to n) P(xk|Ci)
= P(x1|Ci) * P(x2|Ci) * … * P(xn|Ci).
We can easily estimate the probabilities P(x1|Ci), P(x2|Ci), …, P(xn|Ci)
from the training tuples. Here xk refers to the value of attribute Ak for
tuple X.
For each attribute, we look at whether the attribute is categorical or
continuous-valued.
Contd..
For instance, to compute P(X|Ci), we consider the following:
(a) If Ak is categorical, then P(xk|Ci) is the number of tuples of class
Ci in D having the value xk for Ak, divided by |Ci,D|, the number of
tuples of class Ci in D.
(b) If Ak is continuous-valued, then we need to do a bit more work, but
the calculation is pretty straightforward. A continuous-valued attribute
is typically assumed to have a Gaussian (normal) distribution with mean μ
and standard deviation σ, defined by
g(x, μ, σ) = (1 / (√(2π) σ)) e^(−(x−μ)² / (2σ²))
so that P(xk|Ci) = g(xk, μCi, σCi), where μCi and σCi are the mean and
standard deviation of attribute Ak for the training tuples of class Ci.
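A minimal Python sketch of naive Bayesian classification under these
assumptions (categorical attributes, maximum-likelihood estimates without
Laplace smoothing; function and variable names are illustrative).

from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    # rows: list of dicts (attribute -> value); target: the class attribute name.
    class_counts = Counter(r[target] for r in rows)
    priors = {c: n / len(rows) for c, n in class_counts.items()}  # P(Ci) = |Ci,D| / |D|
    cond = defaultdict(Counter)                                   # counts for P(xk | Ci)
    for r in rows:
        for a, v in r.items():
            if a != target:
                cond[(r[target], a)][v] += 1
    return priors, cond, class_counts

def classify(x, priors, cond, class_counts):
    # Predict the class Ci maximizing P(Ci) * prod_k P(xk | Ci).
    best, best_score = None, -1.0
    for c, n in class_counts.items():
        score = priors[c]
        for a, v in x.items():
            score *= cond[(c, a)][v] / n      # categorical estimate; 0 for unseen values
        if score > best_score:
            best, best_score = c, score
    return best

# For a continuous-valued attribute Ak, P(xk|Ci) would instead be estimated
# with the Gaussian density g(xk, mu_Ci, sigma_Ci) given above.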
4.5 Model Evaluation and Selection
Fig: Confusion matrix, shown with totals for positive and negative tuples.
Metrics for Evaluating Classifier Performance
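As an illustration of how such a confusion matrix is used, the following
Python sketch computes the standard evaluation measures from the four
counts TP, FP, TN, FN (a sketch of the usual definitions, not a specific
library API).

def evaluate(tp, fp, tn, fn):
    # Standard measures derived from the confusion matrix counts.
    p, n = tp + fn, fp + tn                   # actual positives / actual negatives
    accuracy = (tp + tn) / (p + n)
    error_rate = 1 - accuracy
    sensitivity = tp / p                      # recall, true positive rate
    specificity = tn / n                      # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "error_rate": error_rate, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "F1": f1}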
4.6. Techniques to Improve Classification Accuracy
An ensemble for classification is a composite model, made up of a
combination of classifiers.
The individual classifiers vote, and a class label prediction is returned
by the ensemble based on the collection of votes.
Ensembles tend to be more accurate than their component classifiers.
Popular Ensemble methods:
Bagging
Boosting and
Random forests
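As a hedged illustration of the ensemble idea, here is a minimal Python
sketch of bagging (bootstrap aggregation) with majority voting; the
fit/predict base-learner interface is an assumption in the style of
scikit-learn, not something specified above.

import random
from collections import Counter

def bagging_train(rows, labels, make_learner, k=10):
    # Train k component classifiers, each on a bootstrap sample (sampling with replacement).
    models, n = [], len(rows)
    for _ in range(k):
        idx = [random.randrange(n) for _ in range(n)]
        model = make_learner()
        model.fit([rows[i] for i in idx], [labels[i] for i in idx])
        models.append(model)
    return models

def bagging_predict(models, x):
    # Each component classifier votes; the ensemble returns the majority class label.
    votes = Counter(m.predict([x])[0] for m in models)
    return votes.most_common(1)[0][0]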
Boosting
In boosting, weights are also assigned to each training tuple. A series
of k classifiers is iteratively learned.
After a classifier, Mi , is learned, the weights are updated to allow the
subsequent classifier,Mi+1, to “pay more attention” to the training
tuples that were misclassified by Mi .
The final boosted classifier, M*, combines the votes of each individual
classifier, where the weight of each classifier’s vote is a function of its
accuracy.
AdaBoost (short for Adaptive Boosting) is a popular boosting
algorithm.
While both Bagging and Boosting can significantly improve accuracy
in comparison to a single model, boosting tends to achieve greater
accuracy.
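If scikit-learn is available, AdaBoost can be tried directly; the data set
and parameter values below are illustrative assumptions, not part of the
original material.

from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# 50 boosting rounds; tuples misclassified in round i get larger weights in
# round i+1, and each learner's vote is weighted by its accuracy (AdaBoost).
clf = AdaBoostClassifier(n_estimators=50, random_state=1)
clf.fit(X_tr, y_tr)
print("AdaBoost accuracy:", clf.score(X_te, y_te))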
Random Forests
Imagine that each of the classifiers in the ensemble is a decision tree
classifier so that the collection of classifiers is a “forest.”
The individual decision trees are generated using a random selection
of attributes at each node to determine the split.
More formally, each tree depends on the values of a random vector
sampled independently and with the same distribution for all trees in
the forest.
During classification, each tree votes and the most popular class is
returned.
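A corresponding scikit-learn sketch for random forests (again, the data set
and parameter values are illustrative; max_features controls the random
selection of attributes considered at each split).

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# 100 trees, each split chosen from a random subset of sqrt(d) attributes;
# the prediction is the most popular class among the trees' votes.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=1)
forest.fit(X_tr, y_tr)
print("Random forest accuracy:", forest.score(X_te, y_te))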
Improving Classification Accuracy of
Class-Imbalanced Data
Given two-class data, the data are class-imbalanced if the main class
of interest (the positive class) is represented by only a few tuples,
while the majority of tuples represent the negative class.
Approaches include
1) Oversampling works by resampling the positive tuples so that the
resulting training set contains an equal number of positive and
negative tuples.
2) Undersampling works by decreasing the number of negative tuples.
It randomly eliminates tuples from the majority (negative) class
until there are an equal number of positive and negative tuples.
3) Threshold-moving moves the decision threshold, t, so that the rare-class
tuples are easier to classify (and hence, there is less chance of costly
false negative errors). It applies to classifiers that return a probability
or score, e.g., naive Bayesian classifiers and (backpropagation) neural
network classifiers.
4) Ensemble techniques combine multiple classifiers, as introduced above.
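A minimal Python sketch of random oversampling and undersampling (pure
Python, no imbalance-handling library assumed; function names are
illustrative).

import random

def oversample(pos, neg):
    # Resample the rare (positive) tuples with replacement until the classes are balanced.
    extra = [random.choice(pos) for _ in range(len(neg) - len(pos))]
    return pos + extra, neg

def undersample(pos, neg):
    # Randomly eliminate majority (negative) tuples until the classes are balanced.
    return pos, random.sample(neg, len(pos))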
4.7 Classification by Neural Networks
A neural network is a set of connected input/output units in which
each connection has a weight associated with it.
During the learning phase, the network learns by adjusting the
weights so as to be able to predict the correct class label of the input
tuples.
Neural network learning is also referred to as connectionist learning
due to the connections between units.
Advantages
• prediction accuracy is generally high
• robust, works when training examples contain errors
• output may be discrete, real-valued, or a vector of several discrete
or real-valued attributes
• fast evaluation of the learned target function
Criticism
• long training time
• difficult to understand the learned function (weights)
• not easy to incorporate domain knowledge
A Multilayer Feed-Forward Neural Network
A multilayer feed-forward neural network consists of an input layer, one or
more hidden layers, and an output layer.
Each layer is made up of units. The inputs to the network correspond to the
attributes measured for each training tuple.
The inputs are fed simultaneously into the units making up the input layer.
These inputs pass through the input layer and are then weighted and fed
simultaneously to a second layer of “neuronlike” units, known as a hidden
layer. The outputs of the hidden layer units can be input to another hidden
layer, and so on.
Fig: A multilayer feed-forward neural network mapping an input vector x to
an output vector.
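A minimal NumPy sketch of the forward pass through such a network (one
hidden layer, sigmoid units; the layer sizes and random weights are
illustrative, and the learning of the weights by backpropagation is not
shown).

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 3, 1                  # e.g., 4 input attributes, 1 output unit
W1, b1 = rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden)  # input -> hidden weights
W2, b2 = rng.normal(size=(n_out, n_hidden)), np.zeros(n_out)    # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    # The inputs are weighted and fed simultaneously to the hidden layer,
    # whose outputs are in turn weighted and fed to the output layer.
    hidden = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ hidden + b2)

print(forward(np.array([0.2, 0.7, 0.1, 0.9])))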
4.8 Support Vector Machines
Support vector machines (SVMs) are a method for the classification of
both linear and nonlinear data.
An SVM is an algorithm that uses a nonlinear mapping to transform
the original training data into a higher dimension.
The Case When the Data Are Linearly Separable:
Let us consider a two-class problem, where the classes are linearly
separable.
Let the data set D be given as (X1, y1), (X2, y2), : : : , (X|D|, y|D|), where
Xi is the set of training tuples with associated class labels, yi .
Each yi can take one of two values, either +1 or -1 (i.e., yi Є {+1,-1}),
corresponding to the classes buys_computer=yes and
buys_computer=no, respectively.
From the graph, we see that the 2-D data are linearly separable (or
“linear,” for short), because a straight line can be drawn to separate
all the tuples of class +1 from all the tuples of class -1.
Contd..
Fig: The 2-D training data (attributes A1 and A2) are linearly separable.
Class 1: y = +1 (buys_computer = yes); class 2: y = −1 (buys_computer = no).
There are an infinite number of separating lines that could be drawn. How
can we find the best line? If our data were 3-D (i.e., with three
attributes), we would want to find the best separating plane.
Generalizing to n dimensions, we want to find the best hyperplane (decision
boundary), regardless of the number of input attributes. An SVM approaches
this problem by searching for the maximum marginal hyperplane.
Contd..
Fig: Two possible separating hyperplanes for the same 2-D data (attributes
A1 and A2), one with a small margin and one with a larger margin; the
hyperplane with the larger margin is expected to classify unseen tuples
more accurately.
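A scikit-learn sketch of training a linear SVM, which searches for the
maximum marginal hyperplane; the toy 2-D data and parameter values are
illustrative assumptions.

import numpy as np
from sklearn.svm import SVC

# Toy 2-D, linearly separable data: class +1 vs class -1 (e.g., buys_computer = yes / no).
X = np.array([[2.0, 2.5], [1.5, 3.0], [3.0, 3.5],    # class +1
              [6.0, 1.0], [7.0, 2.0], [6.5, 0.5]])   # class -1
y = np.array([1, 1, 1, -1, -1, -1])

svm = SVC(kernel="linear", C=1e3)        # large C approximates a hard-margin separator
svm.fit(X, y)
print("support vectors:", svm.support_vectors_)
print("prediction for (3, 2):", svm.predict([[3.0, 2.0]]))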
4.9 Classification Using Frequent Patterns:
Pattern-Based Classification
Frequent patterns show interesting relationships between attribute–
value pairs that occur frequently in a given data set.
Frequent pattern mining or frequent itemset mining is the search for
these frequent patterns.
Analysis is useful in many decision-making processes such as product
placement, catalog design, and cross-marketing.
1) Associative classification: where association rules are generated from
frequent patterns and used for classification.
2) Discriminative frequent pattern–based classification: where frequent
patterns serve as combined features, which are considered in addition to
single features when building a classification model.
4.9.1) Associative classification
Associative classification: consists of the following steps:
1. Mine data to find strong associations between frequent patterns
(conjunctions of attribute-value pairs) and class labels
Ex: age = youth
2. Analyze the frequent itemsets to generate association rules per class,
which satisfy confidence and support criteria.
Ex: age=youth ^ credit=OK => buys_computer=yes [support=20%,
confidence=93%], where ^ represents a logical “AND”.
3. Organize the rules to form a rule-based classifier.
1) CBA (Classification Based on Associations)
CBA (Classification Based on Associations): uses an iterative
approach to frequent itemset mining. It consists of two parts.
i. Rule generator (CBA-RG), mines the possible Classification
Association Rules (CARs), based on Apriori algorithm in the form
of, Cond-set (a set of attribute-value pairs) -> class label
ii. Classifier Builder (CBA-CB) organizes rules according to
decreasing precedence based on confidence and then support.
• R1 has higher confidence than R2
• R1 and R2 have same confidence but R1 has higher support
• R1 and R2 have the same confidence and support, but R1 was generated
first (i.e., R1 has fewer items than R2)
When classifying a new tuple, the first rule satisfying the tuple is
used to classify it.
CBA is more accurate than C4.5 on large datasets.
CBA Limitations:
• Each tuple is classified by a single rule, the most effective matching
rule with the highest confidence.
• Too many rules are generated, causing storage overhead (pruning is
needed) and computational overhead.
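The CBA-CB classification step can be sketched in Python as follows (the
rule representation and the default class are illustrative assumptions):
rules are ordered by decreasing confidence, then support, and the first
rule whose condition set matches the new tuple supplies the class label.

def cba_classify(rules, x, default_class):
    # rules: list of (cond_set, class_label, confidence, support) tuples, where cond_set
    # is a dict of attribute-value pairs from a mined classification association rule.
    ordered = sorted(rules, key=lambda r: (r[2], r[3]), reverse=True)
    for cond_set, label, confidence, support in ordered:
        if all(x.get(a) == v for a, v in cond_set.items()):    # first matching rule wins
            return label
    return default_class

# Example CAR: ({"age": "youth", "credit": "OK"}, "buys_computer=yes", 0.93, 0.20)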
2) CMAR (Classification based on
Multiple Association Rules)
Instead of relying on a single rule for classification, CMAR determines
the class label by a set of rules.
i. Candidate generation
• CMAR adopts an enhanced FP-growth algorithm (faster than
Apriori used by CBA) to find the complete set of classification
association rules (CARs) satisfying the minimum confidence and
minimum support thresholds.
• To improve both accuracy and efficiency, CMAR employs a novel
data structure, CR-tree, to compactly store and efficiently retrieve a
large number of rules for classification and to prune rules based on
confidence, correlation, and database coverage.
• Pruning mechanisms:
- Precedence relationship
- Positive correlation to class label (Χ2 chi-square)
- Multiple database coverage
CMAR Contd..
ii. Classification
• It divides the rules into groups according to class labels. All rules
within a group share the same class label and each group has a
distinct class label.
• CMAR uses a weighted χ2 (chi-square) measure to find the
“strongest” group of rules, based on the statistical correlation of
rules within a group. It then assigns X the class label of the
strongest group.
ADVANTAGES:
• Outperforms C4.5 and CBA on accuracy
• Less storage requirements compared to CBA
• Lower running time compared to CBA
• Accuracy does not depend too much on confidence and coverage
threshold
LIMITATIONS:
• Many rules generated
• Confidence-based rule evaluation, which can lead to overfitting
3) CPAR (Classification based on
Predictive Association Rules)
• Generates predictive rules (FOIL-like analysis), but allows covered
tuples to remain under consideration with reduced weight.
• FOIL (First Order Inductive Learner) builds rules to distinguish
positive tuples (e.g., buys_computer=yes) from negative tuples
(e.g., buys_computer=no). For multiclass problems, FOIL is
applied to each class.
• Prediction using best k rules.
• High efficiency, accuracy similar to CMAR.
Contd..
2) Discriminative Frequent Pattern–Based Classification:
1. Feature generation:
• The data, D, are partitioned according to class label.
• Use frequent itemset mining to discover frequent patterns in
each partition, satisfying minimum support.
• The collection of frequent patterns, F, makes up the feature
candidates.
2. Feature selection:
Apply feature selection to F, resulting in FS, the set of selected (more
discriminating) frequent patterns. Information gain, Fisher score,
or other evaluation measures can be used for this step. Relevancy
checking can also be incorporated into this step to weed out
redundant patterns. The data set D is transformed to D′, where
the feature space now includes the single features as well as the
selected frequent patterns, FS.
3. Learning of classification model: A classifier is built on the data
set D′. Any learning algorithm can be used as the classification
model.
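A minimal Python sketch of steps 1–3 under simplifying assumptions: items
are attribute=value strings, frequent patterns up to length 2 are counted
by brute force per class partition (not an efficient miner), and the
selected patterns FS are turned into binary features on which any
classifier can then be trained.

from itertools import combinations
from collections import Counter

def frequent_patterns(transactions, min_support, max_len=2):
    # Step 1 (feature generation): count itemsets up to max_len within one class partition.
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    n = len(transactions)
    # Keep patterns satisfying minimum (relative) support, in a fixed order.
    return sorted(p for p, c in counts.items() if c / n >= min_support)

def to_features(transaction, selected_patterns):
    # Steps 2-3: represent a tuple as binary features over the selected patterns FS;
    # a classifier is then built on the transformed data set D'.
    items = set(transaction)
    return [1 if set(p) <= items else 0 for p in selected_patterns]

# Example item encoding for one tuple: {"age=youth", "student=no", "credit=fair"}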