
21AI71 AAIML TIE SIMP Questions with Key Answers - 2025


by the TIE KSIT, DSATM, JIT Review team

Module 1: Intelligent Agents

1. Define PEAS and explain different agent types with their PEAS descriptions.

2. List the four basic types of agent programs in any intelligent system and explain
how to convert them into learning agents.

3. Illustrate the minimax algorithm and explain it with an example.

4. Write and explain the Alpha-Beta algorithm with an example.

5. What are the key properties and characteristics of an intelligent agent?

6. Explain the concept of rationality in intelligent agents and the factors contributing to rational behavior.

7. Discuss different types of environments that agents can interact with (e.g., fully observable vs. partially observable, deterministic vs. stochastic).

8. Describe the main components and architectures of intelligent agents (e.g., simple reflex agents, model-based agents, goal-based agents).

Module 2: Uncertain Knowledge and Reasoning

1. Explain conditional independence in the Wumpus World revisited problem with a diagram.

2. Discuss Bayes' Rule and its application in a simple case.

3. Explain the probability axioms and their reasonableness with an example.

4. Explain inference with full joint distribution and provide an example.

5. How is uncertainty quantified in AI systems? What are some common methods for representing uncertainty?

6. Discuss strategies for making decisions under uncertainty, such as expected value and minimax.

7. Explain the concept of conditional probability and its importance in reasoning under uncertainty.

8. Provide a real-world example of Bayes' rule for updating beliefs based on new
evidence.


Module 3: Neural Networks and Genetic Algorithms: Any 7 questions

1. Explain the perceptron in ANN with a diagram.

2. Construct the gradient descent algorithm for training a linear unit.

3. Discuss the stochastic gradient descent version of the backpropagation algorithm for feed-forward networks containing two layers of sigmoid units.

4. Implement the ANDNOT function using a McCulloch-Pitts neuron (use binary data representation).

5. Discuss the prototypical genetic algorithm in detail.

6. Define genetic programming and discuss representing programs in genetic programming.

7. What are the limitations of single-layer perceptrons, and how do multilayer networks overcome them?

8. Explain the role of activation functions in neural networks, providing examples of their properties.

9. Discuss the concept of hypothesis space search and its relevance to genetic
algorithms.

10. Compare models of evolution and learning in genetic algorithms.

Module 4: Recommender Systems and Text Analytics: Any 7 questions

1. Identify the metrics used to generate association rules with an example.

2. Discuss item-based similarity in collaborative filtering with an example.

3. Explain user-based similarity using the Surprise library and provide a snippet of
code.

4. Explain matrix factorization with an example.

5. Explain the Bag-of-Words (BoW) model in text analysis.

6. Discuss the Naive-Bayes model for sentiment classification in text analysis.

7. List and explain the challenges of text analytics.

8. Explain the TF-IDF vectorizer in text analysis.


9. What are the critical steps in building a recommender system, and what
datasets are commonly used?

10. Provide an overview of text analytics and its applications in AI.

Module 5: Clustering and Instance-Based Learning: Any 7 questions

1. List and explain the classification of clustering algorithms.

2. Explain the different steps of the K-Means algorithm with an example.

3. Draw the flowchart of the agglomerative algorithm and explain it in detail.

4. Explain the k-nearest neighbor algorithm for approximating a discrete-valued function.

5. Derive a locally weighted linear regression with an example.

6. Construct a radial basis function network with weights.

7. How do partitioning methods (e.g., k-means, k-medoids) differ from hierarchical clustering techniques?

8. Discuss the use of k-nearest neighbor (k-NN) for both classification and
regression tasks.

9. How does locally weighted regression adapt to changes in input data?

10. Compare and contrast radial basis function networks with traditional neural
networks.

Module – 01 Key Answers

1. PEAS and Agent Types:

 PEAS stands for Performance measure, Environment, Actuators, and Sensors. It's a way to describe the setting for an agent.

o Performance Measure: What criteria define success for the agent?

o Environment: Where does the agent operate?

o Actuators: How does the agent affect the environment?

o Sensors: How does the agent perceive the environment?

 Example Agent Types with PEAS:


o Medical Diagnosis System:

 Performance Measure: Correct diagnosis, minimizing patient discomfort.

 Environment: Patient, hospital records, medical literature.

 Actuators: Displaying diagnoses, recommending tests/treatments.

 Sensors: Patient symptoms, test results, medical history.

o Robot Soccer Player:

 Performance Measure: Winning the game, scoring goals.

 Environment: Soccer field, other players, ball.

 Actuators: Wheels/legs, kicking mechanism.

 Sensors: Cameras, position sensors.

2. Agent Program Types and Learning:

 Four basic agent program types:

o Simple Reflex Agents: React directly to percepts (current sensory input). If condition, then action.

o Model-Based Reflex Agents: Maintain internal state (a "model" of the world) to handle partially observable environments.

o Goal-Based Agents: Have goals they try to achieve. They use search and planning to find sequences of actions.

o Utility-Based Agents: Consider multiple goals and their associated "utility" (how desirable they are). They choose actions that maximize expected utility.

 Turning them into Learning Agents: Add a "learning element" that allows the
agent to improve its performance over time. This can involve:

o Learning from experience: Observing the consequences of its actions.

o Learning from feedback: Receiving explicit feedback on its performance.

o Using machine learning algorithms: Such as reinforcement learning, supervised learning, etc.

3. Minimax Algorithm:

 Used for two-player games (like tic-tac-toe, chess). It assumes both players play
optimally.


 It explores the game tree, alternating between maximizing player (agent) and
minimizing player (opponent).

 Example (simplified): Imagine a simple game with only two moves each. The
agent wants to maximize its score. The opponent wants to minimize it. The
minimax algorithm explores all possible outcomes to find the best move for the
agent, assuming the opponent will also make the best possible move for itself.
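To make this concrete, here is a minimal Python sketch of minimax on a small hand-built game tree; the tree shape and leaf utilities are illustrative, not taken from any particular game:

def minimax(node, maximizing):
    # Leaf: return its utility for the maximizing player.
    if isinstance(node, (int, float)):
        return node
    # Recurse, alternating between MAX and MIN turns.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Two moves each: MAX picks a branch, then MIN replies.
tree = [[3, 12], [2, 8]]
print(minimax(tree, maximizing=True))  # 3: MIN would answer 3 on the left branch, 2 on the right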

4. Alpha-Beta Pruning:

 An optimization of minimax. It avoids exploring branches of the game tree that are guaranteed to be worse than alternatives already found.

 Alpha: The best (highest) value found so far for the maximizing player.

 Beta: The best (lowest) value found so far for the minimizing player.

 It "prunes" branches by stopping exploration as soon as the current branch is proven worse than the current alpha or beta value (i.e., when alpha ≥ beta). This significantly speeds up the search.
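A hedged Python sketch on the same kind of toy tree as in the minimax answer; note how the 8 leaf is never examined once the right branch is proven worse than alpha:

def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, (int, float)):             # leaf: return its utility
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                      # MIN would never allow this branch
                break                              # prune the remaining children
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

tree = [[3, 12], [2, 8]]
print(alphabeta(tree, float("-inf"), float("inf"), True))  # 3; the 8 leaf is pruned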

5. Key Properties of Intelligent Agents:

 Autonomy: Ability to operate without constant human guidance.

 Reactivity: Ability to perceive and respond to changes in the environment.

 Pro-activeness: Ability to take initiative and pursue goals.

 Social ability: Ability to communicate and cooperate with other agents (or
humans).

6. Rationality:

 A rational agent does the "right thing," given what it knows.

 Factors contributing to rational behavior:

o Clear performance measure.

o Knowledge of the environment.

o Available actions.

o Computational resources.

7. Types of Environments:

 Fully Observable vs. Partially Observable: Can the agent see the complete
state of the environment?


 Deterministic vs. Stochastic: Is the next state of the environment completely determined by the current state and the agent's action (deterministic), or is there some randomness involved (stochastic)?

 Episodic vs. Sequential: Are the agent's experiences divided into independent episodes (episodic), or does the current action affect future experiences (sequential)?

 Static vs. Dynamic: Does the environment change while the agent is
deliberating?

 Discrete vs. Continuous: Are the number of possible states and actions finite
(discrete) or infinite (continuous)?

 Single-agent vs. Multi-agent: Is the agent operating alone or with other agents?

8. Agent Architectures:

 Simple Reflex Agents: Simplest type. React directly to percepts.

 Model-Based Reflex Agents: Use an internal model of the world to handle partial observability.

 Goal-Based Agents: Use goals to guide their actions.

 Utility-Based Agents: Choose actions that maximize expected utility (a measure of desirability).

Module – 02 Key Answers

1. Conditional Independence in Wumpus World (Revisited):

In the Wumpus World, the presence of a breeze in a square is directly caused by a pit in
an adjacent square. However, if we know the state of the adjacent squares (whether
they have pits or not), then the breeze in the current square becomes independent of
the state of other non-adjacent squares.

Diagram:

+---+---+---+
|   | B |   |
+---+---+---+
| P | W | P |
+---+---+---+
|   | B |   |
+---+---+---+

P = Potential Pit

B = Breeze

W = Current Square

If we know there are pits in the adjacent squares marked 'P', then the breeze in 'W' is
fully explained. The presence or absence of pits in other squares on the board becomes
irrelevant to the breeze in 'W'. This is conditional independence: Breeze in W is
conditionally independent of other pits given the state of adjacent squares.

2. Bayes' Rule and its Application:

Bayes' Rule relates conditional probabilities:

P(A|B) = [P(B|A) * P(A)] / P(B)

 P(A|B): Probability of A given B (posterior).

 P(B|A): Probability of B given A (likelihood).

 P(A): Prior probability of A.

 P(B): Prior probability of B.

Simple Case:

Suppose a disease affects 1% of the population (P(Disease) = 0.01). A test for the disease has a 95% true positive rate (P(Positive|Disease) = 0.95) and a 5% false positive rate (P(Positive|¬Disease) = 0.05). If someone tests positive, what's the probability they actually have the disease (P(Disease|Positive))?

P(Disease|Positive) = [P(Positive|Disease) * P(Disease)] / P(Positive)

We need P(Positive), which can be calculated using the law of total probability:

P(Positive) = P(Positive|Disease)P(Disease) + P(Positive|¬Disease)P(¬Disease)


P(Positive) = (0.95 * 0.01) + (0.05 * 0.99) = 0.059

Now, applying Bayes' Rule:

P(Disease|Positive) = (0.95 * 0.01) / 0.059 ≈ 0.161

So, even with a positive test, there's only about a 16.1% chance of actually having the
disease.
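The worked numbers above can be checked in a few lines of Python:

p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05

# Law of total probability, then Bayes' rule.
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_pos, 3), round(p_disease_given_pos, 3))  # 0.059 0.161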

3. Probability Axioms and Reasonableness:


 Non-negativity: P(A) ≥ 0. The probability of any event is non-negative. Reasonable because probabilities represent proportions or chances, which can't be negative.

 Normalization: P(Sample Space) = 1. The probability of the entire sample space (all possible outcomes) is 1. Reasonable because one of the possible outcomes must occur.

 Additivity: If A and B are mutually exclusive (cannot both occur), then P(A ∪ B) = P(A) + P(B). Reasonable because if events can't happen together, their probabilities simply add up.

Example: Rolling a fair die.

 P(rolling a 3) = 1/6 ≥ 0 (Non-negativity)

 P(rolling 1, 2, 3, 4, 5, or 6) = 1 (Normalization)

 P(rolling a 2 or a 3) = P(rolling a 2) + P(rolling a 3) = 1/6 + 1/6 = 1/3 (Additivity)

4. Inference with Full Joint Distribution:

A full joint distribution gives the probability of every possible combination of values for
all variables. We can use it to infer the probability of any event.

Example: Two binary variables: Rain (R) and Sprinkler (S).

R      S      P(R, S)
True   True   0.01
True   False  0.09
False  True   0.10
False  False  0.80

To find P(R), we sum the probabilities where R is true: P(R) = P(R, S) + P(R, ¬S) = 0.01 +
0.09 = 0.10.
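A short Python sketch of the same marginalization; the conditional query P(R|S) is an added illustration, computed the same way from the table:

# Each key is a (Rain, Sprinkler) pair; values are the joint probabilities above.
joint = {
    (True, True): 0.01,
    (True, False): 0.09,
    (False, True): 0.10,
    (False, False): 0.80,
}

p_rain = sum(p for (r, s), p in joint.items() if r)         # marginalize out S
p_sprinkler = sum(p for (r, s), p in joint.items() if s)
p_rain_given_sprinkler = joint[(True, True)] / p_sprinkler  # P(R|S) by definition
print(round(p_rain, 2), round(p_rain_given_sprinkler, 3))   # 0.1 0.091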

5. Quantifying Uncertainty in AI:

Uncertainty is quantified using probability theory. Common methods:

 Probability distributions: Assign probabilities to different outcomes.


 Bayesian networks: Represent probabilistic relationships between variables.

 Fuzzy logic: Deals with degrees of truth rather than absolute true/false.

6. Strategies for Decision Making Under Uncertainty:

 Expected Value: Calculate the average outcome by weighting each possible outcome by its probability. Choose the action with the highest expected value.

 Minimax (revisited): In game theory, minimize the maximum possible loss.

7. Conditional Probability:

P(A|B) is the probability of A given that B has occurred. It's crucial for reasoning under uncertainty because it allows us to update our beliefs based on new evidence.

8. Real-World Example of Bayes' Rule:

Spam filtering:

 A word like "free" appears in many spam emails but also in some legitimate
emails.

 P(Spam): Prior probability of an email being spam (e.g., based on past experience).

 P("free"|Spam): Probability of "free" appearing in spam.

 P("free"|¬Spam): Probability of "free" appearing in legitimate emails.

When an email arrives with the word "free," Bayes' rule is used to update the belief that
the email is spam:

P(Spam|"free") = [P("free"|Spam) * P(Spam)] / P("free")

This updated probability helps the filter decide whether to classify the email as spam.

Module – 03 Key Answers

1. Explain the perceptron in ANN with a diagram.

A perceptron is the simplest form of a neural network. It takes several inputs, multiplies each by a weight, sums the weighted inputs (plus a bias), and applies an activation function to produce a binary output.

x1 ---w1--->
x2 ---w2--->   (Σ) --- Activation Function ---> Output (y)
    ...
xn ---wn--->
bias (b) --->

 x1, x2, ..., xn: Input values.

 w1, w2, ..., wn: Weights associated with each input.

 bias (b): A constant value added to the weighted sum.

 Σ: Summation of weighted inputs and bias.

 Activation Function: Typically a step function (if sum > threshold, output 1;
otherwise, 0).
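A minimal Python sketch of this forward pass; the weights below are illustrative hand-picked values (making the unit compute logical AND), not learned ones:

def perceptron(inputs, weights, bias, threshold=0.0):
    s = sum(w * x for w, x in zip(weights, inputs)) + bias  # weighted sum plus bias
    return 1 if s > threshold else 0                        # step activation

# Hand-picked weights making the perceptron compute logical AND.
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", perceptron([x1, x2], weights=[1, 1], bias=-1.5))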

2. Construct the gradient descent algorithm for training a linear unit.

For a linear unit (where the activation function is simply the identity function f(x) = x), the
gradient descent algorithm aims to minimize the error between the predicted output
and the actual output.

1. Initialize weights: Assign random initial values to weights.

2. For each training example:

o Calculate the predicted output: y_hat = w1*x1 + w2*x2 + ... + wn*xn + b

o Calculate the error: error = y - y_hat (where y is the actual output)

o Update weights: wi = wi + learning_rate * error * xi

o Update bias: b = b + learning_rate * error

The learning rate controls the step size of the weight updates.
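A hedged Python sketch of this update rule on a toy 1-D dataset; the data, learning rate, and epoch count are illustrative:

import random

data = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0]]  # noiseless y = 2x + 1
w, b, lr = random.random(), random.random(), 0.05

for epoch in range(2000):
    for x, y in data:
        y_hat = w * x + b       # linear unit: identity activation
        error = y - y_hat
        w += lr * error * x     # wi = wi + learning_rate * error * xi
        b += lr * error         # b  = b  + learning_rate * error

print(round(w, 2), round(b, 2))  # should converge near w = 2, b = 1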

3. Discuss the stochastic gradient descent version of the backpropagation algorithm for feed-forward networks containing two layers of sigmoid units.

Backpropagation is used to train multilayer networks. Stochastic gradient descent updates weights after each training example.

1. Forward Pass: Input is propagated through the network to calculate output.

2. Error Calculation: The error between the network's output and the target output
is calculated.

3. Backward Pass: The error is propagated backward through the network to calculate the gradient of the error with respect to each weight.

4. Weight Update: Weights are updated proportionally to the negative gradient.

For a two-layer network with sigmoid units, the weight updates involve the derivative of
the sigmoid function. The process is mathematically involved but conceptually
calculates how much each weight contributed to the error and adjusts it accordingly.
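A hedged NumPy sketch of this procedure for one hidden layer of sigmoid units and a sigmoid output, trained on XOR; the layer sizes, learning rate, and epoch count are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)    # input -> hidden weights/biases
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)    # hidden -> output weights/biases
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
lr = 0.5

for epoch in range(5000):
    for x, y in zip(X, Y):                        # stochastic: one example at a time
        h = sigmoid(x @ W1 + b1)                  # forward pass, hidden layer
        o = sigmoid(h @ W2 + b2)                  # forward pass, output layer
        delta_o = (o - y) * o * (1 - o)           # output error times sigmoid'
        delta_h = (delta_o @ W2.T) * h * (1 - h)  # error propagated to hidden layer
        W2 -= lr * np.outer(h, delta_o); b2 -= lr * delta_o
        W1 -= lr * np.outer(x, delta_h); b1 -= lr * delta_h

# Should approach [0, 1, 1, 0] for most random seeds.
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel(), 2))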


4. Implement the ANDNOT function using a McCulloch-Pitts neuron (use binary data representation).

The ANDNOT function (x1 AND NOT x2) can be implemented as follows:

 Inputs: x1, x2 (binary)

 Weights: w1 = 1, w2 = -1

 Threshold: 1

If x1 = 1 and x2 = 0, the weighted sum is (1*1) + (-1*0) = 1, which is greater than or equal
to the threshold, so the output is 1. If x1 = 1 and x2 = 1, the weighted sum is (1*1) + (-1*1)
= 0, which is less than the threshold, so the output is 0.
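The same weights and threshold can be checked over the full truth table in Python:

def mcculloch_pitts(x1, x2, w1=1, w2=-1, threshold=1):
    # Fire (output 1) when the weighted sum reaches the threshold.
    return 1 if w1 * x1 + w2 * x2 >= threshold else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mcculloch_pitts(x1, x2))
# Only (1, 0) fires, i.e., x1 AND NOT x2.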

5. Discuss the prototypical genetic algorithm in detail.

1. Initialization: Create a population of candidate solutions (chromosomes).

2. Evaluation: Evaluate the fitness of each chromosome.

3. Selection: Select chromosomes for reproduction based on their fitness (e.g., roulette wheel selection, tournament selection).

4. Crossover (Recombination): Combine parts of two parent chromosomes to create offspring.

5. Mutation: Introduce small random changes to the offspring.

6. Replacement: Replace the old population with the new offspring.

7. Repeat steps 2-6 until a satisfactory solution is found or a termination criterion is met.
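A minimal Python sketch of this loop on the classic "OneMax" problem (maximize the number of 1-bits in a chromosome); the population size, chromosome length, and rates are all illustrative:

import random

def fitness(chrom):                       # step 2: evaluation
    return sum(chrom)

def tournament(pop, k=3):                 # step 3: selection
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):                      # step 4: single-point crossover
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(chrom, rate=0.02):             # step 5: bit-flip mutation
    return [bit ^ 1 if random.random() < rate else bit for bit in chrom]

pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for gen in range(50):                     # steps 6-7: replace population, repeat
    pop = [mutate(crossover(tournament(pop), tournament(pop)))
           for _ in range(len(pop))]

print(max(fitness(c) for c in pop))       # best fitness found; typically at or near 20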

6. Define genetic programming and discuss representing programs in genetic programming.

Genetic programming (GP) evolves computer programs. Programs are typically represented as tree structures (parse trees), where:

 Internal nodes represent functions (+, -, *, /, IF, etc.).

 Leaf nodes represent terminals (variables, constants).

Example: The expression (x + 3) * y would be represented as:

      *
     / \
    +   y
   / \
  x   3

7. What are the limitations of single-layer perceptrons, and how do multilayer networks overcome them?

Single-layer perceptrons can only learn linearly separable functions (functions where
the data can be separated by a straight line or hyperplane). They cannot learn functions
like XOR.

Multilayer networks overcome this limitation by introducing hidden layers, which allow
them to learn more complex, non-linear functions.

8. Explain the role of activation functions in neural networks, providing examples of their properties.

Activation functions introduce non-linearity to neural networks, allowing them to learn complex patterns.

 Sigmoid: Outputs a value between 0 and 1. Used for probabilities.

 ReLU (Rectified Linear Unit): Outputs 0 if the input is negative, otherwise outputs the input directly. Popular due to its efficiency.

 Tanh (Hyperbolic Tangent): Outputs a value between -1 and 1.

9. Discuss the concept of hypothesis space search and its relevance to genetic
algorithms.

Genetic algorithms perform a search through the hypothesis space of possible solutions. Each chromosome represents a point in this space. The GA explores the space by iteratively generating new points (offspring) and evaluating their fitness.

10. Compare models of evolution and learning in genetic algorithms.

 Evolution: The GA simulates natural selection, where fitter individuals are more
likely to reproduce and pass on their traits.

 Learning: The GA learns by iteratively improving the population through crossover and mutation, exploring different regions of the hypothesis space and converging towards better solutions. The population itself learns over time by accumulating beneficial genes/traits.


Module – 04 Key Answers


1. Metrics for Association Rule Generation:

Association rules discover relationships between items in a dataset. Key metrics include:

 Support: The proportion of transactions containing both items A and B. Support(A → B) = P(A ∩ B)

 Confidence: The proportion of transactions containing A that also contain B. Confidence(A → B) = P(B|A) = P(A ∩ B) / P(A)

 Lift: How much more likely B is to be purchased when A is purchased, compared to the general probability of purchasing B. Lift(A → B) = P(B|A) / P(B) = Confidence(A → B) / Support(B)

Example:

Transactions:

1. {Milk, Bread, Butter}

2. {Milk, Butter}

3. {Milk, Diapers, Beer}

4. {Bread, Butter}

5. {Milk, Bread, Diapers}

Rule: {Milk, Bread} → {Butter}

 Support: 1/5 (only Transaction 1 contains Milk, Bread, and Butter)

 Confidence: 1/2 (Transactions 1 and 5 contain both Milk and Bread; of those, only Transaction 1 also contains Butter)

 Lift: (1/2) / (3/5) = 5/6 ≈ 0.83 (Butter appears in 3 of the 5 transactions); a lift below 1 means Butter is slightly less likely given Milk and Bread than it is overall
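These metrics can be recomputed directly from the five transactions in Python:

transactions = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Butter"},
    {"Milk", "Diapers", "Beer"},
    {"Bread", "Butter"},
    {"Milk", "Bread", "Diapers"},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / n

antecedent, consequent = {"Milk", "Bread"}, {"Butter"}
sup = support(antecedent | consequent)
conf = sup / support(antecedent)
lift = conf / support(consequent)
print(sup, conf, round(lift, 2))  # 0.2 0.5 0.83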

2. Item-Based Similarity in Collaborative Filtering:

Item-based CF recommends items similar to those a user has liked in the past.
Similarity is calculated between items based on user ratings.

Example:

User Ratings (1-5 scale):

 User A: Movie 1 (4), Movie 2 (5), Movie 3 (1)

 User B: Movie 1 (5), Movie 2 (4), Movie 3 (2)


 User C: Movie 1 (1), Movie 3 (4)

Movies 1 and 2 have similar ratings from users A and B. If User C liked Movie 3, item-
based CF might recommend Movie 1 or 2 because other users who liked Movie 3 also
tended to like Movies 1 and 2. Common similarity measures include cosine similarity
and adjusted cosine similarity.

3. User-Based Similarity with Surprise:

User-based CF recommends items liked by users who are similar to the target user, where user-user similarity is computed from their rating vectors (e.g., cosine or Pearson similarity); see the snippet below.
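A hedged snippet using Surprise's KNNBasic on the built-in MovieLens 100k dataset (Surprise offers to download it on first use; the user/item ids below are raw MovieLens ids chosen for illustration):

from surprise import Dataset, KNNBasic

data = Dataset.load_builtin("ml-100k")       # offers to download on first use
trainset = data.build_full_trainset()

# user_based=True computes similarities between users rather than items.
algo = KNNBasic(sim_options={"name": "cosine", "user_based": True})
algo.fit(trainset)

# Predict user 196's rating for item 302 (raw MovieLens ids are strings).
print(algo.predict(uid="196", iid="302").est)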

4. Matrix Factorization:

Matrix factorization decomposes a user-item interaction matrix into two lower-dimensional matrices representing user and item latent factors.

Example:

User-Movie Rating Matrix:

        Movie 1  Movie 2  Movie 3
User A     4        5        1
User B     5        4        2
User C     1        ?        4

Matrix factorization would approximate this matrix by two smaller matrices: a user-
factor matrix and an item-factor matrix. Multiplying these matrices would reconstruct
an approximation of the original matrix, filling in the missing rating for User C and Movie
2.
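A hedged NumPy sketch of this idea: factor the 3x3 rating matrix above into two rank-2 factor matrices by stochastic gradient descent on the observed entries only (the factor rank, learning rate, and step count are illustrative):

import numpy as np

R = np.array([[4, 5, 1],
              [5, 4, 2],
              [1, np.nan, 4]])           # User C's rating for Movie 2 is unknown
observed = ~np.isnan(R)

rng = np.random.default_rng(0)
U = rng.normal(0, 0.1, (3, 2))           # user latent-factor matrix
V = rng.normal(0, 0.1, (3, 2))           # item latent-factor matrix
lr = 0.01

for step in range(5000):
    for i, j in zip(*np.nonzero(observed)):
        err = R[i, j] - U[i] @ V[j]      # error on one observed rating
        U[i] += lr * err * V[j]          # gradient step for the user factors
        V[j] += lr * err * U[i]          # gradient step for the item factors

print(round(U[2] @ V[1], 2))             # filled-in rating for User C, Movie 2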

5. Bag-of-Words (BoW):

BoW represents text as a collection of its words, disregarding grammar and word order.
A document is represented as a vector where each dimension corresponds to a unique
word in the corpus, and the value is the frequency of that word in the document.

Example:

 Document 1: "the cat sat on the mat"

 Document 2: "the dog sat on the log"

Vocabulary: {"the", "cat", "sat", "on", "mat", "dog", "log"}

BoW representation:

 Document 1: {2, 1, 1, 1, 1, 0, 0}

 Document 2: {2, 0, 1, 1, 0, 1, 1}
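The same two documents vectorized with scikit-learn's CountVectorizer (note that it sorts the vocabulary alphabetically, so the column order differs from the listing above):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]
vec = CountVectorizer()
counts = vec.fit_transform(docs)

print(vec.get_feature_names_out())  # ['cat' 'dog' 'log' 'mat' 'on' 'sat' 'the']
print(counts.toarray())             # [[1 0 0 1 1 1 2]
                                    #  [0 1 1 0 1 1 2]]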


6. Naive Bayes for Sentiment Classification:

Naive Bayes uses Bayes' theorem to classify text based on word frequencies. It
assumes that words are conditionally independent given the sentiment.

Example:

Training data:

 Positive: "This movie is great!"

 Negative: "This movie is terrible."

A new sentence: "This movie is good."

Naive Bayes calculates the probability of the sentence being positive and negative
based on the frequencies of words like "great," "terrible," and "good" in the training data.
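A hedged scikit-learn sketch of this pipeline; two training sentences are far too few for real use and serve only to show the mechanics:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["This movie is great!", "This movie is terrible."]
train_labels = ["positive", "negative"]

vec = CountVectorizer()
X = vec.fit_transform(train_texts)       # BoW counts as features
clf = MultinomialNB().fit(X, train_labels)

test = vec.transform(["This movie is good."])   # "good" is out of vocabulary here
print(clf.predict(test), clf.predict_proba(test))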

7. Challenges of Text Analytics:

 Ambiguity: Words can have multiple meanings.

 Sarcasm and Irony: Difficult to detect automatically.

 Context: Word meaning depends on context.

 Spelling variations and errors: Can affect analysis.

 Data sparsity: Many words occur infrequently.

8. TF-IDF Vectorizer:

TF-IDF (Term Frequency-Inverse Document Frequency) weighs words based on their importance in a document relative to the entire corpus.

 TF (Term Frequency): How often a word appears in a document.

 IDF (Inverse Document Frequency): How rare a word is across all documents.

TF-IDF increases for words that are frequent in a document but rare in the corpus.
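A short scikit-learn example on the two documents from the BoW answer above:

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]
vec = TfidfVectorizer()
tfidf = vec.fit_transform(docs)

# Words unique to one document ("cat", "mat") receive a higher IDF than
# words shared by both documents ("the", "sat", "on").
for word, score in zip(vec.get_feature_names_out(), tfidf.toarray()[0]):
    print(f"{word}: {score:.2f}")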

9. Steps in Building a Recommender System:

1. Data Collection: Gather user-item interaction data (ratings, clicks, purchases).

2. Data Preprocessing: Clean and format the data.

3. Model Selection: Choose an appropriate recommendation algorithm (CF, content-based, hybrid).

4. Training and Evaluation: Train the model and evaluate its performance using
metrics like precision, recall, RMSE.


5. Deployment: Integrate the recommender system into an application.

Common datasets: MovieLens, Netflix Prize, Amazon reviews.

10. Overview of Text Analytics and its Applications:

Text analytics extracts meaningful information from unstructured text data.

Applications in AI:

 Sentiment analysis: Understanding customer opinions.

 Topic modeling: Discovering topics in large text collections.

 Chatbots: Building conversational AI agents.

 Information retrieval: Improving search engine accuracy.

 Text summarization: Automatically generating summaries of documents.

Module – 05 Key Answers



1. Classification of Clustering Algorithms:

Clustering algorithms group similar data points together. They can be classified as:

 Partitioning Methods: Divide data into non-overlapping clusters (e.g., k-means, k-medoids).

 Hierarchical Methods: Create a hierarchy of clusters (e.g., agglomerative, divisive).

 Density-Based Methods: Identify clusters based on the density of data points (e.g., DBSCAN).

 Grid-Based Methods: Quantize the data space into a grid structure (e.g., STING).

 Model-Based Methods: Assume data is generated from a mixture of probability distributions (e.g., Gaussian Mixture Models).

2. Steps of the K-Means Algorithm with Example:

K-means partitions data into k clusters.

1. Initialization: Choose k initial centroids (cluster centers).

2. Assignment: Assign each data point to the nearest centroid.

3. Update: Recalculate the centroids as the mean of the points assigned to each cluster.

4. Repeat steps 2-3 until convergence (centroids no longer change significantly).

Example:

Data points: (1,1), (1,2), (3,5), (4,4), (5,5) and k=2

1. Initial centroids: (1,1), (5,5)

2. Assignment:

o (1,1), (1,2) assigned to (1,1)

o (3,5), (4,4), (5,5) assigned to (5,5)

3. Update:

o New centroid 1: (1, 1.5)

o New centroid 2: (4, 4.67)

4. Repeat until convergence.
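The same example run with scikit-learn's KMeans, with init fixed to the two starting centroids above for reproducibility:

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [1, 2], [3, 5], [4, 4], [5, 5]])
init = np.array([[1, 1], [5, 5]])                 # the initial centroids above
km = KMeans(n_clusters=2, init=init, n_init=1).fit(X)

print(km.cluster_centers_)   # approximately [[1, 1.5], [4, 4.67]]
print(km.labels_)            # [0 0 1 1 1]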

3. Flowchart of Agglomerative Algorithm:

[Start]
   |
   v
[Treat each data point as its own cluster]
   |
   v
[Find the two closest clusters] <------+
   |                                   |
   v                                   |
[Merge the two clusters]               |
   |                                   |
   v                              Yes  |
[More than one cluster left?] ---------+
   | No
   v
[End]

Agglomerative clustering starts with each data point as its own cluster and repeatedly
merges the closest clusters until only one cluster remains. Distance metrics (e.g., single
linkage, complete linkage, average linkage) determine cluster closeness.

4. K-Nearest Neighbor (k-NN) for Discrete-Valued Functions:


k-NN classifies a new instance by finding the k nearest training instances and assigning
the most frequent class among them.

1. Store training examples.

2. Given a new instance:

o Find the k nearest training examples based on a distance metric (e.g., Euclidean distance).

o Count the number of examples belonging to each class among the k neighbors.

o Assign the new instance to the most frequent class.
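A minimal Python sketch of this procedure (Euclidean distance, majority vote); the four labeled training points are illustrative:

from collections import Counter
import math

# Tiny illustrative training set: (point, class label).
train = [((1, 1), "A"), ((1, 2), "A"), ((4, 4), "B"), ((5, 5), "B")]

def knn_classify(query, k=3):
    nearest = sorted(train, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]   # majority class among the k neighbors

print(knn_classify((2, 2)))  # 'A': two of the three nearest neighbors are A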

5. Locally Weighted Linear Regression:

Locally weighted linear regression fits a linear model to the neighborhood of a query
point. It assigns weights to training examples based on their distance from the query
point.

 Weighting function: w(i) = exp(-dist(x, x(i))^2 / (2*tau^2)), where tau is the bandwidth parameter.

 Minimize the weighted squared error: Σ w(i) * (y(i) - θ·x(i))^2, where θ is the parameter vector of the local linear model.

Example: Predicting house prices. When predicting the price of a house with 3
bedrooms, we give more weight to training houses with a similar number of bedrooms.
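A hedged NumPy sketch for 1-D inputs: a weighted least-squares fit around each query point using the Gaussian weighting function above (the toy data and tau value are illustrative):

import numpy as np

def lwr_predict(x_query, X, y, tau=1.0):
    Xb = np.column_stack([np.ones(len(X)), X])           # add an intercept column
    w = np.exp(-((X - x_query) ** 2) / (2 * tau ** 2))   # Gaussian local weights
    W = np.diag(w)
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y) # weighted normal equations
    return np.array([1.0, x_query]) @ theta

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])                  # roughly linear toy data
print(round(lwr_predict(2.5, X, y, tau=0.8), 2))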

6. Radial Basis Function (RBF) Network with Weights:

An RBF network has three layers: input, hidden (RBF units), and output.

 Hidden units use RBFs (e.g., Gaussian functions) as activation functions.

 The output is a weighted sum of the hidden unit outputs.

Input Layer --> RBF Layer (Gaussian functions) --> Output Layer (weighted sum)

x1 ---> RBF1(x) ---w1--->
x2 ---> RBF2(x) ---w2--->   (Σ) ---> Output
...       ...      ...

Each RBF unit has a center and a width. The output of an RBF unit is high when the input
is close to its center.
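A minimal NumPy sketch of this forward pass; the centers, shared width, and output weights are illustrative rather than trained values:

import numpy as np

centers = np.array([[0.0, 0.0], [1.0, 1.0]])  # one row per RBF (hidden) unit
width = 0.5                                   # shared Gaussian width
w_out = np.array([0.7, -0.3])                 # output-layer weights
b_out = 0.1

def rbf_forward(x):
    dists2 = np.sum((centers - x) ** 2, axis=1)
    h = np.exp(-dists2 / (2 * width ** 2))    # Gaussian activations, high near centers
    return h @ w_out + b_out                  # weighted sum at the output layer

print(round(rbf_forward(np.array([0.9, 1.1])), 3))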

7. Partitioning vs. Hierarchical Clustering:

 Partitioning: Divides data into k clusters directly. Computationally efficient for large datasets. Requires specifying k beforehand.


 Hierarchical: Creates a hierarchy of clusters. Useful for visualizing relationships between clusters. Can be computationally expensive for large datasets. Doesn't require specifying k beforehand.

8. k-NN for Classification and Regression:

 Classification: Assigns the most frequent class among the k nearest neighbors.

 Regression: Predicts the average or weighted average of the target values of the
k nearest neighbors.

9. Locally Weighted Regression Adaptation:

Locally weighted regression adapts to changes in input data by fitting a different linear
model for each query point. The bandwidth parameter (tau) controls the size of the
neighborhood. A smaller tau makes the model more sensitive to local variations.

10. RBF Networks vs. Traditional Neural Networks:

 RBF Networks: Use radial basis functions in the hidden layer. Have a faster
training time due to local receptive fields of RBF units. Good for function
approximation.

 Traditional Neural Networks (e.g., Multilayer Perceptrons): Use sigmoid or other activation functions. Can learn more complex patterns but may require more training time.

Key differences:

 Activation functions: RBFs are local, while sigmoid/tanh are global.

 Number of layers: RBF networks often have a single hidden layer, while
traditional networks can have multiple.

 Training: RBF networks can be trained faster.
