21AI71 SIMP TIE (1)_250107_124440
1. Define PEAS and explain different agent types with their PEAS descriptions.
2. List the four basic types of agent programs in any intelligent system and explain
how to convert them into learning agents.
7. Discuss different types of environments that agents can interact with (e.g., fully
observable vs. partially observable, deterministic vs. stochastic).
8. Provide a real-world example of Bayes' rule for updating beliefs based on new
evidence.
9. Discuss the concept of hypothesis space search and its relevance to genetic
algorithms.
3. Explain user-based similarity using the Surprise library and provide a snippet of
code.
9. What are the critical steps in building a recommender system, and what
datasets are commonly used?
8. Discuss the use of k-nearest neighbor (k-NN) for both classification and
regression tasks.
10. Compare and contrast radial basis function networks with traditional neural
networks.
o Goal-Based Agents: Have goals they try to achieve. They use search and
planning to find sequences of actions.
Turning them into Learning Agents: Add a "learning element" that allows the
agent to improve its performance over time. This can involve:
3. Minimax Algorithm:
Used for two-player games (like tic-tac-toe, chess). It assumes both players play
optimally.
It explores the game tree, alternating between maximizing player (agent) and
minimizing player (opponent).
Example (simplified): Imagine a simple game with only two moves each. The
agent wants to maximize its score. The opponent wants to minimize it. The
minimax algorithm explores all possible outcomes to find the best move for the
agent, assuming the opponent will also make the best possible move for itself.
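The two-move game described above can be sketched as a tiny game tree. This is only an illustration: the leaf scores and the nested-list tree encoding are hypothetical, not taken from the notes.

```python
def minimax(node, maximizing):
    """Return the minimax value of a game tree given as nested lists.

    A leaf is a number (the agent's score); an internal node is a list of children.
    """
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Hypothetical tree: the agent picks one of two moves,
# then the opponent picks one of two replies.
game_tree = [[3, 5], [2, 9]]

best_value = minimax(game_tree, maximizing=True)
# After the agent moves it is the opponent's (minimizing) turn.
best_move = max(range(len(game_tree)),
                key=lambda i: minimax(game_tree[i], maximizing=False))
print(best_value, best_move)
```

The opponent would answer the first move with 3 and the second with 2, so the agent prefers the first move even though the second contains the largest leaf.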
4. Alpha-Beta Pruning:
Alpha: The best (highest) value found so far for the maximizing player.
Beta: The best (lowest) value found so far for the minimizing player.
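A minimal sketch of how alpha and beta are used, assuming the standard cutoff rule (stop exploring a node once alpha >= beta), which the notes do not spell out. The tree, its scores, and the `visited` bookkeeping list are hypothetical and exist only to show which leaves get skipped.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of a nested-list game tree, skipping branches that cannot matter."""
    if isinstance(node, (int, float)):
        if visited is not None:
            visited.append(node)  # record which leaves were actually evaluated
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)  # best value the maximizer can guarantee so far
            if alpha >= beta:
                break  # the minimizer above will never allow this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)  # best value the minimizer can guarantee so far
        if alpha >= beta:
            break  # the maximizer above already has a better option
    return value


game_tree = [[3, 5], [2, 9]]  # hypothetical scores
visited_leaves = []
root_value = alphabeta(game_tree, True, visited=visited_leaves)
print(root_value, visited_leaves)
```

Once the second branch yields a 2, it cannot beat the 3 already guaranteed by the first branch, so the leaf 9 is never evaluated. The result is the same as plain minimax.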
Social ability: Ability to communicate and cooperate with other agents (or
humans).
6. Rationality:
o Available actions.
o Computational resources.
7. Types of Environments:
Fully Observable vs. Partially Observable: Can the agent see the complete
state of the environment?
Episodic vs. Sequential: Are the agent's experiences divided into independent
episodes (episodic), or does the current action affect future experiences
(sequential)?
Static vs. Dynamic: Does the environment change while the agent is
deliberating?
Discrete vs. Continuous: Is the number of possible states and actions finite
(discrete) or infinite (continuous)?
Single-agent vs. Multi-agent: Is the agent operating alone or with other agents?
8. Agent Architectures:
In the Wumpus World, the presence of a breeze in a square is directly caused by a pit in
an adjacent square. However, if we know the state of the adjacent squares (whether
they have pits or not), then the breeze in the current square becomes independent of
the state of other non-adjacent squares.
Diagram:
+---+---+---+
|   | B |   |
+---+---+---+
| P | W | P |
+---+---+---+
|   | B |   |
+---+---+---+
P = Potential Pit
B = Breeze
W = Current Square
If we know there are pits in the adjacent squares marked 'P', then the breeze in 'W' is
fully explained. The presence or absence of pits in other squares on the board becomes
irrelevant to the breeze in 'W'. This is conditional independence: Breeze in W is
conditionally independent of other pits given the state of adjacent squares.
Simple Case:
Suppose a disease affects 1% of the population (P(Disease) = 0.01). A test for the
disease has a 95% true positive rate (P(Positive|Disease) = 0.95) and a 5% false positive
rate (P(Positive|¬Disease) = 0.05). If someone tests positive, what's the probability they
actually have the disease (P(Disease|Positive))?
We need P(Positive), which can be calculated using the law of total probability:
So, even with a positive test, there's only about a 16.1% chance of actually having the
disease.
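The arithmetic behind the 16.1% figure can be checked with a few lines. The sketch uses only the numbers given above; the variable names are illustrative. The law of total probability gives P(Positive) = P(Positive|Disease) P(Disease) + P(Positive|¬Disease) P(¬Disease), and Bayes' rule then divides the first term by that total.

```python
p_disease = 0.01            # P(Disease)
p_pos_given_disease = 0.95  # P(Positive | Disease), true positive rate
p_pos_given_healthy = 0.05  # P(Positive | not Disease), false positive rate

# Law of total probability: sum over both ways a positive test can arise.
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

# Bayes' rule.
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive

print(f"P(Positive) = {p_positive:.4f}")                   # 0.0590
print(f"P(Disease|Positive) = {p_disease_given_pos:.3f}")  # 0.161
```

The posterior is low because healthy people vastly outnumber sick ones, so most positive results are false positives.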
Additivity: If A and B are mutually exclusive (cannot both occur), then P(A ∪ B) =
P(A) + P(B). Reasonable because if events can't happen together, their
probabilities simply add up.
P(rolling 1, 2, 3, 4, 5, or 6) = 1 (Normalization)
A full joint distribution gives the probability of every possible combination of values for
all variables. We can use it to infer the probability of any event.
R S P(R, S)
To find P(R), we sum the probabilities where R is true: P(R) = P(R, S) + P(R, ¬S) = 0.01 +
0.09 = 0.10.
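Summing out a variable from a joint table can be sketched as follows. Only the two entries with R true (0.01 and 0.09) come from the text; the two entries with R false are assumed values chosen so that the table sums to 1.

```python
import math

# Joint distribution P(R, S), keyed by (R, S).
joint = {
    (True, True): 0.01,    # from the text
    (True, False): 0.09,   # from the text
    (False, True): 0.20,   # assumed for illustration
    (False, False): 0.70,  # assumed for illustration
}


def marginal(joint, index, value):
    """Sum the joint probabilities of all rows where variable `index` equals `value`."""
    return sum(p for assignment, p in joint.items() if assignment[index] == value)


p_r = marginal(joint, index=0, value=True)  # P(R) = P(R, S) + P(R, not S)
print(round(p_r, 2))
assert math.isclose(sum(joint.values()), 1.0)  # a valid joint distribution sums to 1
```

The same helper answers any marginal query, for example `marginal(joint, index=1, value=True)` for P(S).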
Fuzzy logic: Deals with degrees of truth rather than absolute true/false.
7. Conditional Probability:
P(A|B) is the probability of A given that B has occurred. It's crucial for reasoning under
uncertainty because it allows us to update our beliefs based on new evidence.
Spam filtering:
A word like "free" appears in many spam emails but also in some legitimate
emails.
When an email arrives with the word "free," Bayes' rule is used to update the belief that
the email is spam:
This updated probability helps the filter decide whether to classify the email as spam.
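For the spam example, Bayes' rule reads P(Spam|"free") = P("free"|Spam) P(Spam) / P("free"). A minimal sketch follows. All three input probabilities are hypothetical, since the notes give no figures, and `posterior` is an illustrative helper name.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """P(H | E) from P(H), P(E | H) and P(E | not H) via Bayes' rule."""
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical figures, for illustration only.
p_spam = 0.40             # prior belief that an email is spam
p_free_given_spam = 0.50  # "free" appears in half of spam emails
p_free_given_ham = 0.05   # "free" appears in 5% of legitimate emails

p_spam_given_free = posterior(p_spam, p_free_given_spam, p_free_given_ham)
print(f"{p_spam_given_free:.3f}")
```

With these assumed numbers, seeing "free" raises the belief in spam from 0.40 to roughly 0.87.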
A perceptron is the simplest form of a neural network. It takes several binary inputs,
weights them, sums them, and applies an activation function to produce a binary
output.
x1 ---w1---> |
x2 ---w2---> |
...          | Σ
xn ---wn---> |
bias (b)----->|
Activation Function: Typically a step function (if sum > threshold, output 1;
otherwise, 0).
For a linear unit (where the activation function is simply the identity function f(x) = x), the
gradient descent algorithm aims to minimize the error between the predicted output
and the actual output.
The learning rate controls the step size of the weight updates.
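A minimal sketch of batch gradient descent for a linear unit, assuming the usual squared-error objective E = 1/2 Σ (t - o)^2, which leads to the update w_i <- w_i + learning_rate * Σ (t - o) x_i. The training data, function name, and hyperparameters are hypothetical.

```python
def train_linear_unit(examples, learning_rate=0.1, epochs=1000):
    """Fit o = w0 + w1*x1 + ... by batch gradient descent on squared error.

    examples: list of (inputs, target) pairs. Returns [w0, w1, ...], with w0 the bias.
    """
    n_inputs = len(examples[0][0])
    weights = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        step = [0.0] * len(weights)  # accumulates the negative gradient
        for inputs, target in examples:
            x = [1.0] + list(inputs)  # constant 1.0 feeds the bias weight
            output = sum(w * xi for w, xi in zip(weights, x))  # identity activation
            error = target - output
            for i, xi in enumerate(x):
                step[i] += error * xi
        # Average over the batch so the learning rate does not depend on batch size.
        weights = [w + learning_rate * s / len(examples)
                   for w, s in zip(weights, step)]
    return weights


# Hypothetical data generated from t = 2*x + 1.
data = [((x,), 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0]]
learned = train_linear_unit(data)
print([round(w, 3) for w in learned])
```

A larger learning rate takes bigger steps and converges faster up to a point; beyond that the updates overshoot and the weights diverge.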
2. Error Calculation: The error between the network's output and the target output
is calculated.
For a two-layer network with sigmoid units, the weight updates involve the derivative of
the sigmoid function. The process is mathematically involved but conceptually
calculates how much each weight contributed to the error and adjusts it accordingly.
The ANDNOT function (x1 AND NOT x2) can be implemented as follows:
Weights: w1 = 1, w2 = -1
Threshold: 1
If x1 = 1 and x2 = 0, the weighted sum is (1*1) + (-1*0) = 1, which is greater than or equal
to the threshold, so the output is 1. If x1 = 1 and x2 = 1, the weighted sum is (1*1) + (-1*1)
= 0, which is less than the threshold, so the output is 0.
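The ANDNOT unit can be checked over all four input combinations. The sketch uses the weights and threshold stated above and fires when the weighted sum is greater than or equal to the threshold; the function and constant names are illustrative.

```python
def perceptron(inputs, weights, threshold):
    """Step-activation unit: output 1 if the weighted sum reaches the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0


ANDNOT_WEIGHTS = (1, -1)  # w1 = 1, w2 = -1
ANDNOT_THRESHOLD = 1

truth_table = {
    (x1, x2): perceptron((x1, x2), ANDNOT_WEIGHTS, ANDNOT_THRESHOLD)
    for x1 in (0, 1)
    for x2 in (0, 1)
}
for inputs, output in truth_table.items():
    print(inputs, "->", output)
```

Only the input (1, 0) produces 1, which matches x1 AND NOT x2.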
    /\
   +  y
  /\
 x  3
Single-layer perceptrons can only learn linearly separable functions (functions where
the data can be separated by a straight line or hyperplane). They cannot learn functions
like XOR.
Multilayer networks overcome this limitation by introducing hidden layers, which allow
them to learn more complex, non-linear functions.
9. Discuss the concept of hypothesis space search and its relevance to genetic
algorithms.
Evolution: The GA simulates natural selection, where fitter individuals are more
likely to reproduce and pass on their traits.
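A minimal sketch of a genetic algorithm searching a hypothesis space of bit strings. Every design choice here (tournament selection, single-point crossover, bit-flip mutation, elitism, and the toy "count the ones" fitness) is an assumption made for illustration, not something the notes prescribe.

```python
import random


def evolve(fitness, genome_length=12, population_size=20,
           generations=30, mutation_rate=0.05, seed=0):
    """Search bit strings for high fitness; returns (best individual, best-per-generation)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        next_population = [population[0][:]]  # elitism: keep the fittest unchanged
        while len(next_population) < population_size:
            # Tournament selection: fitter individuals are more likely to reproduce.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            cut = rng.randint(1, genome_length - 1)  # single-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation keeps some exploration of the hypothesis space.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population
    return max(population, key=fitness), best_per_generation


# Toy fitness: number of 1 bits in the hypothesis.
best, history = evolve(fitness=sum)
print(best, history[0], history[-1])
```

Each bit string is one hypothesis. Selection concentrates the population on promising regions, while crossover and mutation generate new hypotheses to test.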
Example:
Transactions:
2. {Milk, Butter}
4. {Bread, Butter}
Confidence: 2/3 (Transactions 1, 4, and 5 contain Milk and Bread; 2 of those also
contain Butter)
Item-based CF recommends items similar to those a user has liked in the past.
Similarity is calculated between items based on user ratings.
Example:
Movies 1 and 2 have similar ratings from users A and B. If User C liked Movie 3, item-
based CF might recommend Movie 1 or 2 because other users who liked Movie 3 also
tended to like Movies 1 and 2. Common similarity measures include cosine similarity
and adjusted cosine similarity.
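Item-to-item cosine similarity can be sketched in plain Python. The ratings below are hypothetical values made up to resemble the Movie 1/2/3 example, and the similarity is computed only over users who rated both items, which is one common convention among several.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "A": {"Movie 1": 5, "Movie 2": 4, "Movie 3": 5},
    "B": {"Movie 1": 4, "Movie 2": 5, "Movie 3": 4},
    "C": {"Movie 3": 5},
}


def item_cosine(ratings, item_a, item_b):
    """Cosine similarity between two items over the users who rated both."""
    common = [u for u, r in ratings.items() if item_a in r and item_b in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in common)
    norm_a = math.sqrt(sum(ratings[u][item_a] ** 2 for u in common))
    norm_b = math.sqrt(sum(ratings[u][item_b] ** 2 for u in common))
    return dot / (norm_a * norm_b)


sim_3_1 = item_cosine(ratings, "Movie 3", "Movie 1")
sim_3_2 = item_cosine(ratings, "Movie 3", "Movie 2")
print(round(sim_3_1, 3), round(sim_3_2, 3))
```

Both similarities are high, so a user who liked Movie 3 (User C here) would be offered Movies 1 and 2, with Movie 1 ranked first.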
4. Matrix Factorization:
Example:
User A 4 5 1
User B 5 4 2
User C 1 ? 4
Matrix factorization would approximate this matrix by two smaller matrices: a user-
factor matrix and an item-factor matrix. Multiplying these matrices would reconstruct
an approximation of the original matrix, filling in the missing rating for User C and Movie
2.
5. Bag-of-Words (BoW):
BoW represents text as a collection of its words, disregarding grammar and word order.
A document is represented as a vector where each dimension corresponds to a unique
word in the corpus, and the value is the frequency of that word in the document.
Example:
BoW representation:
Document 1: {2, 1, 1, 1, 1, 0, 0}
Document 2: {2, 0, 1, 1, 0, 1, 1}
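A minimal sketch of building BoW vectors. The two sentences are an assumption: the example documents are not given in the notes, and these were chosen because they reproduce the two vectors above when the vocabulary is ordered by first appearance.

```python
def bag_of_words(documents):
    """Return (vocabulary, vectors); vocabulary is ordered by first appearance."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = []
    for tokens in tokenized:
        for token in tokens:
            if token not in vocabulary:
                vocabulary.append(token)
    # Each vector holds the count of every vocabulary word in that document.
    vectors = [[tokens.count(word) for word in vocabulary] for tokens in tokenized]
    return vocabulary, vectors


# Assumed example sentences (not from the notes).
docs = ["The cat sat on the mat", "The dog sat on the log"]
vocab, vectors = bag_of_words(docs)
print(vocab)
print(vectors[0])
print(vectors[1])
```

Word order is lost: any reordering of a sentence's words yields the same vector.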
Naive Bayes uses Bayes' theorem to classify text based on word frequencies. It
assumes that words are conditionally independent given the sentiment.
Example:
Training data:
Naive Bayes calculates the probability of the sentence being positive and negative
based on the frequencies of words like "great," "terrible," and "good" in the training data.
8. TF-IDF Vectorizer:
IDF (Inverse Document Frequency): How rare a word is across all documents.
TF-IDF increases for words that are frequent in a document but rare in the corpus.
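A minimal sketch of the weighting, assuming the plain textbook variant: TF = count / document length and IDF = log(N / document frequency). Libraries such as scikit-learn use smoothed variants, so their numbers differ. The sentences are hypothetical.

```python
import math


def tf_idf(documents):
    """Return one {word: tf-idf score} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    vocabulary = {token for tokens in tokenized for token in tokens}
    # Document frequency: in how many documents each word appears.
    doc_freq = {w: sum(1 for tokens in tokenized if w in tokens) for w in vocabulary}
    scores = []
    for tokens in tokenized:
        scores.append({
            w: (tokens.count(w) / len(tokens)) * math.log(n_docs / doc_freq[w])
            for w in set(tokens)
        })
    return scores


docs = ["the cat sat on the mat", "the dog sat on the log"]  # hypothetical corpus
scores = tf_idf(docs)
print(scores[0]["the"], round(scores[0]["cat"], 4))
```

"the" is the most frequent word in each document but appears in every document, so its score is zero, while "cat" gets a positive score because it is rare in the corpus.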
4. Training and Evaluation: Train the model and evaluate its performance using
metrics like precision, recall, RMSE.
Applications in AI:
Clustering algorithms group similar data points together. They can be classified as:
Grid-Based Methods: Quantize the data space into a grid structure (e.g.,
STING).
2. Assignment: Assign each data point to the nearest centroid.
3. Update: Recalculate the centroids as the mean of the points assigned to each cluster.
Example:
2. Assignment:
3. Update:
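The assignment and update steps can be sketched as a short loop. The points and initial centroids below are hypothetical, and stopping when the centroids no longer move is an assumed convergence test.

```python
def k_means(points, centroids, max_iterations=100):
    """Alternate assignment and update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            squared = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
            clusters[squared.index(min(squared))].append(point)
        # Update: move each centroid to the mean of its points
        # (an empty cluster keeps its old centroid).
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
```

On this data the loop converges after one update, with each centroid at the mean of its three points.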
[Is there more than one cluster?] -- Yes --> [Find the two closest clusters]
| No
[End]
Agglomerative clustering starts with each data point as its own cluster and repeatedly
merges the closest clusters until only one cluster remains. Distance metrics (e.g., single
linkage, complete linkage, average linkage) determine cluster closeness.
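A minimal sketch of the merge loop using single linkage on one-dimensional points. The data and the returned merge history are hypothetical illustration choices, and the brute-force pair search is written for clarity rather than speed.

```python
def agglomerative(points, n_clusters=1):
    """Single-linkage agglomerative clustering; returns (clusters, merge history)."""
    clusters = [[p] for p in points]  # start with every point as its own cluster
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]  # merge the two closest clusters
        del clusters[j]
    return clusters, merges


clusters, merges = agglomerative([1.0, 2.0, 6.0, 7.0, 15.0])
for left, right, distance in merges:
    print(left, "+", right, "at distance", distance)
```

Replacing `min` with `max` in the linkage line gives complete linkage. Stopping early with `n_clusters=2` returns the two clusters that exist just before the final merge.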
k-NN classifies a new instance by finding the k nearest training instances and assigning
the most frequent class among them.
Locally weighted linear regression fits a linear model to the neighborhood of a query
point. It assigns weights to training examples based on their distance from the query
point.
Example: Predicting house prices. When predicting the price of a house with 3
bedrooms, we give more weight to training houses with a similar number of bedrooms.
An RBF network has three layers: input, hidden (RBF units), and output.
Input Layer --> RBF Layer (Gaussian functions) --> Output Layer (weighted sum)
Each RBF unit has a center and a width. The output of an RBF unit is high when the input
is close to its center.
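A minimal sketch of an RBF unit and the three-layer forward pass, assuming a Gaussian basis function exp(-||x - c||^2 / (2 * width^2)). The centers, widths, and output weights are hypothetical.

```python
import math


def rbf_unit(x, center, width):
    """Gaussian RBF: 1.0 at the center, decaying toward 0 with distance."""
    squared_distance = sum((a - c) ** 2 for a, c in zip(x, center))
    return math.exp(-squared_distance / (2 * width ** 2))


def rbf_network(x, centers, widths, output_weights, bias=0.0):
    """Input -> RBF hidden layer -> weighted-sum output."""
    hidden = [rbf_unit(x, c, w) for c, w in zip(centers, widths)]
    return bias + sum(w * h for w, h in zip(output_weights, hidden))


# Hypothetical network with two RBF units.
centers = [(1.0, 1.0), (3.0, 3.0)]
widths = [0.5, 0.5]
output_weights = [1.0, -1.0]

print(rbf_unit((1.0, 1.0), centers[0], widths[0]))  # input at the center
print(rbf_network((1.0, 1.0), centers, widths, output_weights))
```

Each hidden unit responds only near its own center, which is what is meant by a local receptive field.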
Classification: Assigns the most frequent class among the k nearest neighbors.
Regression: Predicts the average or weighted average of the target values of the
k nearest neighbors.
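Both uses of k-NN share the same neighbor search and differ only in how the neighbors are combined. A minimal sketch with Euclidean distance and an unweighted vote or average follows; the training data and function names are hypothetical.

```python
from collections import Counter


def k_nearest(train, query, k):
    """Return the k (features, value) training pairs closest to query."""
    def distance(features):
        return sum((a - b) ** 2 for a, b in zip(features, query)) ** 0.5
    return sorted(train, key=lambda pair: distance(pair[0]))[:k]


def knn_classify(train, query, k=3):
    """Majority class among the k nearest neighbors."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Unweighted mean target of the k nearest neighbors."""
    targets = [target for _, target in k_nearest(train, query, k)]
    return sum(targets) / len(targets)


# Hypothetical training sets.
labeled = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"), ((8, 8), "b"), ((9, 8), "b")]
numeric = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]

predicted_class = knn_classify(labeled, query=(2, 2), k=3)
predicted_value = knn_regress(numeric, query=(2,), k=3)
print(predicted_class, predicted_value)
```

A distance-weighted average would replace the plain mean in `knn_regress` with weights such as 1 / distance.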
Locally weighted regression adapts to changes in input data by fitting a different linear
model for each query point. The bandwidth parameter (tau) controls the size of the
neighborhood. A smaller tau makes the model more sensitive to local variations.
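A minimal one-dimensional sketch, assuming Gaussian weights w = exp(-(x - query)^2 / (2 * tau^2)), which is a common choice that the notes do not state. The bedroom and price figures are hypothetical.

```python
import math


def locally_weighted_prediction(xs, ys, query, tau):
    """Fit y = a + b*x by weighted least squares around `query`, then predict there."""
    # Training points near the query get weights close to 1, far points close to 0.
    weights = [math.exp(-((x - query) ** 2) / (2 * tau ** 2)) for x in xs]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    cov_xy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    var_x = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept + slope * query


# Hypothetical data: bedrooms vs. price (in thousands), deliberately non-linear.
bedrooms = [1, 2, 3, 4, 5]
prices = [100, 150, 210, 300, 450]

narrow = locally_weighted_prediction(bedrooms, prices, query=3, tau=0.5)
wide = locally_weighted_prediction(bedrooms, prices, query=3, tau=10.0)
print(round(narrow, 1), round(wide, 1))
```

With a small tau the fit is dominated by houses with close to 3 bedrooms. With a large tau all houses count almost equally and the result approaches ordinary linear regression.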
RBF Networks: Use radial basis functions in the hidden layer. Have a faster
training time due to local receptive fields of RBF units. Good for function
approximation.
Key differences:
Number of layers: RBF networks often have a single hidden layer, while
traditional networks can have multiple.