Bayesian updating is a core concept in Bayesian statistics: it describes how to revise a belief when new evidence arrives. Suppose you're predicting whether it will rain today. You initially believe there's a 30% chance of rain based on historical patterns (prior). Then you check the weather radar and see a large storm nearby (data). Bayesian updating tells you how to revise the 30% probability by weighing how likely that radar reading would be if it rains versus if it doesn't (likelihood), producing an updated belief (posterior).
This process allows for:
- Incorporating new evidence systematically.
- Refining beliefs incrementally over time.
- Handling uncertainty in a probabilistic framework.
Bayes’ Theorem
Bayesian updating relies on Bayes’ theorem, which is defined as:
P(\theta \mid D) = \frac{P(D \mid \theta) \cdot P(\theta)}{P(D)}
Where:
- \theta is a parameter or hypothesis.
- D is the observed data.
- P(\theta) is the prior: belief about \theta before seeing the data.
- P(D \mid \theta) is the likelihood: probability of the data given \theta.
- P(\theta \mid D) is the posterior: updated belief after observing the data.
- P(D) is the marginal likelihood or evidence.
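To make the formula concrete, here is a minimal sketch applying it to the rain example, treating "rain" as a yes/no hypothesis so that P(D) expands to P(D | rain) P(rain) + P(D | no rain) P(no rain). The radar likelihoods (0.8 and 0.2) and the function name `bayes_update` are hypothetical, chosen only for illustration.

```python
def bayes_update(prior, p_data_given_h, p_data_given_not_h):
    """Posterior P(H | D) for a binary hypothesis H, via Bayes' theorem."""
    # Marginal likelihood P(D): total probability of the data under H and not-H
    evidence = p_data_given_h * prior + p_data_given_not_h * (1 - prior)
    # Likelihood times prior, normalized by the evidence
    return p_data_given_h * prior / evidence


# Hypothetical numbers: a storm shows on radar on 80% of rainy days
# and on 20% of dry days; the prior chance of rain is 30%.
posterior = bayes_update(prior=0.30, p_data_given_h=0.80, p_data_given_not_h=0.20)
print(f"P(rain | storm on radar) = {posterior:.3f}")  # → 0.632
```

Under these assumed likelihoods, the radar reading roughly doubles the belief in rain, from 0.30 to about 0.63.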
Step-by-Step Process
Step 1: Define the Prior
This reflects your belief about a parameter \theta before observing any data, for example the probability that a coin lands heads.
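A Beta(α, β) distribution is a common choice of prior for a probability such as θ, because its support is the interval [0, 1]. The sketch below evaluates its density using only the standard library; `beta_pdf` is a hypothetical helper name, and the Beta(2, 2) prior (a mild preference for a fair coin) is an assumed example.

```python
import math


def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta, for 0 < theta < 1."""
    # Normalizing constant B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return theta ** (a - 1) * (1 - theta) ** (b - 1) / norm


# Beta(1, 1) is flat: every theta gets density 1.
print(beta_pdf(0.3, 1, 1))  # → 1.0
# Beta(2, 2) puts more weight near a fair coin (theta = 0.5) than near the edges.
print(beta_pdf(0.5, 2, 2), beta_pdf(0.1, 2, 2))
```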
Step 2: Specify the Likelihood
This models how the data are generated for a given \theta. If you flip the coin n times and observe k heads, the likelihood is binomial:
P(D \mid \theta) = \binom{n}{k} \theta^k (1 - \theta)^{n-k}
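The binomial likelihood translates directly into code. This sketch (standard library only; `binomial_likelihood` is a hypothetical name) evaluates it on a grid of θ values and checks that, for k = 7 heads in n = 10 flips, it peaks at the observed proportion θ = k/n = 0.7.

```python
import math


def binomial_likelihood(theta, n, k):
    """P(D | theta): probability of k heads in n independent flips."""
    return math.comb(n, k) * theta ** k * (1 - theta) ** (n - k)


n, k = 10, 7
grid = [i / 100 for i in range(101)]  # theta = 0.00, 0.01, ..., 1.00
values = [binomial_likelihood(t, n, k) for t in grid]

# The likelihood is maximized at the observed proportion k / n
best = grid[values.index(max(values))]
print(best)  # → 0.7
```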
Step 3: Apply Bayes’ Theorem
Compute the posterior distribution:
P(\theta \mid D) \propto P(D \mid \theta) \cdot P(\theta)
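The proportionality becomes an equality once the product is divided by the marginal likelihood, which ensures the posterior integrates to 1:
P(D) = \int P(D \mid \theta) \, P(\theta) \, d\theta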
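The normalization can also be done numerically, which helps when no closed form exists. The sketch below approximates P(D) = ∫ P(D | θ) P(θ) dθ with a simple Riemann sum over a grid, assuming a uniform prior and the binomial likelihood with n = 10 and k = 7; the grid size of 1001 points is an arbitrary choice.

```python
import math

n, k = 10, 7                    # data: 7 heads in 10 flips
num_points = 1001
d_theta = 1 / (num_points - 1)  # grid spacing
grid = [i * d_theta for i in range(num_points)]

prior = [1.0 for _ in grid]     # uniform prior, Beta(1, 1)
likelihood = [math.comb(n, k) * t ** k * (1 - t) ** (n - k) for t in grid]

# Unnormalized posterior: likelihood times prior at each grid point
unnormalized = [lik * pri for lik, pri in zip(likelihood, prior)]

# Marginal likelihood P(D): sum of the integrand times the grid spacing
evidence = sum(unnormalized) * d_theta

# Dividing by P(D) turns the product into a density that integrates to 1
posterior = [u / evidence for u in unnormalized]

print(f"P(D) ≈ {evidence:.4f}")  # exact value with a uniform prior: 1 / (n + 1) ≈ 0.0909
```

For this conjugate case the grid result can be checked against the closed form: with a uniform prior, the binomial marginal likelihood is exactly 1/(n + 1).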
Example: Coin Toss
Suppose we have a coin, and we don’t know the probability \theta that it lands heads.
1. Prior Belief
Before seeing any data, we assume a uniform prior distribution for \theta:
P(\theta) = \text{Beta}(1, 1)
This is equivalent to a uniform distribution over the interval [0, 1], which means:
- Every value of θ from 0 to 1 is equally likely.
- We have no prior preference for the coin being biased one way or another.
2. Observed Data
Suppose we toss the coin 10 times and observe the following:
- 7 heads
- 3 tails
This is our observed dataset D.
3. Likelihood
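The likelihood is the probability of observing the data D given a particular value of \theta.
Assuming independent tosses, and dropping the constant factor \binom{10}{7} (it does not depend on \theta, so it cancels when the posterior is normalized), the likelihood is: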
P(D \mid \theta) = \theta^7 (1 - \theta)^3
This expression means:
- The chance of getting 7 heads: \theta^7
- The chance of getting 3 tails: (1 - \theta)^3
This likelihood is largest at θ = 7/10 = 0.7, so it favors values of θ near the observed proportion of heads.
4. Posterior Distribution
Using Bayes’ Theorem, the posterior distribution is proportional to the product of the prior and the likelihood:
P(\theta \mid D) \propto P(D \mid \theta) \cdot P(\theta)
Since we used a Beta prior and a Binomial likelihood, the posterior is also a Beta distribution (due to conjugacy):
P(\theta \mid D) = \text{Beta}(1 + 7, 1 + 3) = \text{Beta}(8, 4)
This posterior distribution reflects:
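- A belief centered near θ ≈ 0.67, close to the observed proportion of heads (7 out of 10) but pulled slightly toward 0.5 by the uniform prior.
- Less uncertainty than before: the posterior standard deviation is about 0.13, compared with about 0.29 for the uniform prior.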
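Because of conjugacy, the update is just parameter arithmetic: add the number of heads to α and the number of tails to β. The sketch below uses a hypothetical helper `update_beta` and an assumed ordering of the ten tosses (the example only gives counts), and shows that updating one toss at a time reaches the same Beta(8, 4) as a single batch update, which is what makes incremental updating possible.

```python
def update_beta(a, b, heads, tails):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return a + heads, b + tails


# Batch update with all 10 tosses at once
batch = update_beta(1, 1, heads=7, tails=3)

# Sequential update, one toss at a time (1 = heads, 0 = tails);
# this ordering is hypothetical, only the counts come from the example.
tosses = [1, 0, 1, 1, 1, 0, 1, 1, 0, 1]
a, b = 1, 1
for toss in tosses:
    a, b = update_beta(a, b, heads=toss, tails=1 - toss)

print(batch, (a, b))  # → (8, 4) (8, 4)
```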
5. Interpretation of the Posterior
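The posterior distribution \text{Beta}(8, 4) is a full probability distribution over \theta, not just a single estimate.
The mean of a \text{Beta}(\alpha, \beta) distribution, here with \alpha = 8 and \beta = 4, is: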
\mathbb{E}[\theta] = \frac{\alpha}{\alpha + \beta} = \frac{8}{8 + 4} = \frac{2}{3}
This means:
- Based on the data, we now estimate the probability of heads to be about 0.67.
- The shape of the Beta distribution tells us how confident we are about this estimate.
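Beyond the mean, a few closed-form summaries describe the shape of Beta(α, β): the mode (α - 1)/(α + β - 2), valid for α, β > 1, and the variance αβ / ((α + β)^2 (α + β + 1)). The sketch below computes them for the prior and the posterior, and estimates a 95% equal-tailed credible interval by Monte Carlo with `random.betavariate`; the sample size and seed are arbitrary, so the interval endpoints are approximate.

```python
import math
import random


def beta_summary(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


prior_mean, prior_sd = beta_summary(1, 1)
post_mean, post_sd = beta_summary(8, 4)
post_mode = (8 - 1) / (8 + 4 - 2)  # mode formula, valid because a, b > 1

print(f"prior:     mean={prior_mean:.3f}, sd={prior_sd:.3f}")  # → mean=0.500, sd=0.289
print(f"posterior: mean={post_mean:.3f}, sd={post_sd:.3f}, mode={post_mode:.3f}")  # → mean=0.667, sd=0.131, mode=0.700

# 95% equal-tailed credible interval, approximated from sorted posterior samples
random.seed(0)
samples = sorted(random.betavariate(8, 4) for _ in range(100_000))
lower, upper = samples[2_500], samples[97_500]
print(f"95% credible interval ≈ ({lower:.2f}, {upper:.2f})")
```

The drop in standard deviation from about 0.29 to about 0.13 is the quantitative version of "more confidence after seeing data".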
Conjugate Priors
A conjugate prior is a prior that, when combined with a particular likelihood, results in a posterior of the same family.
Examples:
- Beta prior + Bernoulli/binomial likelihood ⇒ Beta posterior
- Gaussian prior on the mean + Gaussian likelihood with known variance ⇒ Gaussian posterior
- Gamma prior + Poisson likelihood ⇒ Gamma posterior
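Using conjugate priors gives the posterior in closed form, so an update reduces to simple parameter arithmetic (for example, Beta(1, 1) becomes Beta(8, 4) after 7 heads and 3 tails) with no numerical integration.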
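The other conjugate pairs work the same way. As a sketch of the Gamma-Poisson case, with the Gamma parameterized by shape α and rate β: observing counts x_1, ..., x_n updates Gamma(α, β) to Gamma(α + Σ x_i, β + n). The Gamma(2, 1) prior, the counts, and the name `update_gamma_poisson` are all made up for illustration.

```python
def update_gamma_poisson(shape, rate, counts):
    """Conjugate update: Gamma(shape, rate) prior on a Poisson rate + observed counts."""
    return shape + sum(counts), rate + len(counts)


# Hypothetical data: events counted in three equal time intervals
counts = [3, 5, 4]
shape, rate = update_gamma_poisson(2, 1, counts)

# Posterior mean of the Poisson rate is shape / rate
print(shape, rate, shape / rate)  # → 14 4 3.5
```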
Python Implementation
Here’s a basic Bayesian update using a Beta prior for a Bernoulli process:
1. Import Necessary Libraries
We import libraries for numerical computation, plotting, and working with the Beta distribution.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta
2. Set Prior Parameters
We define the parameters of the prior Beta distribution (uniform in this case).
# Prior parameters (Beta(1, 1) = uniform distribution)
a_prior = 1
b_prior = 1
3. Update Posterior Parameters Based on Observed Data
We update the prior using 7 heads and 3 tails to get the parameters of the posterior.
# Observed data: 7 heads, 3 tails
a_post = a_prior + 7
b_post = b_prior + 3
4. Compute Posterior Distribution Values
We compute the probability density of the posterior Beta(8, 4) over a range of θ values.
# θ values between 0 and 1
theta = np.linspace(0, 1, 100)
posterior = beta.pdf(theta, a_post, b_post)
5. Plot the Posterior Distribution
We visualize the updated belief about the coin's bias.
plt.plot(theta, posterior, label='Posterior Beta(8, 4)')
plt.title('Bayesian Update of Coin Bias')
plt.xlabel('θ')
plt.ylabel('Density')
plt.legend()
plt.grid(True)
plt.show()

Applications of Bayesian Updating
1. Machine Learning: Widely used in probabilistic modeling and inference
2. Medical Diagnosis: Updating disease probabilities based on test outcomes
3. Spam Filtering: Naive Bayes classifiers use word frequencies to update spam probabilities
4. A/B Testing: Continuously update belief about which variant performs better
5. Robotics and Control: Bayesian filtering (e.g., Kalman Filter, Particle Filter)
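The A/B testing use case reuses the coin model: treat each variant's conversion rate as a θ with its own Beta posterior. This sketch assumes uniform Beta(1, 1) priors and invented conversion counts, and estimates P(θ_B > θ_A) by sampling both posteriors with `random.betavariate`; the counts, sample size, and seed are illustrative only.

```python
import random

# Hypothetical results: conversions out of visitors for each variant
conv_a, n_a = 40, 400
conv_b, n_b = 60, 400

# Beta(1, 1) prior + binomial data -> Beta(1 + successes, 1 + failures)
post_a = (1 + conv_a, 1 + n_a - conv_a)
post_b = (1 + conv_b, 1 + n_b - conv_b)

# Monte Carlo estimate of P(theta_B > theta_A)
random.seed(42)
num_samples = 20_000
wins = sum(
    random.betavariate(*post_b) > random.betavariate(*post_a)
    for _ in range(num_samples)
)
prob_b_better = wins / num_samples
print(f"P(B converts better than A) ≈ {prob_b_better:.3f}")
```

Updating continuously just means adding new conversions and visitors to these counts as they arrive and recomputing the estimate.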
Advantages of Bayesian Updating
- Intuitive: Models learning as belief revision.
- Dynamic: Adapts to incoming data naturally.
- Uncertainty Quantification: Posterior distributions provide full uncertainty information.
- Principled Approach: Grounded in probability theory.
Limitations and Challenges
- Computational Complexity: Closed-form posteriors exist mainly for conjugate models; most complex models have no analytical solution.
- Prior Sensitivity: Choice of prior can affect results, especially with small data.
- Integration Difficulties: Calculating the marginal likelihood P(D) may require approximation.
To address the computational and integration challenges, approximate inference techniques such as Markov Chain Monte Carlo (MCMC) or variational inference are used.
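To illustrate why MCMC sidesteps the integration problem, here is a minimal random-walk Metropolis sampler for the coin example. It only evaluates the unnormalized posterior θ^7 (1 - θ)^3, because P(D) cancels in the acceptance ratio. The step size, chain length, burn-in, and seed are arbitrary assumptions; a real analysis would use a dedicated library and convergence diagnostics.

```python
import math
import random


def log_unnormalized_posterior(theta):
    """log[P(D | theta) P(theta)] for 7 heads, 3 tails, uniform prior (up to a constant)."""
    if not 0 < theta < 1:
        return -math.inf  # zero posterior density outside (0, 1)
    return 7 * math.log(theta) + 3 * math.log(1 - theta)


def metropolis(num_steps, step_size=0.1, start=0.5, seed=0):
    rng = random.Random(seed)
    theta = start
    log_p = log_unnormalized_posterior(theta)
    samples = []
    for _ in range(num_steps):
        proposal = theta + rng.gauss(0, step_size)  # symmetric random-walk proposal
        log_p_new = log_unnormalized_posterior(proposal)
        # Accept with probability min(1, p_new / p); P(D) cancels in this ratio.
        # 1 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            theta, log_p = proposal, log_p_new
        samples.append(theta)
    return samples


samples = metropolis(50_000)[5_000:]  # discard the first 5,000 steps as burn-in
print(f"MCMC posterior mean ≈ {sum(samples) / len(samples):.3f}")  # exact Beta(8, 4) mean: 0.667
```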