Bayesian Updating

Last Updated : 23 Jul, 2025

Bayesian updating is a core concept in Bayesian statistics: it describes how to revise a belief when new evidence arrives. Suppose you're predicting whether it will rain today. You initially believe there's a 30% chance of rain based on historical patterns (prior). Then you check the weather radar and see a large storm nearby (data). Bayesian updating tells you how to revise the 30% probability using how likely that radar reading is with and without rain (likelihood), producing a new belief (posterior).

This process allows:

  • Incorporating new evidence systematically.
  • Refining beliefs incrementally over time.
  • Handling uncertainty in a probabilistic framework.

Bayes’ Theorem

Bayesian updating relies on Bayes’ theorem, which is defined as:

P(\theta \mid D) = \frac{P(D \mid \theta) \cdot P(\theta)}{P(D)}

Where:

  • \theta is a parameter or hypothesis.
  • D is observed data.
  • P(\theta) is the prior: belief about \theta before seeing the data.
  • P(D \mid \theta) is the likelihood: probability of the data given \theta.
  • P(\theta \mid D) is the posterior: updated belief after observing the data.
  • P(D) is the marginal likelihood or evidence.
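As a rough illustration, the rain scenario from the introduction can be worked through numerically. Only the 30% prior comes from the text; the two likelihood values below are hypothetical numbers chosen for the sketch, not measured quantities.

```python
# Hypothetical numbers: only the 30% prior comes from the text above.
prior_rain = 0.30            # P(theta): prior probability of rain
p_storm_given_rain = 0.80    # P(D | rain): assumed for illustration
p_storm_given_dry = 0.20     # P(D | no rain): assumed for illustration

# Evidence P(D): total probability of seeing a storm on the radar
evidence = (p_storm_given_rain * prior_rain
            + p_storm_given_dry * (1 - prior_rain))

# Bayes' theorem: posterior = likelihood * prior / evidence
posterior_rain = p_storm_given_rain * prior_rain / evidence
print(f"P(rain | storm on radar) = {posterior_rain:.3f}")
```

Under these assumed likelihoods, the belief in rain roughly doubles from the 30% prior. Different likelihood values would give a different posterior.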

Step-by-Step Process

Step 1: Define the Prior

This reflects your belief about a parameter \theta before seeing any data. For example, if \theta is the probability of a coin landing heads, you might start with a uniform prior P(\theta) = 1 for \theta \in [0, 1].

Step 2: Specify the Likelihood

This models how data is generated. If you flip the coin n times and get k heads, the likelihood is:

P(D \mid \theta) = \binom{n}{k} \theta^k (1 - \theta)^{n-k}
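A small sketch of evaluating this likelihood for a few candidate values of θ, using `scipy.stats.binom`. The counts n = 10 and k = 7 and the candidate values are illustrative choices.

```python
import numpy as np
from scipy.stats import binom

n, k = 10, 7  # illustrative counts: 10 flips, 7 heads
thetas = np.array([0.3, 0.5, 0.7, 0.9])  # candidate values of theta

# binom.pmf(k, n, p) evaluates C(n, k) * p**k * (1 - p)**(n - k)
likelihoods = binom.pmf(k, n, thetas)

for t, lik in zip(thetas, likelihoods):
    print(f"theta = {t:.1f}  ->  P(D | theta) = {lik:.4f}")
```

Among these candidates, the likelihood is largest at θ = 0.7, the observed fraction of heads.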

Step 3: Apply Bayes’ Theorem

Compute the posterior distribution:

P(\theta \mid D) \propto P(D \mid \theta) \cdot P(\theta)

The proportionality becomes equality when normalized by the marginal likelihood P(D).
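When no closed form is available, the normalization can be approximated numerically. The following is a minimal grid-approximation sketch, assuming a uniform prior and illustrative counts of 7 heads in 10 flips. The grid size of 1001 points is an arbitrary choice.

```python
import numpy as np

n, k = 10, 7                          # illustrative counts
theta = np.linspace(0.0, 1.0, 1001)   # grid over the parameter space
d_theta = theta[1] - theta[0]

prior = np.ones_like(theta)           # uniform prior, P(theta) = 1
# Binomial coefficient omitted: it is constant in theta and cancels out
likelihood = theta**k * (1 - theta)**(n - k)
unnormalized = likelihood * prior

# Approximate the normalizing constant with a Riemann sum
evidence = unnormalized.sum() * d_theta
posterior = unnormalized / evidence   # now integrates to about 1

posterior_mean = (theta * posterior).sum() * d_theta
print(f"Posterior mean (grid approximation): {posterior_mean:.3f}")
```

Grid approximation is practical for one or two parameters. It becomes expensive quickly as the number of parameters grows.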

Example: Coin Toss

Suppose we have a coin, and we don’t know the probability \theta of getting heads. Our goal is to infer \theta based on observed tosses.

1. Prior Belief

Before seeing any data, we assume a uniform prior distribution for \theta. The Beta distribution is a natural choice for modeling probabilities:

P(\theta) = \text{Beta}(1, 1)

This is equivalent to a uniform distribution over the interval [0, 1], meaning:

  • Every value of θ from 0 to 1 is equally likely.
  • We have no prior preference for the coin being biased one way or another.

2. Observed Data

Suppose we toss the coin 10 times and observe the following:

  • 7 heads
  • 3 tails

This is our observed dataset D, which we will use to update our prior belief.

3. Likelihood

The likelihood is the probability of observing the data D given a specific value of θ.

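Assuming independent tosses, the likelihood of a particular sequence with 7 heads and 3 tails is given below. The binomial coefficient \binom{10}{7} from the general formula is omitted because it does not depend on \theta and cancels during normalization: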

P(D \mid \theta) = \theta^7 (1 - \theta)^3

This expression means:

  • The chance of getting 7 heads: \theta^7
  • The chance of getting 3 tails: (1 - \theta)^3

This likelihood function favors values of θ that make 7 heads and 3 tails more probable.
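To see which value of θ the likelihood favors most, one can maximize its logarithm. This short derivation uses the symbol \hat{\theta} for the maximizer, which is notation introduced here:

```latex
\log P(D \mid \theta) = 7 \log \theta + 3 \log (1 - \theta)

\frac{d}{d\theta} \log P(D \mid \theta) = \frac{7}{\theta} - \frac{3}{1 - \theta} = 0
\quad \Longrightarrow \quad \hat{\theta} = \frac{7}{10} = 0.7
```

The likelihood alone therefore peaks at the observed fraction of heads.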

4. Posterior Distribution

Using Bayes’ Theorem, the posterior distribution is proportional to the product of the prior and the likelihood:

P(\theta \mid D) \propto P(D \mid \theta) \cdot P(\theta)

Since we used a Beta prior and a Binomial likelihood, the posterior is also a Beta distribution (due to conjugacy):

P(\theta \mid D) = \text{Beta}(1 + 7, 1 + 3) = \text{Beta}(8, 4)
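The conjugacy step can be checked by multiplying the two densities and dropping constants that do not depend on θ. Here \alpha and \beta denote the prior parameters, both equal to 1 in this example:

```latex
P(\theta) = \text{Beta}(\alpha, \beta) \propto \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}

P(\theta \mid D) \propto \theta^{7} (1 - \theta)^{3} \cdot \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}
                 = \theta^{(\alpha + 7) - 1} (1 - \theta)^{(\beta + 3) - 1}

\Rightarrow \; P(\theta \mid D) = \text{Beta}(\alpha + 7, \beta + 3),
\quad \text{so } \alpha = \beta = 1 \text{ gives } \text{Beta}(8, 4)
```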

This posterior distribution reflects:

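  • A belief that θ most likely lies around 0.67 to 0.70 (posterior mean 2/3, posterior mode 0.7), since 7 out of 10 tosses were heads.
  • More confidence than before (less uncertainty) due to observing actual data, although 10 tosses still leave a fairly wide range of plausible values.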

5. Interpretation of the Posterior

The posterior distribution \text{Beta}(8, 4) places most of its mass around 0.67 to 0.70, so θ is likely in that neighborhood.

The mean of the posterior distribution is:

\mathbb{E}[\theta] = \frac{\alpha}{\alpha + \beta} = \frac{8}{8 + 4} = \frac{2}{3}

This means:

  • Based on the data and the uniform prior, we now estimate the probability of heads to be around 0.67.
  • The shape of the Beta distribution tells us how confident we are about this estimate.
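The confidence carried by the shape of the posterior can be summarized with a few numbers. A minimal sketch using `scipy.stats.beta`, where the 95% level for the credible interval is an arbitrary but common choice:

```python
from scipy.stats import beta

a_post, b_post = 8, 4                  # posterior Beta(8, 4) from the example
posterior = beta(a_post, b_post)       # frozen distribution object

post_mean = posterior.mean()                        # alpha / (alpha + beta)
post_mode = (a_post - 1) / (a_post + b_post - 2)    # valid when alpha, beta > 1
post_sd = posterior.std()
ci_low, ci_high = posterior.interval(0.95)          # equal-tailed 95% interval

print(f"mean = {post_mean:.3f}, mode = {post_mode:.3f}, sd = {post_sd:.3f}")
print(f"95% credible interval: ({ci_low:.3f}, {ci_high:.3f})")
```

A narrower interval indicates a more confident estimate. With only 10 tosses, the interval is still fairly wide.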

Conjugate Priors

A conjugate prior is a prior that, when combined with a particular likelihood, results in a posterior of the same family.

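Examples:

  • Beta prior with a Binomial (or Bernoulli) likelihood gives a Beta posterior, as in the coin toss example.
  • Gamma prior with a Poisson likelihood gives a Gamma posterior.
  • Normal prior on the mean with a Normal likelihood of known variance gives a Normal posterior.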

Using conjugate priors simplifies analytical updates and is computationally efficient.
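One consequence of conjugacy is that updating can be done one observation at a time, with each posterior serving as the next prior. The sketch below uses a hypothetical toss sequence (1 = heads, 0 = tails) containing 7 heads and 3 tails. The order of the tosses is made up for illustration.

```python
# Hypothetical sequence: 7 heads (1) and 3 tails (0); the order is made up
tosses = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

a, b = 1, 1  # start from the uniform prior Beta(1, 1)
for i, toss in enumerate(tosses, start=1):
    a += toss        # a head increments the first parameter
    b += 1 - toss    # a tail increments the second parameter
    print(f"after toss {i:2d}: Beta({a}, {b}), mean = {a / (a + b):.3f}")

# Updating with all the data at once gives the same parameters
heads = sum(tosses)
batch = (1 + heads, 1 + len(tosses) - heads)
print("sequential:", (a, b), "batch:", batch)
```

Both routes end at Beta(8, 4), so the order in which the tosses arrive does not change the final posterior.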

Python Implementation

Here’s a basic Bayesian update using a Beta prior for a Bernoulli process:

1. Import Necessary Libraries

We import libraries for numerical computation, plotting, and working with the Beta distribution.

Python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

2. Set Prior Parameters

We define the parameters of the prior Beta distribution (uniform in this case).

Python
# Prior parameters (Beta(1, 1) = uniform distribution)
a_prior = 1
b_prior = 1

3. Update Posterior Parameters Based on Observed Data

We update the prior using 7 heads and 3 tails to get the parameters of the posterior.

Python
# Observed data: 7 heads, 3 tails
a_post = a_prior + 7
b_post = b_prior + 3

4. Compute Posterior Distribution Values

We compute the probability density of the posterior Beta(8, 4) over a range of θ values.

Python
# θ values between 0 and 1
theta = np.linspace(0, 1, 100)
posterior = beta.pdf(theta, a_post, b_post)

5. Plot the Posterior Distribution

We visualize the updated belief about the coin's bias.

Python
plt.plot(theta, posterior, label='Posterior Beta(8, 4)')
plt.title('Bayesian Update of Coin Bias')
plt.xlabel('θ')
plt.ylabel('Density')
plt.legend()
plt.grid(True)
plt.show()

Output:

Figure: Bayesian Update of Coin Bias (posterior Beta(8, 4) density plotted against θ)


Applications of Bayesian Updating

1. Machine Learning: A major area of use

2. Medical Diagnosis: Updating disease probabilities based on test outcomes

3. Spam Filtering: Naive Bayes classifiers use word frequencies to update spam probabilities

4. A/B Testing: Continuously update belief about which variant performs better

5. Robotics and Control: Bayesian filtering (e.g., Kalman Filter, Particle Filter)
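To make the A/B testing case concrete, here is a minimal sketch that reuses the Beta-Binomial update. The conversion counts are hypothetical, and estimating the probability that one variant beats the other by Monte Carlo sampling is one common approach, not the only one.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

# Hypothetical conversion data for two variants
conv_a, n_a = 40, 500   # variant A: 40 conversions out of 500 visitors
conv_b, n_b = 55, 500   # variant B: 55 conversions out of 500 visitors

# Beta(1, 1) prior plus binomial data gives a Beta posterior per variant
samples_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=100_000)
samples_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=100_000)

# Fraction of posterior draws in which B's rate exceeds A's
prob_b_better = (samples_b > samples_a).mean()
print(f"Estimated P(rate_B > rate_A) = {prob_b_better:.3f}")
```

As more visitors arrive, the counts can be incremented and the same calculation repeated, which is the continuous updating described in the list above.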

Advantages of Bayesian Updating

  • Intuitive: Models learning as belief revision.
  • Dynamic: Adapts to incoming data naturally.
  • Uncertainty Quantification: Posterior distributions provide full uncertainty information.
  • Principled Approach: Grounded in probability theory.

Limitations and Challenges

  • Computational Complexity: Analytical solutions are rare for complex models.
  • Prior Sensitivity: Choice of prior can affect results, especially with small data.
  • Integration Difficulties: Calculating the marginal likelihood P(D) may require approximation.

To address these, techniques like Markov Chain Monte Carlo (MCMC) or variational inference are used.
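As a rough illustration of MCMC, the sketch below runs a random-walk Metropolis sampler on the coin example, where the exact answer Beta(8, 4) is known and can be used as a check. The step size, chain length, and burn-in are arbitrary choices for this sketch, and the function name is hypothetical.

```python
import numpy as np

def log_unnormalized_posterior(theta, heads=7, tails=3):
    """Log of likelihood * uniform prior; P(D) is never needed."""
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf  # zero posterior density outside (0, 1)
    return heads * np.log(theta) + tails * np.log(1.0 - theta)

rng = np.random.default_rng(42)
n_steps, step_size = 20_000, 0.2   # arbitrary tuning choices
samples = np.empty(n_steps)

current = 0.5
current_lp = log_unnormalized_posterior(current)

for i in range(n_steps):
    # Propose a move with a symmetric Gaussian random walk
    proposal = current + rng.normal(0.0, step_size)
    proposal_lp = log_unnormalized_posterior(proposal)
    # Accept with probability min(1, posterior ratio)
    if np.log(rng.uniform()) < proposal_lp - current_lp:
        current, current_lp = proposal, proposal_lp
    samples[i] = current

burned = samples[2_000:]           # discard early draws as burn-in
mcmc_mean = burned.mean()
print(f"MCMC posterior mean: {mcmc_mean:.3f} (exact value is 2/3)")
```

The sampler only ever uses ratios of the unnormalized posterior, which is why the marginal likelihood P(D) never has to be computed.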
