Sample Variance

Last Updated : 23 Jul, 2025

In statistics, sample variance measures how spread out the data points in a sample are around the sample mean. It is computed by summing the squared differences between each data point and the mean, then dividing by one less than the number of observations. It is useful when you only have a sample drawn from a larger population, because it estimates how variable the population is from the data in hand.

Mathematical Definition of Sample Variance

The formula for the sample variance is given by:

s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}

Where:

  • n = Number of observations in the sample
  • x_i = The i-th individual observation
  • \bar{x} = Sample mean, calculated as:

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
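The definition above can be sketched step by step in plain Python. This is a minimal illustration, not a library implementation; the function name sample_variance and the guard for fewer than two observations are assumptions added for the example.

```python
def sample_variance(values):
    """Sample variance with the n - 1 denominator, computed step by step."""
    n = len(values)
    if n < 2:
        # With a single observation the denominator n - 1 would be zero.
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n                              # x-bar
    squared_devs = [(x - mean) ** 2 for x in values]    # (x_i - x-bar)^2
    return sum(squared_devs) / (n - 1)                  # divide by n - 1


data = [4, 8, 6, 5, 10]
result = sample_variance(data)
print(round(result, 2))  # 5.8
```

Here the mean is 6.6, the squared deviations sum to 23.2, and dividing by n - 1 = 4 gives 5.8.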

Bias in Estimating Variance

Dividing the sum of squared deviations by n gives an estimate that is, on average, smaller than the population variance. Using n - 1 instead compensates for this bias. The adjustment is known as Bessel's correction, and it makes the sample variance an unbiased estimator of the population variance.

Why Use n - 1 in Sample Variance?

The sample mean is itself computed from the data, so it sits closer to the sample points than the true population mean does, and the squared deviations come out slightly too small. Equivalently, once the sample mean is fixed, only n - 1 of the deviations are free to vary, because they must sum to zero. Dividing by n - 1 rather than n offsets this shrinkage, so the sample variance matches the population variance on average instead of systematically underestimating it.
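One way to see the effect is a small simulation. The sketch below assumes a normal population with variance 4, a sample size of 5, and 100,000 repeated samples; these values and the variable names are illustrative choices, not part of the original article. For this setup the n-denominator estimate is expected to average about (n - 1)/n of the true variance, roughly 3.2, while the n - 1 version should average close to 4.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
true_variance = 4.0              # assumed population: normal with std 2
n = 5                            # small sample size
trials = 100_000                 # number of repeated samples

# Each row is one sample of size n drawn from the population.
samples = rng.normal(loc=0.0, scale=2.0, size=(trials, n))

biased = samples.var(axis=1, ddof=0).mean()     # divide by n
unbiased = samples.var(axis=1, ddof=1).mean()   # divide by n - 1

print(f"true variance:   {true_variance:.3f}")
print(f"divide by n:     {biased:.3f}")
print(f"divide by n - 1: {unbiased:.3f}")
```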

Properties of Sample Variance

  • Non-negative: Since it involves squaring deviations, the variance is always non-negative.
  • Zero Variance: A variance of zero indicates that all data points are identical.
  • Units: The variance is expressed in squared units of the original data, which is why the standard deviation is often used as a more interpretable measure.
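These properties can be checked with a short sketch using Python's standard statistics module. The height values are hypothetical and chosen only for illustration. Converting metres to centimetres multiplies every value by 100, so the variance should grow by a factor of about 100^2 = 10,000, which reflects the squared units.

```python
import statistics

heights_m = [1.60, 1.72, 1.81, 1.68]        # hypothetical heights in metres
heights_cm = [h * 100 for h in heights_m]   # same data in centimetres

var_m = statistics.variance(heights_m)      # units: square metres
var_cm = statistics.variance(heights_cm)    # units: square centimetres
constant_var = statistics.variance([7, 7, 7, 7])

print("non-negative:", var_m >= 0)
print("identical values give zero:", constant_var == 0)
print("ratio cm^2 / m^2:", round(var_cm / var_m))
```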

Difference Between Sample Variance and Population Variance

Feature     | Sample Variance                            | Population Variance
Denominator | n - 1                                      | N
Purpose     | Estimate population variance from a sample | Exact variance for the entire population
Bias        | Unbiased due to Bessel’s correction        | No correction needed
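The difference in denominators can be seen directly with the standard library, where statistics.variance divides by n - 1 and statistics.pvariance divides by N. This is a small sketch on the same five values used elsewhere in this article; treating them as a full population in the second call is an assumption made purely for comparison.

```python
import statistics

data = [4, 8, 6, 5, 10]

s2 = statistics.variance(data)        # sample variance: divides by n - 1
sigma2 = statistics.pvariance(data)   # population variance: divides by N

print(f"Sample variance:     {s2:.2f}")      # 5.80
print(f"Population variance: {sigma2:.2f}")  # 4.64
```

The two results differ by the factor (n - 1)/n = 4/5, since both share the same sum of squared deviations, 23.2.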

Sample Variance in Python

Python
import numpy as np
data = [4, 8, 6, 5, 10]

# Calculating sample variance
sample_variance = np.var(data, ddof=1)
print(f"Sample Variance: {sample_variance:.2f}")

Output:

Sample Variance: 5.80

Limitations of Sample Variance

  • Sensitive to Outliers: A few extreme values can significantly affect the variance.
  • Not Intuitive: The units are in squared terms, making interpretation difficult without taking the square root (standard deviation).
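Both limitations can be illustrated with a brief sketch. The extra value 60 is a hypothetical outlier appended to the article's example data. A single such point inflates the variance many times over, and the standard deviation brings the result back to the original units.

```python
import statistics

data = [4, 8, 6, 5, 10]
with_outlier = data + [60]    # hypothetical extreme value

base_var = statistics.variance(data)
outlier_var = statistics.variance(with_outlier)
outlier_std = statistics.stdev(with_outlier)   # square root of the variance

print(f"variance without outlier: {base_var:.2f}")
print(f"variance with outlier:    {outlier_var:.2f}")
print(f"std dev with outlier:     {outlier_std:.2f}")
```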