In statistics, sample variance measures how spread out the data points in a sample are around the sample mean. It is computed by summing the squared differences between each data point and the mean, then dividing by n - 1, one less than the number of observations. This is useful when you only have a sample drawn from a larger population, since it shows how diverse the data in the sample is and serves as an estimate of the population variance.
Mathematical Definition of Sample Variance
The formula for the sample variance is given by:
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}
Where:
- n = Number of observations in the sample
- x_i = Each individual observation
- \bar{x} = Sample mean, calculated as \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
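The formula can be translated almost term by term into code. The following is a minimal sketch in plain Python; the helper name `sample_variance` is hypothetical and chosen only for illustration, and the result is compared against the standard library's `statistics.variance`, which also uses the n - 1 denominator.

```python
from statistics import variance  # standard-library sample variance, used as a cross-check


def sample_variance(values):
    """Sample variance with the n - 1 denominator (hypothetical helper for illustration)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n                              # x-bar
    squared_devs = [(x - mean) ** 2 for x in values]    # (x_i - x-bar)^2
    return sum(squared_devs) / (n - 1)                  # divide by n - 1


data = [4, 8, 6, 5, 10]  # illustrative values
print(sample_variance(data))
print(variance(data))
```

For these values the mean is 6.6, the squared deviations sum to 23.2, and both calls come out to about 5.8 (up to floating-point rounding).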
Bias in Estimating Variance
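The deviations in the formula are measured from the sample mean rather than from the true population mean. Because the sample mean is computed from the same data, it sits closer to the data points than the population mean does on average, so the sum of squared deviations tends to be too small. Dividing by n - 1 instead of n compensates for this. The correction is known as Bessel's correction, and it makes the sample variance an unbiased estimator of the population variance.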
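The argument can be sketched in a few lines of algebra. This assumes the observations are independent draws from a population with mean \mu and variance \sigma^2; these two symbols are introduced here for the derivation and do not appear elsewhere in the article.

```latex
\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2

E\left[\sum_{i=1}^{n} (x_i - \mu)^2\right] = n\sigma^2,
\qquad
E\left[n(\bar{x} - \mu)^2\right] = n \cdot \frac{\sigma^2}{n} = \sigma^2

E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = (n - 1)\sigma^2
\quad\Rightarrow\quad
E\left[s^2\right] = \sigma^2
```

The expected sum of squared deviations is (n - 1)\sigma^2, not n\sigma^2, so dividing by n - 1 is what recovers \sigma^2 on average.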
Why Use n - 1 in Sample Variance?
The sample mean is itself computed from the data, so the n deviations from it always sum to zero and only n - 1 of them are free to vary. Dividing the sum of squared deviations by n would therefore systematically underestimate the population variance. Dividing by n - 1 (Bessel's correction) removes this bias and gives an unbiased estimate of the population variance.
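One way to see the effect is a small simulation. The sketch below is illustrative only: the normal population with standard deviation 2 (true variance 4), the sample size of 5, and the number of trials are arbitrary assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed so the run is repeatable
true_variance = 4.0                   # assumed population: normal with standard deviation 2
n, trials = 5, 100_000                # small samples, many repetitions

# Each row is one sample of size n drawn from the assumed population.
samples = rng.normal(loc=0.0, scale=2.0, size=(trials, n))

# Variance of every sample, once with each denominator.
divide_by_n = samples.var(axis=1, ddof=0)
divide_by_n_minus_1 = samples.var(axis=1, ddof=1)

print(f"true variance:      {true_variance:.3f}")
print(f"average using n:    {divide_by_n.mean():.3f}")
print(f"average using n-1:  {divide_by_n_minus_1.mean():.3f}")
```

Averaged over many samples, the n - 1 version should land close to 4.0, while the n version should land near (n - 1)/n times that, roughly 3.2 for n = 5.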
Properties of Sample Variance
- Non-negative: Since it involves squaring deviations, the variance is always non-negative.
- Zero Variance: A variance of zero indicates that all data points are identical.
- Units: The variance is expressed in squared units of the original data, which is why the standard deviation is often used as a more interpretable measure.
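These properties can be checked directly. The sketch below uses the standard library's `statistics` module; the height values are made up for illustration.

```python
import math
from statistics import stdev, variance

heights_cm = [150.0, 160.0, 170.0, 180.0]   # made-up measurements in centimetres
constant = [7, 7, 7, 7]                     # every observation identical

var_cm = variance(heights_cm)

# Non-negative: a sum of squares divided by a positive number cannot be negative.
print(var_cm >= 0)

# Zero variance: identical data points have no spread.
print(variance(constant) == 0)

# Units: converting cm to m divides the data by 100 and the variance by 100**2.
heights_m = [h / 100 for h in heights_cm]
var_m = variance(heights_m)
print(math.isclose(var_m, var_cm / 100**2))

# The standard deviation is the square root of the variance, back in the original units.
print(math.isclose(stdev(heights_cm), math.sqrt(var_cm)))
```

Each line prints `True`. The variance here is in cm² (about 166.67), while the standard deviation (about 12.91) is in cm, which is why the latter is easier to interpret.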
Difference Between Sample Variance and Population Variance
| Feature | Sample Variance | Population Variance |
|---|---|---|
| Denominator | n - 1 | N |
| Purpose | Estimate population variance from a sample | Exact variance for the entire population |
| Bias | Unbiased due to Bessel's correction | No correction needed |
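In NumPy the two denominators in the table correspond to the `ddof` argument of `np.var`. The sketch below applies both to the same illustrative values and checks that the results differ only by the factor n / (n - 1).

```python
import numpy as np

data = np.array([4, 8, 6, 5, 10])   # illustrative values
n = data.size

population_style = np.var(data)         # default ddof=0: divides by n
sample_style = np.var(data, ddof=1)     # ddof=1: divides by n - 1

print(f"Divide by n:     {population_style:.2f}")
print(f"Divide by n - 1: {sample_style:.2f}")

# The two results differ only by the factor n / (n - 1).
print(np.isclose(sample_style, population_style * n / (n - 1)))
```

For these five values the first figure is 4.64 and the second is 5.80. The gap shrinks as n grows, since n / (n - 1) approaches 1.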
Sample Variance in Python
import numpy as np
data = [4, 8, 6, 5, 10]
# Calculating sample variance: ddof=1 divides by n - 1 (the default ddof=0 divides by n)
sample_variance = np.var(data, ddof=1)
print(f"Sample Variance: {sample_variance:.2f}")
Output:
Sample Variance: 5.80
Limitations of Sample Variance
- Sensitive to Outliers: A few extreme values can significantly affect the variance.
- Not Intuitive: The units are in squared terms, making interpretation difficult without taking the square root (standard deviation).
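The sensitivity to outliers is easy to demonstrate. In the sketch below, the extra value 40 is a hypothetical outlier appended to otherwise unremarkable data.

```python
from statistics import stdev, variance

data = [4, 8, 6, 5, 10]        # illustrative values
with_outlier = data + [40]     # hypothetical extreme value added to the sample

print(f"variance without outlier: {variance(data):.2f}")
print(f"variance with outlier:    {variance(with_outlier):.2f}")

# The standard deviation is also inflated, but it stays in the original units.
print(f"std dev without outlier:  {stdev(data):.2f}")
print(f"std dev with outlier:     {stdev(with_outlier):.2f}")
```

Because deviations are squared, the single extreme value raises the variance from 5.80 to roughly 190.57, more than a thirtyfold increase, even though only one of six observations changed.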