Lilliefors Test

The Lilliefors test checks whether a sample comes from a normal distribution when the population mean and variance are not known in advance. The standard Kolmogorov-Smirnov test assumes these parameters are fully specified; when they are instead estimated from the same data, its usual critical values make the test too conservative, so it rejects too rarely. The Lilliefors test corrects for this by using critical values that account for the estimation. It is often used to check whether data are suitable for statistical methods that assume normality.

Key Features

Tests Normality with Unknown Parameters: Unlike the standard Kolmogorov-Smirnov test, the Lilliefors test can be used when the mean and variance are not known and must be estimated from the data.
Uses Adjusted Critical Values: Because the parameters are estimated from the data, the test uses its own critical values or p-values to decide whether to reject normality.
Based on the Empirical Distribution Function: The test compares the sample's empirical distribution function with a fitted normal CDF, so the data do not need to be binned. Under the null hypothesis, the distribution of the statistic depends only on the sample size, not on the unknown mean and variance.
Simple to Implement: The Lilliefors test is straightforward and available in many statistical software packages, making it easy to use in practice.

How does it Work?

1. Estimate Parameters from the Sample

First, the test calculates the sample mean and the sample standard deviation from the data.
These estimates serve as the parameters of the normal distribution being tested.
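
As a minimal sketch of this step, the snippet below estimates both parameters with NumPy for the sample used in the example later in this article. It assumes the n - 1 (sample) form of the standard deviation, which is what `ddof=1` selects.

```python
import numpy as np

# Sample reused from the example later in this article
data = np.array([12.1, 14.5, 13.3, 15.6, 14.2, 13.8, 12.9, 14.0])

mean = data.mean()
# ddof=1 divides by n - 1, giving the sample standard deviation
std = data.std(ddof=1)

print(f"Sample mean: {mean:.4f}")
print(f"Sample standard deviation: {std:.4f}")
```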

2. Calculate the Empirical Distribution Function (EDF)

The EDF is a step function that represents the proportion of sample points less than or equal to any given value.
It is constructed directly from the data without assuming any underlying distribution.
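
The EDF can be sketched in a few lines. The helper name `edf` is hypothetical and only for illustration; it counts how many sample values are less than or equal to a given point and divides by the sample size.

```python
import numpy as np

def edf(sample, x):
    """Proportion of sample values less than or equal to x."""
    sorted_sample = np.sort(np.asarray(sample, dtype=float))
    # side="right" makes searchsorted count the values <= x
    return np.searchsorted(sorted_sample, x, side="right") / sorted_sample.size

data = [12.1, 14.5, 13.3, 15.6, 14.2, 13.8, 12.9, 14.0]

print(edf(data, 13.8))  # 4 of the 8 values are <= 13.8
print(edf(data, 12.0))  # below the smallest value
print(edf(data, 16.0))  # above the largest value
```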

3. Compute the Theoretical Cumulative Distribution Function (CDF)

Using the estimated mean and standard deviation, the test calculates the CDF of a normal distribution at each data point.
The CDF gives the probability that a random variable from this distribution is less than or equal to that point.
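
One possible way to compute these values is `scipy.stats.norm.cdf`, passing the estimates as `loc` and `scale`. This is a sketch using the article's example sample, not a required implementation.

```python
import numpy as np
from scipy.stats import norm

data = np.array([12.1, 14.5, 13.3, 15.6, 14.2, 13.8, 12.9, 14.0])
sorted_data = np.sort(data)

# Fitted normal distribution: parameters estimated from the sample itself
mu = data.mean()
sigma = data.std(ddof=1)

# Theoretical CDF evaluated at each (sorted) data point
cdf_values = norm.cdf(sorted_data, loc=mu, scale=sigma)

for value, prob in zip(sorted_data, cdf_values):
    print(f"x = {value:5.1f}   F(x) = {prob:.4f}")
```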

4. Measure the Maximum Difference (D statistic)

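The core of the Lilliefors test is finding the largest absolute difference between the EDF and the theoretical CDF across all data points. Because the EDF jumps at each data point, the difference is checked just before and just after each jump.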
This maximum difference is called the D statistic.
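
Putting steps 1 to 4 together, the D statistic can be sketched as follows. The function name `lilliefors_d` is hypothetical. For the sample used in the example later in this article, it should reproduce the statistic reported there.

```python
import numpy as np
from scipy.stats import norm

def lilliefors_d(sample):
    """Largest gap between the EDF and the fitted normal CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    # Normal CDF with mean and standard deviation estimated from the sample
    cdf = norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))
    # EDF just after each jump (i/n) and just before it ((i - 1)/n)
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return max(d_plus, d_minus)

data = [12.1, 14.5, 13.3, 15.6, 14.2, 13.8, 12.9, 14.0]
d = lilliefors_d(data)
print(f"D = {d:.4f}")
```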

5. Compare with Critical Values

Since the mean and variance are estimated from the sample, the distribution of the D statistic under the null hypothesis differs from that of the standard Kolmogorov-Smirnov test.
The Lilliefors test therefore uses its own critical values or p-values, usually obtained from simulations or tables, to decide whether the observed D is large enough to reject normality.
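
To illustrate where such values can come from, the sketch below simulates the null distribution of D by drawing many normal samples of the same size and recomputing D for each one. The function names, the number of simulations, and the seed are assumptions chosen for illustration. Library implementations typically rely on precomputed tables or approximation formulas instead.

```python
import numpy as np
from scipy.stats import norm

def d_statistics(samples):
    """Lilliefors D for each row of a 2-D array of samples."""
    x = np.sort(samples, axis=1)
    n = x.shape[1]
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, ddof=1, keepdims=True)
    cdf = norm.cdf((x - mean) / std)
    upper = np.arange(1, n + 1) / n - cdf
    lower = cdf - np.arange(0, n) / n
    return np.maximum(upper.max(axis=1), lower.max(axis=1))

def simulate_null(n, n_sims=20000, seed=0):
    """D values for n_sims normal samples of size n (illustrative defaults)."""
    rng = np.random.default_rng(seed)
    return d_statistics(rng.standard_normal((n_sims, n)))

data = np.array([[12.1, 14.5, 13.3, 15.6, 14.2, 13.8, 12.9, 14.0]])
observed_d = d_statistics(data)[0]

null_d = simulate_null(n=data.shape[1])
critical_value = np.quantile(null_d, 0.95)   # 5% significance level
p_value = np.mean(null_d >= observed_d)      # simulated p-value

print(f"Observed D: {observed_d:.4f}")
print(f"Simulated 5% critical value: {critical_value:.4f}")
print(f"Simulated p-value: {p_value:.4f}")
```

Because D is unchanged by shifting or rescaling the data, simulating from the standard normal distribution is enough for any mean and variance.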

6. Make a Decision

If the calculated D exceeds the critical value at the chosen significance level, the test rejects the null hypothesis, suggesting that the data do not follow a normal distribution.
Otherwise, there is not enough evidence to reject normality.

Example

The code below runs the Lilliefors test on a small sample, first in Python with statsmodels and then in R with the nortest package.
It calculates the test statistic and p-value, then compares the p-value with a 0.05 significance level to decide whether to reject the normality assumption.

Python

import numpy as np
from statsmodels.stats.diagnostic import lilliefors

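# Sample data
data = np.array([12.1, 14.5, 13.3, 15.6, 14.2, 13.8, 12.9, 14.0])

# lilliefors returns the D statistic and its p-value (normal distribution by default)
statistic, p_value = lilliefors(data)

print(f"Lilliefors test statistic: {statistic:.4f}")
print(f"P value: {p_value:.4f}")

# Note: failing to reject means the data are consistent with normality,
# not that normality has been proven
alpha = 0.05
if p_value > alpha:
    print("Fail to reject null hypothesis - data is normally distributed")
else:
    print("Reject null hypothesis - data is not normally distributed")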

R

# Load nortest, installing it first if it is not available
if (!require(nortest)) {
  install.packages("nortest")
  library(nortest)
}

# Sample data
data <- c(12.1, 14.5, 13.3, 15.6, 14.2, 13.8, 12.9, 14.0)
test_result <- lillie.test(data)
cat(sprintf("Lilliefors test statistic: %.4f\n", test_result$statistic))
cat(sprintf("P value: %.4f\n", test_result$p.value))
alpha <- 0.05
if (test_result$p.value > alpha) {
  cat("Fail to reject null hypothesis - data is normally distributed\n")
} else {
  cat("Reject null hypothesis - data is not normally distributed\n")
}

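Output (these values match the R version; statsmodels should report the same statistic but may report a different p value, because its default table-based method bounds the p values it returns):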

Lilliefors test statistic: 0.1297
P value: 0.9579
Fail to reject null hypothesis - data is normally distributed

Applications

Normality Testing: The Lilliefors test is primarily used to check whether a dataset follows a normal distribution when the mean and variance are unknown. This is important before applying many statistical methods that assume normality.
Preprocessing in Statistical Analysis: Before running parametric tests such as t-tests or ANOVA, the Lilliefors test helps check that the data meet the normality assumption, which supports the validity of these tests.
Quality Control: In manufacturing and process monitoring, the test is applied to check whether measurement data are normally distributed, which is often a key assumption in quality control techniques.
Model Assumption Validation: For regression and other statistical models, the test checks whether residuals follow a normal distribution, which matters for the validity of confidence intervals and hypothesis tests based on the model.
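
As an illustration of the residual check in the last item, the sketch below fits a straight line to synthetic data and applies the test to the residuals. The data, the coefficients, and the seed are made up for the example.

```python
import numpy as np
from statsmodels.stats.diagnostic import lilliefors

# Synthetic data (assumption): linear trend plus normally distributed noise
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 60)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)

# Ordinary least squares fit of a straight line
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Test the residuals, not the raw response, for normality
statistic, p_value = lilliefors(residuals, dist="norm")
print(f"Lilliefors statistic on residuals: {statistic:.4f}")
print(f"P value: {p_value:.4f}")
```

Testing the residuals rather than `y` itself matters here: `y` follows the trend in `x`, so it can look non-normal even when the errors are normal.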

Advantages

Accounts for Estimated Parameters: Unlike the Kolmogorov-Smirnov test, the Lilliefors test adjusts for the fact that the mean and variance are estimated from the data, leading to more reliable results in practical scenarios.
Simple to Use: The test is straightforward to implement and interpret, making it accessible to users without deep statistical expertise.
No Parameters to Specify: The user does not need to supply the mean or variance, and the null distribution of the statistic depends only on the sample size, so the same procedure applies across sample sizes and contexts.
Widely Supported: Many statistical software packages include the Lilliefors test, making it easy to apply in research and industry settings without custom coding.

Disadvantages

Lower Statistical Power: Compared with tests such as Shapiro-Wilk, the Lilliefors test may be less sensitive to deviations from normality, especially with small sample sizes.
Only Tests for Normality: In the form described here, it cannot assess whether data follow other types of distributions, which limits its use when non-normality is expected.
Requires Special Critical Values: The test relies on adjusted critical values or simulation-based p-values that aren’t as readily available as those for some other tests, which can complicate interpretation.
Can Miss Subtle Departures: In some cases the test may not detect subtle departures from normality, so the null hypothesis is not rejected even though the data are not normal, which can lead to misleading conclusions.
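
The power comparison in the first item can be explored with a small simulation. The sketch below draws repeated small samples from an exponential distribution, which is clearly non-normal, and records how often each test rejects normality at the 5% level. The sample size, the number of simulations, the seed, and the choice of the exponential distribution are all assumptions for illustration, and results will differ for other alternatives.

```python
import numpy as np
from scipy.stats import shapiro
from statsmodels.stats.diagnostic import lilliefors

def rejection_rates(n=20, n_sims=500, alpha=0.05, seed=0):
    """Share of exponential samples for which each test rejects normality."""
    rng = np.random.default_rng(seed)
    lf_rejects = 0
    sw_rejects = 0
    for _ in range(n_sims):
        sample = rng.exponential(size=n)
        _, p_lf = lilliefors(sample, dist="norm")
        _, p_sw = shapiro(sample)
        lf_rejects += int(p_lf < alpha)
        sw_rejects += int(p_sw < alpha)
    return lf_rejects / n_sims, sw_rejects / n_sims

lf_power, sw_power = rejection_rates()
print(f"Lilliefors rejection rate:   {lf_power:.2f}")
print(f"Shapiro-Wilk rejection rate: {sw_power:.2f}")
```

A higher rejection rate on non-normal data means higher power against that particular alternative.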