Cointegration Tests

In time series analysis, many variables show trends over time, meaning they are non-stationary. This non-stationarity can be a problem when building statistical models because it can lead to misleading results. However, sometimes two or more non-stationary time series move together in such a way that their combination becomes stationary. This relationship is called cointegration.

What is Cointegration?

Formally, two or more series that are each integrated of order one, written I(1), are cointegrated if some linear combination of them is stationary, written I(0). This indicates a long-term equilibrium relationship between the variables: each one may trend or drift on its own, but deviations from the equilibrium are temporary.

Cointegration matters for two reasons. It reveals stable, long-run relationships between non-stationary variables, and it justifies the use of Error Correction Models (ECM), which capture:

Short-term dynamics
Long-term equilibrium behavior

Example

Take the prices of crude oil and gasoline, both of which are non-stationary and trend over time.
If a linear combination of them (e.g., gasoline price - β × oil price, for some constant β) is stationary, the two prices are said to be cointegrated.

Stationarity and Its Importance

Before diving into cointegration, it’s important to understand stationarity:

A time series is (weakly) stationary if its mean, variance, and autocorrelation structure do not change over time.
Many standard time series models and the usual regression inference assume stationarity.
Non-stationary series can lead to spurious regression, where variables appear to be related due to trending behavior, not actual relationships.

Steps Before Conducting Cointegration Test

Step 1: Check Stationarity of Individual Series:

Use tests like Augmented Dickey-Fuller (ADF) or Phillips-Perron (PP) test to check for unit roots.
Ensure each variable is integrated of order 1, I(1): non-stationary in levels but stationary after first differencing.

Step 2: Visual Inspection:

Plot the series to check if they appear to move together over time.

Step 3: Perform Cointegration Test:

If the series are non-stationary but show signs of co-movement, apply cointegration tests to confirm.

Implementation of Cointegration Tests

1. Install Required Library

pip install statsmodels

2. Import Libraries

Python

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint

3. Generate or Load Time Series Data

Python

# Example: simulate a cointegrated pair of series
np.random.seed(42)
n = 100
x = np.cumsum(np.random.normal(0, 1, n))  # x is a random walk, so it is I(1)
y = x + np.random.normal(0, 1, n)  # y = x + stationary noise, so y - x is stationary

4. Perform Cointegration Test

Python

score, pvalue, _ = coint(x, y)

print(f"Cointegration Test Statistic: {score}")
print(f"P-value: {pvalue}")

if pvalue < 0.05:
    print("The series are likely cointegrated (reject null hypothesis).")
else:
    print("No evidence of cointegration (fail to reject null hypothesis).")

Output:

Cointegration Test Statistic: -10.54692388951861
P-value: 1.0672395686753919e-17
The series are likely cointegrated (reject null hypothesis).

Common Cointegration Tests

1. Engle-Granger Two-Step Method (For Two Variables)

Developed by Engle and Granger (1987)
Works well for two variables

Steps:

i). Regress one variable on the other using OLS.

y_t = \alpha + \beta x_t + \epsilon_t

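ii). Test the residuals \hat{\epsilon}_t from this regression for stationarity using an ADF-type test. Because the residuals come from an estimated regression, the statistic is compared against Engle-Granger (MacKinnon) critical values rather than the standard Dickey-Fuller ones.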

If the residuals are stationary, Y and X are cointegrated.

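Advantages: simple and easy to implement.
Limitations: it can identify at most one cointegrating relationship, so it is poorly suited to systems of more than two variables; the result can depend on which variable is chosen as the dependent variable; and the test can have low power in small samples.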

2. Johansen Test (For Multiple Variables)

Developed by Søren Johansen (1988)
Suitable for more than two variables

Based on Vector Autoregression (VAR):

It examines the rank of the long-run impact matrix in the VECM representation; that rank equals the number of cointegrating relationships.
Uses Trace test and Maximum Eigenvalue test to determine the number of cointegrating relationships.

Steps:

Make sure all variables are I(1)
Estimate a Vector Error Correction Model (VECM)
Use the Trace and Maximum Eigenvalue statistics to test for the number of cointegrating vectors

Trace Test Formula:

\text{Trace Statistic} = -T \sum_{i=r+1}^{n} \ln(1 - \lambda_i)

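Where \lambda_i are the estimated eigenvalues sorted in decreasing order, n is the number of variables, r is the number of cointegrating relationships under the null hypothesis, and T is the number of observations. The null hypothesis is that there are at most r cointegrating relationships.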

Max Eigenvalue Test:

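\text{Max Eigenvalue Statistic} = -T \ln(1 - \lambda_{r+1})

This tests the null hypothesis of exactly r cointegrating relationships against the alternative of r + 1.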

Advantages: handles multiple variables and allows testing for more than one cointegrating vector.
Limitations: more complex, and sensitive to the choice of lag length.

3. Phillips-Ouliaris Test

A variation of the Engle-Granger approach
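Tests the null hypothesis of no cointegration using residuals from an OLS regression, but corrects for serial correlation nonparametrically (in the style of the Phillips-Perron test) instead of adding lagged differences as the ADF test does, and has its own critical values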

Best Practices

Always start with unit root testing.
Consider visual plots and economic theory before applying tests.
Use Johansen test for more than two variables.
Choose the lag length carefully (for example, with an information criterion such as AIC or BIC) and include an intercept or trend if the data call for it.
After finding cointegration, use ECM or VECM for modeling.

Interpreting Cointegration Results

Cointegrated: A stable long-run relationship exists. Use ECM or VECM.
Not Cointegrated: No evidence of a long-run equilibrium. Model the differenced series (for example, a VAR in first differences) or use other modeling techniques.
Always check residuals and diagnostics to ensure model validity.

Challenges and Limitations

Assumption of I(1): The standard tests assume all series are integrated of order one
Model sensitivity: The Johansen test is sensitive to the lag length and to how deterministic terms (intercept, trend) are specified
Structural breaks: Cointegration tests may give misleading results if there are regime shifts
Overfitting: Searching many variable pairs for cointegration (data mining) can produce spurious findings and overfitted models
