Vector Autoregression (VAR) for Multivariate Time Series

Last Updated : 14 May, 2024

Vector Autoregression (VAR) is a statistical tool used to investigate the dynamic relationships between multiple time series variables. Unlike univariate autoregressive models, which only forecast a single variable based on its previous values, VAR models investigate the interconnectivity of many variables. They accomplish this by modeling each variable as a function of not only its previous values but also of the past values of other variables in the system. In this article, we are going to explore the fundamentals of Vector Autoregression.

What is Vector Autoregression?

Vector Autoregression was popularized in econometrics by Christopher Sims. His 1980 paper "Macroeconomics and Reality" proposed VAR models as an alternative to large structural macroeconomic models. Earlier work by economist Clive Granger, notably his 1969 concept of Granger causality, laid groundwork for analyzing the dynamic interactions among economic variables. Since the 1980s, VAR models have been a standard tool in econometrics and macroeconomics.

Vector Autoregression (VAR) is a multivariate extension of autoregression (AR) models. While traditional AR models analyze the relationship between a single variable and its lagged values, VAR models consider multiple variables simultaneously. In a VAR model, each variable is regressed on its own lagged values as well as lagged values of other variables in the system.

Mathematical Intuition of VAR Equations

VAR models are represented mathematically as a system of equations, one per variable. Each equation describes the behavior of one variable as a function of its own lagged values and the lagged values of all other variables in the system.

Mathematically, a VAR(p) model of k variables with p lags can be written as:

Y_t = c + \Phi_1 Y_{t-1} + \Phi_2 Y_{t-2} + \dots + \Phi_p Y_{t-p} + \varepsilon_t

Here,

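  • Y_t: the k × 1 vector holding the values of all k variables at time t.
  • c: the k × 1 vector of intercept terms, one per equation.
  • \Phi_1, \Phi_2, ..., \Phi_p: the k × k matrices of autoregressive coefficients for lags 1, 2, ..., p. Entry (i, j) of \Phi_l measures how variable j at lag l affects variable i.
  • Y_{t-1}, Y_{t-2}, ..., Y_{t-p}: the vectors of lagged values, 1, 2, ..., p periods before time t.
  • \varepsilon_t: the k × 1 vector of error terms at time t, usually assumed to be white noise with mean zero and covariance matrix \Sigma. Errors of different equations may be correlated with each other at the same time t.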
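To make the notation concrete, here is a minimal NumPy sketch of one step of the equation for the special case k = 2, p = 1. The intercept, coefficient matrix, previous value and error vector are made-up numbers chosen only for illustration.

```python
import numpy as np

# Made-up parameters for a two-variable VAR(1)
c = np.array([0.5, 1.0])           # intercept vector c
Phi_1 = np.array([[0.6, 0.2],      # row 1: equation for the first variable
                  [0.1, 0.5]])     # row 2: equation for the second variable
Y_prev = np.array([2.0, 4.0])      # Y_{t-1}
eps_t = np.array([0.1, -0.2])      # one realisation of the error vector

# Y_t = c + Phi_1 Y_{t-1} + eps_t
Y_t = c + Phi_1 @ Y_prev + eps_t
# first variable:  0.5 + (0.6 * 2.0 + 0.2 * 4.0) + 0.1 = 2.6
# second variable: 1.0 + (0.1 * 2.0 + 0.5 * 4.0) - 0.2 = 3.0
print(Y_t)
```

The off-diagonal entry 0.2 is what makes this a VAR rather than two separate AR models: the previous value of the second variable feeds into the first variable's equation.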

To ensure the validity and trustworthiness of the results from VAR analysis, various assumptions and requirements must be met.

Assumptions underlying the VAR model

The main assumptions are:

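  1. Linearity: Relationships between variables are linear.
  2. Stationarity: The time series are stationary (or have been made stationary, for example by differencing), so that the fitted VAR process is stable.
  3. No Perfect Multicollinearity: No exact linear relationship exists among the regressors (the lagged variables).
  4. No Autocorrelation in Residuals: Residuals are not serially correlated, i.e. they behave like white noise.
  5. Homoscedasticity: Residual variance is constant over time.
  6. No Omitted Variables: No important variable that drives the system has been left out. Otherwise its influence ends up in the errors and biases the estimates.
  7. Exogeneity of External Regressors: In a standard VAR, every modeled variable is endogenous. Only variables added as exogenous inputs (such as trends or external controls) are assumed not to be influenced by the system.
  8. Sufficient Observations: A VAR(p) with k variables and an intercept has k(1 + kp) coefficients (for example, 39 for three variables with four lags), so enough data is needed to estimate them reliably.
  9. Weak Exogeneity (Predetermined Regressors): The lagged values on the right-hand side are uncorrelated with the current errors, even though the errors of different equations may be contemporaneously correlated with each other.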

Steps to Implement VAR on Time Series Data

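The code below runs a Vector Autoregression (VAR) analysis on randomly generated time series data. It covers stationarity testing, VAR modeling, forecasting, and visualization of the forecasts. The three sample series are independent Gaussian noise, so there are no real cross-variable dynamics to find. The example illustrates the workflow rather than a meaningful model.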

Step 1: Importing necessary libraries

Python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.api import VAR
from statsmodels.tsa.stattools import adfuller


Step 2: Generate Sample Data

Python
# Sample data generation
np.random.seed(0)
dates = pd.date_range(start='2024-01-01', periods=100)
data = pd.DataFrame(np.random.randn(100, 3), index=dates, columns=['A', 'B', 'C'])


Step 3: Function to plot time series

Python
# Function to plot time series: one panel per column
def plot_series(data):
    fig, axes = plt.subplots(nrows=len(data.columns), ncols=1, figsize=(10, 8), squeeze=False)
    for i, col in enumerate(data.columns):
        ax = axes[i, 0]
        data[col].plot(ax=ax, title=col)
        ax.set_ylabel('Values')
        ax.set_xlabel('Date')
    plt.tight_layout()
    plt.show()

plot_series(data)

Output:


Figure: Generated Sample Data (one panel each for A, B and C, values plotted against date)


Step 4: Function to check stationarity

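Checking for stationarity is crucial for VAR (Vector Autoregression) modeling because a standard VAR fitted to the series in levels assumes they are stationary. Stationarity means that statistical properties of the series, such as the mean, variance, and autocorrelation, remain constant over time. The Augmented Dickey-Fuller (ADF) test used below has the null hypothesis that a series has a unit root (is non-stationary). A p-value below the chosen significance level (commonly 0.05) is therefore evidence of stationarity.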

Python
# Check stationarity of time series using ADF test
def check_stationarity(timeseries):
    result = adfuller(timeseries)
    print('ADF Statistic:', result[0])
    print('p-value:', result[1])
    print('Critical Values:')
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))


Step 5: VAR analysis

This part defines a function var_analysis(data) that runs the Vector Autoregression (VAR) analysis on the given dataset in four steps: checking the stationarity of each series, fitting the VAR model, forecasting future values, and plotting the forecast. The final line calls var_analysis() on the sample data.

In the third step, the code reads the lag order of the fitted model (results.k_ar, stored as lag_order). It then passes the last lag_order observations to forecast() as starting values to generate forecasts for the next 10 steps (steps=10). In the fourth step, the forecasts get a daily date index (forecast_index) of 10 periods that starts the day after the last observation. The sample ends on 2024-04-09, so the forecast starts on 2024-04-10. The forecasts are then plotted together with the original data.

Python
# Section for VAR analysis
def var_analysis(data):
    # Step 1: Check stationarity of each series with the ADF test
    print("Step 1: Checking stationarity")
    for col in data.columns:
        print('Stationarity test for', col)
        check_stationarity(data[col])

    # Step 2: Applying VAR model
    # fit() with no arguments estimates a VAR(1) with an intercept;
    # VAR.select_order() can be used to choose the lag order by information criterion
    print("\nStep 2: Applying VAR model")
    model = VAR(data)
    results = model.fit()

    # Step 3: Forecasting
    # forecast() needs the last k_ar observations as starting values
    print("\nStep 3: Forecasting")
    lag_order = results.k_ar
    forecast = results.forecast(data.values[-lag_order:], steps=10)

    # Step 4: Visualizing forecast
    # Forecast dates start one day after the last observed date
    print("\nStep 4: Visualizing forecast")
    forecast_index = pd.date_range(start=data.index[-1] + pd.Timedelta(days=1), periods=10, freq='D')
    forecast_data = pd.DataFrame(forecast, index=forecast_index, columns=data.columns)
    plot_series(pd.concat([data, forecast_data]))

# Perform VAR analysis
var_analysis(data)

Output:

Step 1: Checking stationarity
Stationarity test for A
ADF Statistic: -8.43759993424834
p-value: 1.7990274249398063e-13
Critical Values:
    1%: -3.498
    5%: -2.891
    10%: -2.583
Stationarity test for B
ADF Statistic: -11.229664527662438
p-value: 1.9214648218450937e-20
Critical Values:
    1%: -3.498
    5%: -2.891
    10%: -2.583
Stationarity test for C
ADF Statistic: -9.028783852793346
p-value: 5.516998045646418e-15
Critical Values:
    1%: -3.498
    5%: -2.891
    10%: -2.583

Step 2: Applying VAR model
Step 3: Forecasting
Step 4: Visualizing forecast


Figure: Forecasting for the next 10 steps (series A, B and C with the VAR forecast appended)


Output Explanation

The output shows the results of the Augmented Dickey-Fuller (ADF) test for each variable in the dataset.

  • Stationarity test for A: The ADF statistic is -8.438, and the p-value is approximately 1.799e-13. Since the p-value is much smaller than 0.05 (a common significance level), we reject the null hypothesis of non-stationarity. The critical values at 1%, 5%, and 10% significance levels are also provided for reference.
  • Stationarity test for B: The ADF statistic is -11.230, and the p-value is approximately 1.921e-20. Again, since the p-value is much smaller than 0.05, we reject the null hypothesis of non-stationarity. The critical values at different significance levels are also provided.
  • Stationarity test for C: The ADF statistic is -9.029, and the p-value is approximately 5.517e-15. Similar to variables A and B, the small p-value indicates that we reject the null hypothesis of non-stationarity for variable C. Critical values at different significance levels are also provided.

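Based on the Augmented Dickey-Fuller test, all three variables (A, B, and C) are stationary, so the VAR can be fitted to the series as they are, without differencing. This is expected, since the sample data were generated as independent Gaussian noise.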

Applications of VAR Models

  1. Economic Forecasting: VAR models are widely used in economics to forecast the behavior of economic variables such as GDP, inflation, and interest rates.
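  2. Causal Inference: Impulse responses and Granger-causality tests from a VAR show how a shock to one variable propagates to the others and whether one variable's past helps predict another. This is valuable in policy evaluation. A causal interpretation, however, requires identifying assumptions (for example, the variable ordering used for orthogonalized impulse responses), since Granger causality measures predictive rather than structural causation.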
  3. Financial Markets: VAR models can be used to predict financial indices, stocks and asset prices.