What Is Time Series Analysis
Time series analysis comprises methods for analyzing time series data in order to extract
meaningful statistics and other characteristics of the data.
Regression analysis is “finding the best-fitting straight line for a set of data”
(Gravetter & Wallnau, 2011).
While linear regression analysis is good for simple relationships, such as height and age or time
spent studying and GPA, when we want to look at relationships over time in order to identify trend,
cyclical, seasonal or random components in data, we use time series regression analysis.
Traditional methods of time series analysis are concerned with the decomposition of the
time series into four main components: trend, cyclical, seasonal and irregular.
Trend is the tendency of the time series to increase, decrease or remain stable over a period
of time. In other words, the trend is the long-term movement of the time series. For example,
we may observe increasing values and an upward trend in the demand for ice cream during
warmer months, and lower demand, and thus a downward trend, in colder periods.
Seasonality refers to variations over a fixed and known period. This interval could be annual,
monthly, weekly, daily, etc. For example, sales of coats increase during the winter months
and decrease the rest of the year.
Some factors causing seasonality might be temperature and climate conditions, income levels and
the overall state of the economy, etc. Seasonal variations are an important
component of time series modelling and should be taken into account when building such
models.
A cyclic pattern exists when data exhibit rises and falls that are not of fixed period. The
duration of these fluctuations is usually of at least 2 years.
The irregular component consists of highly random fluctuations with no discernible pattern. These
variations are caused by unpredictable events such as wars, economic recessions, floods, etc.
Although seasonality and cyclic behavior might seem similar, they are actually quite
different. If the fluctuations are not of fixed length then they exhibit cyclicality. If the period
remains unchanged and associated with some aspect of the calendar, then the pattern is
seasonal.
The top left graph (monthly housing sales) exhibits strong annual seasonality and 6-year
cyclicality.
The top right graph (results from the Chicago market for 100 consecutive trading days)
exhibits no seasonality but an obvious downward trend.
The bottom left graph (monthly electricity production) shows both seasonality and upward
trend.
The bottom right graph (daily change in the Dow Jones index) is white noise: it has no
trend or seasonality and exhibits high irregularity.
Concept of Stationarity
A stationary time series is one whose properties do not depend on the time at which the
series is observed.
Thus, time series with trends, or with seasonality, are not stationary — the trend and
seasonality will affect the value of the time series at different times.
A stationary process has the property that the mean, variance and autocorrelation structure
do not change over time.
The red line depicts a non-stationary series with a linear trend, and the
blue line depicts the stationary series obtained once the linear trend has been removed.
Transformations such as logarithms can help to stabilize the variance of a time series.
Differencing can help stabilize the mean of a time series by removing changes in the level of
a time series, and therefore eliminating (or reducing) trend and seasonality.
The differenced series, y'_t = y_t - y_{t-1}, will have only T-1 values, since it is not possible
to calculate the difference y'_1 for the first observation.
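As a quick illustration, these transformations are one-liners in base R (a minimal sketch, using the built-in AirPassengers series rather than any dataset from this text):

    # Stabilizing a series: log transform for variance, differencing for mean
    x <- AirPassengers                      # built-in monthly example series

    log_x <- log(x)                         # stabilizes the variance
    d1 <- diff(log_x)                       # first difference: y'_t = y_t - y_{t-1}
    d2 <- diff(log_x, differences = 2)      # "change in the changes" (T-2 values)

    length(log_x) - length(d1)              # 1: the differenced series has T-1 values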
When the differenced series is white noise, the model for the original series can be written
as y_t - y_{t-1} = ε_t, or equivalently y_t = y_{t-1} + ε_t: a random walk model.
Random walk models are widely used for non-stationary data, particularly financial and
economic data.
Random walks typically have:
- long periods of apparent trends up or down
- sudden and unpredictable changes in direction.
The forecasts from a random walk model are equal to the last observation, as future
movements are unpredictable, and are equally likely to be up or down. Thus, the random
walk model underpins naïve forecasts.
A closely related model allows the differences to have a non-zero mean. Then
y_t - y_{t-1} = c + ε_t, or y_t = c + y_{t-1} + ε_t.
The value of c is the average of the changes between consecutive observations.
If c is positive, then the average change is an increase in the value of yt.
Thus, yt will tend to drift upwards.
However, if c is negative, yt will tend to drift downwards.
Occasionally the differenced series will still not be stationary, and it may be necessary to
difference a second time: y''_t = y'_t - y'_{t-1}. In this case, y''_t will have T-2 values.
Then, we would model the "change in the changes" of the original data.
SEASONAL DIFFERENCING
A seasonal difference is the difference between an observation and the previous observation
from the same season. So y'_t = y_t - y_{t-m},
where m = the number of seasons. These are also called "lag-m differences", as we subtract
the observation after a lag of m periods.
If seasonally differenced data appear to be white noise, then an appropriate model for the
original data is y_t = y_{t-m} + ε_t.
Forecasts from this model are equal to the last observation from the relevant season.
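In R, a lag-m difference is the same diff call with the lag argument (a minimal sketch for monthly data, m = 12):

    # Seasonal (lag-12) difference of a monthly series: y'_t = y_t - y_{t-12}
    x <- log(AirPassengers)
    d12 <- diff(x, lag = 12)
    plot(d12, main = "Seasonally differenced series")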
A number of unit root tests are available, which are based on different assumptions and may
lead to conflicting answers.
One of them is the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test (Kwiatkowski, Phillips,
Schmidt, & Shin, 1992).
In this test, the null hypothesis is that the data are stationary, and we look for evidence that
the null hypothesis is false. Consequently, small p-values (e.g., less than 0.05) suggest that
differencing is required.
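The KPSS test is available, for example, as kpss.test in the R package tseries (a minimal sketch; urca::ur.kpss is an alternative):

    library(tseries)

    x <- log(AirPassengers)
    kpss.test(x, null = "Level")          # H0: the series is stationary
    # a small p-value (< 0.05) rejects stationarity => differencing is required
    kpss.test(diff(x), null = "Level")    # re-test after first differencing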
Autoregressive Models
In an autoregressive model, we forecast the variable of interest using a linear combination of
its own past values: y_t = c + ϕ_1 y_{t-1} + ϕ_2 y_{t-2} + ... + ϕ_p y_{t-p} + ε_t,
where ε_t is white noise. This is like a multiple regression but with lagged values of y_t as predictors.
We refer to this as an AR(p) model, an autoregressive model of order p.
Autoregressive models are remarkably flexible at handling a wide range of different time series
patterns. The two series in the figure below show series from an AR(1) model and an AR(2) model.
Changing the parameters ϕ1… ϕp results in different time series patterns. The variance of the error
term εt will only change the scale of the series, not the patterns.
Two examples of data from autoregressive models with
different parameters. Left: AR(1) with y_t = 18 - 0.8 y_{t-1} + ε_t.
Right: AR(2) with y_t = 8 + 1.3 y_{t-1} - 0.7 y_{t-2} + ε_t. In both cases,
ε_t is normally distributed white noise with mean zero and
variance one.
We normally restrict autoregressive models to stationary data, in which case some constraints on
the values of the parameters are required.
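The two series in the figure can be reproduced with arima.sim (a sketch; the constants 18 and 8 are converted to the implied series means 10 and 20):

    set.seed(1)

    # AR(1): y_t = 18 - 0.8 y_{t-1} + e_t  =>  mean = 18 / (1 + 0.8) = 10
    ar1 <- 10 + arima.sim(model = list(ar = -0.8), n = 100)

    # AR(2): y_t = 8 + 1.3 y_{t-1} - 0.7 y_{t-2} + e_t  =>  mean = 8 / (1 - 1.3 + 0.7) = 20
    ar2 <- 20 + arima.sim(model = list(ar = c(1.3, -0.7)), n = 100)

    par(mfrow = c(1, 2))
    plot(ar1, main = "AR(1)"); plot(ar2, main = "AR(2)")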
Rather than using past values of the forecast variable in a regression, a moving average model uses
past forecast errors in a regression-like model:
y_t = c + ε_t + θ_1 ε_{t-1} + θ_2 ε_{t-2} + ... + θ_q ε_{t-q},
where ε_t is white noise. We refer to this as an MA(q) model, a moving average model of order q.
We do not observe the values of ε_t, so it is not really a regression in the usual sense.
Notice that each value of yt can be thought of as a weighted moving average of the past few forecast
errors.
Two examples of data from moving average models with different parameters.
Left: MA(1) with y_t = 20 + ε_t + 0.8 ε_{t-1}. Right: MA(2) with y_t = ε_t - ε_{t-1} + 0.8 ε_{t-2}.
In both cases, ε_t is normally distributed white noise with mean zero and variance
one.
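These can likewise be simulated with arima.sim (a minimal sketch):

    set.seed(1)

    # MA(1): y_t = 20 + e_t + 0.8 e_{t-1}
    ma1 <- 20 + arima.sim(model = list(ma = 0.8), n = 100)

    # MA(2): y_t = e_t - e_{t-1} + 0.8 e_{t-2}
    ma2 <- arima.sim(model = list(ma = c(-1, 0.8)), n = 100)

    par(mfrow = c(1, 2))
    plot(ma1, main = "MA(1)"); plot(ma2, main = "MA(2)")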
It is possible to write any stationary AR(p) model as an MA(∞) model. For example, using
repeated substitution, we can demonstrate this for an AR(1) model y_t = ϕ_1 y_{t-1} + ε_t:
y_t = ϕ_1 (ϕ_1 y_{t-2} + ε_{t-1}) + ε_t
    = ϕ_1² y_{t-2} + ϕ_1 ε_{t-1} + ε_t
    = ϕ_1³ y_{t-3} + ϕ_1² ε_{t-2} + ϕ_1 ε_{t-1} + ε_t, and so on.
Provided -1 < ϕ_1 < 1, the term ϕ_1^k y_{t-k} vanishes as k grows, so we obtain:
y_t = ε_t + ϕ_1 ε_{t-1} + ϕ_1² ε_{t-2} + ...,
an MA(∞) process.
The reverse result holds if we impose some constraints on the MA parameters. Then the
MA model is called invertible.
That is, we can write any invertible MA (q) process as an AR (∞) process.
Invertible models are not simply introduced to enable us to convert from MA models to
AR models. They also have some desirable mathematical properties.
For example, consider the MA(1) process y_t = ε_t + θ_1 ε_{t-1}.
In its AR(∞) representation, the most recent error can be written as a linear function of
current and past observations:
ε_t = y_t - θ_1 y_{t-1} + θ_1² y_{t-2} - θ_1³ y_{t-3} + ...,
so the weight on the observation j periods back is (-θ_1)^j.
When |θ_1| > 1, the weights increase as lags increase, so the more distant the observations
the greater their influence on the current error.
When |θ_1| = 1, the weights are constant in size, and distant observations have the same
influence as recent observations.
As neither of these situations makes much sense, we require |θ_1| < 1, so the most recent
observations have higher weight than observations from the more distant past.
Thus, the process is invertible when |θ_1| < 1.
The invertibility constraints for other models are similar to the stationarity constraints.
ARIMA MODELS
The motivation for introducing ARIMA models is that we cannot assume we are working
with a stationary series.
We say that a time series is stationary when its mean, variance and
autocovariance do not change over time.
The majority of economic time series are not stationary, but differencing them
a certain number of times makes them stationary.
In general, we say that a time series y_t admits an autoregressive integrated moving average
representation with orders p, d and q.
We denote this forecasting model by ARIMA(p, d, q).
In ARIMA, p denotes the number of autoregressive terms, and d denotes the number of times that
the series must be differenced to make it stationary. The last parameter, q, denotes the
number of invertible moving average terms.
- Identification: Given the time series, we try to find a relevant model. The
objective is to find the values that best reproduce the variable to forecast.
- Analysis and differencing: This step consists of studying the series, using
statistical tools such as the ACF and PACF plots, and selecting the model parameters.
- Fitting the ARIMA model: We extract the estimated coefficients and adjust the model.
- Prediction: Once we have selected the best model, we can forecast
probabilistic future values.
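These four steps map roughly onto the forecast package in R as follows (a sketch; the function choices are illustrative, not taken from this text):

    library(forecast)

    y <- log(AirPassengers)

    # Identification / analysis and differencing: inspect the series, ACF, PACF and d
    tsdisplay(diff(y))   # plots the series with its ACF and PACF
    ndiffs(y)            # suggested number of ordinary differences d

    # Fitting: estimate an ARIMA(p, d, q)
    fit <- Arima(y, order = c(0, 1, 1), seasonal = c(0, 1, 1))

    # Prediction: probabilistic forecasts with intervals
    plot(forecast(fit, h = 24))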
Although an ARCH model could possibly be used to describe a gradually increasing variance over
time, most often it is used in situations in which there may be short periods of increased
variation.
(Gradually increasing variance connected to a gradually increasing mean level might be better
handled by transforming the variable.)
ARCH models were created in the context of econometric and finance problems having to
do with the amount that investments or stocks increase (or decrease) per time period, so
there’s a tendency to describe them as models for that type of variable.
For that reason, the variable of interest in these problems might be either the proportion
gained or lost since the last period, or the logarithm of the ratio of this period's value to the
last period's value. It is not necessary that one of these be the primary variable of interest.
An ARCH model could be used for any series that has periods of increased or decreased
variance. This might, for example, be a property of residuals after an ARIMA model has
been fit to the data.
GENERALIZATION
An ARCH(m) process is one for which the variance at time t is conditional on the observations at the
previous m times, and the relationship is:
Var(y_t | y_{t-1}, ..., y_{t-m}) = α_0 + α_1 y_{t-1}² + ... + α_m y_{t-m}².
With certain constraints imposed on the coefficients, the squared series y_t² will theoretically be
AR(m).
A GARCH (generalized autoregressive conditionally heteroscedastic) model uses values of the
past squared observations and past variances to model the variance at time t. As an example, a
GARCH(1, 1) is:
σ_t² = α_0 + α_1 y_{t-1}² + β_1 σ_{t-1}².
In the GARCH notation, the first subscript refers to the order of the y² terms on the right side,
and the second subscript refers to the order of the σ² terms.
Box - Jenkins Analysis refers to a systematic method of identifying, fitting, checking, and
using integrated autoregressive, moving average (ARIMA) time series models.
The method is appropriate for time series of medium to long length (at least 50
observations).
A plot of the autocorrelation of a time series by lag is called the Autocorrelation Function,
or the acronym ACF. This plot is sometimes called a correlogram or an autocorrelation
plot.
A partial autocorrelation is a summary of the relationship between observations in a time
series with observations at prior time steps with the relationships of intervening
observations removed.
The autocorrelation between an observation and an observation at a prior time step
comprises both the direct correlation and indirect correlations. These indirect
correlations are a linear function of the correlation of the observation with observations
at intervening time steps.
It is these indirect correlations that the partial autocorrelation function seeks to remove.
Without going into the math, this is the intuition for the partial autocorrelation.
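Both plots are one-liners in base R (a minimal sketch):

    x <- diff(log(AirPassengers))

    par(mfrow = c(1, 2))
    acf(x, main = "ACF (correlogram)")    # autocorrelation by lag
    pacf(x, main = "PACF")                # partial autocorrelation by lag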
PRACTICAL APPLICATION
INTRODUCTION
Herein, we conduct technical analysis and a 30-day forecast of Amazon stock prices
using time series analysis.
The analysis involves Amazon stock price data from Jan-02-2015 to Aug-27-2019.
A more advanced view can be obtained by adding a Bollinger Band chart, % Bollinger change,
traded volume and moving average convergence divergence for 2018 alone:
The Bollinger Band chart plots two standard deviations away from the moving average
and is used to measure the stock’s volatility.
The Volume chart shows how the stock is traded on a daily basis.
The Moving Average Convergence Divergence gives technical analysts buy/sell signals.
The rule of thumb is: If it falls below the line, it is time to sell. If it rises above the line, it is
experiencing an upward momentum.
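The report does not name its charting tools; in R such a chart is commonly produced with the quantmod package (a sketch under that assumption):

    library(quantmod)

    # Daily Amazon prices from Yahoo Finance for the study period
    getSymbols("AMZN", from = "2015-01-02", to = "2019-08-27")

    # 2018 candlestick chart with the indicators described above
    chartSeries(AMZN, subset = "2018", theme = chartTheme("white"))
    addBBands(n = 20, sd = 2)    # Bollinger Bands: 2 sd around a 20-day moving average
    addBBands(draw = "percent")  # %B (percent Bollinger change)
    addVo()                      # daily traded volume
    addMACD()                    # moving average convergence divergence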
ARIMA Model
First, we conduct an Augmented Dickey-Fuller test to check for stationarity in the
dataset.
Results:
Interpretation:
The results clearly indicate that the null hypothesis cannot be rejected at the 5% level of
significance; hence, the data is non-stationary.
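A minimal sketch with tseries::adf.test (close being the AMZN closing-price series from the quantmod sketch above):

    library(tseries)

    close <- as.numeric(Cl(AMZN))   # closing prices
    adf.test(close)                 # H0: a unit root is present (non-stationary)
    # a large p-value means H0 cannot be rejected => the data is non-stationary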
After this, we need to apply ACF and PACF plots on the data set:
Autocorrelation refers to how correlated a time series is with its past values. As
we know in AR models, the ACF will dampen exponentially.
The ACF plot is used to see the correlation between the points, up to and
including the lag unit. We can see that the autocorrelations are significant for a
large number of lags, but perhaps the autocorrelations at later lags are
merely due to the propagation of the autocorrelation at the first lags.
For identifying the (p) order of the AR model we use the PACF plot.
For MA models we use the ACF plot to identify the (q) order, and the PACF will
dampen exponentially.
If we look at the PACF plot, we can note that it has a significant spike only at the
first lags, meaning that all the higher-order autocorrelations are effectively
explained by the first-lag autocorrelation.
As we are using the AUTO-ARIMA function, which searches for the model that best fits the
dataset, we need not conduct a deep manual analysis to find the model parameters.
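A minimal sketch with forecast::auto.arima, which searches over (p, d, q) by information criteria:

    library(forecast)

    fit <- auto.arima(close, seasonal = FALSE)   # stepwise search over (p, d, q)
    summary(fit)                                 # selected orders, coefficients, AICc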
Results:
In order to interpret these results, we need to conduct the analysis of residuals based on
the above results for ARIMA parameters so selected.
The “residuals” in a time series model are what is left over after fitting a model.
In the majority of time series models, the residuals are equal to the difference between the
observations and the fitted values: e_t = y_t - ŷ_t.
Next, we check our residuals against a normal curve:
As we can see, the residuals show a decent fit to the normal curve, giving us a good
point from which to continue this study.
Now we can make our last residuals plot using the tsdiag function, giving us the
standardized residuals, ACF of residuals and p-values for Ljung-Box statistic plots.
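A sketch of both residual checks (histogram against a normal curve, then tsdiag):

    res <- residuals(fit)

    # residuals vs. a fitted normal curve
    hist(res, freq = FALSE, main = "ARIMA residuals")
    curve(dnorm(x, mean(res), sd(res)), add = TRUE, col = "red")

    # standardized residuals, ACF of residuals, Ljung-Box p-values
    tsdiag(fit)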
Having our model applied and analysed we can plot the model prediction in a red line
over the real train set stock close price.
ARIMA RESULTS:
As we can see, the AUTO-ARIMA selects the best model parameters, giving us a very good
estimation.
Now, with the model fitted, we can proceed to forecast our daily close price values into the
future.
We focus on forecasting the close stock price for the next 30 days or an average month.
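A minimal sketch of the 30-day forecast with the forecast package:

    fc <- forecast(fit, h = 30)   # 30 trading days ahead
    plot(fc)                      # mean forecast with 80% and 95% intervals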
Plotting the predictions:
The blue line indicates the mean of the predictions.
Further Analysis:
Around the blue line we can see darker and lighter shaded areas, representing the 80%
and 95% confidence intervals respectively, in the lower and upper scenarios.
Lower Scenario:
Upper Scenario:
Finalizing our AUTO-ARIMA model, we apply a quick train/test approach, dividing the
close price data.
We select the first 70 per cent of the dataset as our training set.
The test set comprises the remaining 30 per cent of the dataset.
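A sketch of the 70/30 split and the accuracy check on the held-out data:

    n     <- length(close)
    cut   <- floor(0.7 * n)
    train <- close[1:cut]
    test  <- close[(cut + 1):n]

    fit_tr <- auto.arima(train, seasonal = FALSE)
    fc_tr  <- forecast(fit_tr, h = length(test))
    accuracy(fc_tr, test)          # error metrics against the test set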
Once we have fitted on the train set, we plot the mean tendency of our
forecast over the test-set close price movement.
In the red line we see our mean forecasting prediction tendency over the real close price
of the stock.
The tendency shows a good approach to predicting the future direction of the close price.
However, it can be further improved by accounting for the volatility of the data.
GARCH MODEL
The Generalized Autoregressive Conditional Heteroskedasticity model is founded on
"volatility clustering".
This clustering of volatility is based on the observation that there are periods of relatively
calm movement and periods of high volatility.
This behaviour is very typical of financial stock market data, as we said, and the GARCH
model is a very good approach to minimizing the volatility effect.
To evaluate the GARCH model implementation, we take the normal residuals and
then square them.
In the squared-residual plots, any volatile values become visually apparent.
We apply a standard GARCH(1, 1) model over an ARMA(2, 2) mean equation, checking whether we have
improved our accuracy and model parameters.
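The report's code is not shown; in R this specification is commonly fitted with the rugarch package (a sketch under that assumption, applied to log returns):

    library(rugarch)

    # sGARCH(1,1) variance equation with an ARMA(2,2) mean equation
    spec <- ugarchspec(
      variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
      mean.model     = list(armaOrder = c(2, 2), include.mean = TRUE),
      distribution.model = "norm"
    )

    returns <- diff(log(close))   # log returns of the close price
    gfit <- ugarchfit(spec, data = returns)
    show(gfit)                    # coefficients, Ljung-Box tests on (squared) residuals
    infocriteria(gfit)            # Akaike, Bayes, Shibata, Hannan-Quinn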
As we can see, the most recent years of our data show higher peaks, explained by economic
instability in the markets over that period.
Next, we check the Akaike and other information criteria of the model.
Results:
With this information we now proceed to plot the residuals. We first plot the normal
residuals:
Results:
With the Ljung-Box test we can see that our standardized squared residuals do not reject the
null hypothesis, confirming that there is no autocorrelation between them.
As we explained before regarding the model volatility, we can see that our model residuals are
larger in the most recent years of data.
This can be caused by higher data volatility in 2018 and 2019. Having characterized the volatility
and residual behaviour, we can proceed to forecast the next 30 days of price data and
compare the results with the other models.
GARCH FORECAST
PROPHET
Prophet originated from the application of forecasting models in supply chain
management, sales and economics.
The model takes a statistical approach to shaping business decisions. The Prophet
model has been developed by Facebook's Core Data Science team and is an open-source
tool for business forecasting.
Prophet is an additive model, y(t) = g(t) + s(t) + h(t) + ε_t, where g(t) models the trend, s(t)
models seasonality with Fourier series, and h(t) models the effects of holidays or large events.
The correct approach in Prophet would be to set up a cross-validation process and
analyse the model performance metrics, but here we compare ARIMA and the
other models using the same approach.
Finally, for a better understanding of the dataset, we can plot the Prophet components:
the trend component, weekly seasonality and yearly seasonality.
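A minimal Prophet sketch in R (assuming the closing prices come as an xts object with dates, as in the quantmod sketch above):

    library(prophet)

    # Prophet expects a data frame with columns ds (date) and y (value)
    df <- data.frame(ds = index(Cl(AMZN)), y = as.numeric(Cl(AMZN)))

    m      <- prophet(df)
    future <- make_future_dataframe(m, periods = 30)   # extend 30 days ahead
    fcst   <- predict(m, future)

    plot(m, fcst)                      # forecast with uncertainty intervals
    prophet_plot_components(m, fcst)   # trend, weekly and yearly seasonality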
KNN Regression time series forecasting
The KNN model can be used for both classification and regression problems; its most popular
application is classification.
With the tsfknn package in R, KNN can be applied to any regression task.
For predicting values of new data points, the model uses "feature similarity", assigning values
to a new point based on how closely it resembles points in the training set.
For this prediction we use k = 40, an experimental value found through a heuristic search
for the best k.
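A minimal sketch with the tsfknn package:

    library(tsfknn)

    # KNN regression forecast of the next 30 closing prices with k = 40
    pred <- knn_forecasting(ts(close), h = 30, k = 40)
    plot(pred)   # forecast appended to the series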
Prediction results:
The neural network model's approach is to use lagged values of the time series as input data,
yielding a non-linear autoregressive model.
For this approach we select a specific number of hidden nodes (see the sketch below).
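The exact node count is not recorded in the text; a sketch with forecast::nnetar, which fits an NNAR model and picks default lags and hidden-node counts when none are given:

    library(forecast)

    # Feed-forward neural net with lagged inputs (NNAR); defaults used here,
    # since the report's chosen hidden-node count is not recorded
    nn_fit <- nnetar(ts(close))
    nn_fc  <- forecast(nn_fit, h = 30)
    plot(nn_fc)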
Forecasts:
Next, we plot the model test prediction in a red line over the real train set stock close
price.
Comparative Analysis of Models:
With our ARIMA model in hand, we compare the GARCH results against the ARIMA AIC and BIC.
We start by showing the ARIMA AIC and BIC:
GARCH results:
As we can see, we have reached significantly better Akaike and Bayes information criterion
values for our model.
Prophet values:
KNN values:
We can conclude that the ARIMA and neural net models performed very well with respect to the
prediction intervals and the accuracy metrics.
The other models, being newer to this forecasting setting and applied here in a deliberately
simple, intuitive form, did not perform as well as the ARIMA or neural net
models.
Prophet and KNN may need more tuning to produce more accurate results.
Conclusion:
In this study we focused on the application of different models, learning how to use them
with the objective of forecasting new price values.
As we can see from our results, the models performed with similar future tendency
predictions.
All the models predicted a tendency of a higher price in 30 next days.
Autoregressive models, as they use past data to predict future values, tend to produce
asymptotic (flattening) predictions in long-horizon forecasts.
Finally, we conclude that ARIMA and neural nets are the best predictive models in this
scenario, and incorporating GARCH into our ARIMA approach further improved the
results.
The other models did not perform as well as ARIMA and neural nets under our
metrics, but this could be because they need more tuning, training and testing,
or because their main application is classification rather than forecasting.