Autoregressive Integrated Moving Average (ARIMA)

MoneyBestPal Team
An extension of the Autoregressive Moving Average (ARMA) model, which combines the simpler autoregression (AR) and moving average (MA) models.

Autoregressive Integrated Moving Average (ARIMA) is a powerful statistical tool for time series analysis and forecasting. It can forecast future values from past observations and capture the patterns, trends, and seasonality in the data.


ARIMA is short for Autoregressive Integrated Moving Average. It extends the Autoregressive Moving Average (ARMA) model, which itself combines the simpler autoregression (AR) and moving average (MA) models.

Autoregression (AR) means that a variable's current value depends on its past values, plus a random error. For example, an AR(1) model can be written as:


y_t = c + phi * y_(t-1) + e_t


where y_t is the current value, y_(t-1) is the previous value, c is a constant, phi is a coefficient, and e_t is a random error term.
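
As a concrete illustration, here is a minimal Python simulation of this AR(1) process; the values c = 2.0 and phi = 0.6 are arbitrary choices for the sketch, not taken from the text.

    import numpy as np

    rng = np.random.default_rng(0)
    c, phi, n = 2.0, 0.6, 200            # illustrative constants, not from the text
    y = np.empty(n)
    y[0] = c / (1 - phi)                 # start at the process mean
    for t in range(1, n):
        # y_t = c + phi * y_(t-1) + e_t
        y[t] = c + phi * y[t - 1] + rng.normal()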

Moving Average (MA) means that a variable's current value depends on past error terms, plus a current random error. For example, an MA(1) model can be written as:


y_t = c + e_t + theta * e_(t-1)


where y_t is the current value, c is a constant, e_t is a random error term, theta is a coefficient, and e_(t-1) is the previous error term.
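
A similar sketch works for MA(1): generate the error sequence first, then build each observation from the current and previous errors (c = 2.0 and theta = 0.5 are again arbitrary illustrative values).

    import numpy as np

    rng = np.random.default_rng(0)
    c, theta, n = 2.0, 0.5, 200          # illustrative constants
    e = rng.normal(size=n)               # the random error terms e_t
    y = c + e                            # y_t = c + e_t + theta * e_(t-1)
    y[1:] += theta * e[:-1]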

ARMA combines both AR and MA models by adding their terms. For example, an ARMA(1,1) model can be written as:


y_t = c + phi * y_(t-1) + e_t + theta * e_(t-1)


where y_t is the current value, c is a constant, phi and theta are coefficients, e_t is a random error term, y_(t-1) is the previous value, and e_(t-1) is the previous error term.
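
To simulate an ARMA(1,1) series, one option is the ArmaProcess helper in statsmodels; note its lag-polynomial convention, in which the AR coefficient enters with a flipped sign. The coefficients below are illustrative.

    import numpy as np
    from statsmodels.tsa.arima_process import ArmaProcess

    # Lag-polynomial form: ar = [1, -phi], ma = [1, theta]
    ar = np.array([1.0, -0.6])   # phi = 0.6 (illustrative)
    ma = np.array([1.0, 0.5])    # theta = 0.5 (illustrative)
    y = ArmaProcess(ar, ma).generate_sample(nsample=200)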

The limitation of ARMA models is that they can only handle stationary time series. A stationary time series has a mean, variance, and autocorrelation that remain constant over time; in other words, its statistical characteristics do not change. Many real-world time series, such as those with trends or seasonality, are not stationary. For instance, a product's monthly sales may rise or fall over time with demand, or fluctuate with seasonal factors like holidays or the weather.
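
Stationarity is usually checked formally, for instance with the augmented Dickey-Fuller test in statsmodels. The sketch below runs it on a simulated random walk, which is non-stationary by construction.

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(0)
    sales = np.cumsum(rng.normal(size=200))   # random walk: non-stationary by construction
    stat, pvalue = adfuller(sales)[:2]
    # A small p-value (e.g. < 0.05) would reject the unit-root null, suggesting
    # stationarity; for a random walk the p-value is typically large.
    print(f"ADF statistic {stat:.2f}, p-value {pvalue:.3f}")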

ARIMA adds Integration (I) as a new component to deal with non-stationary time series. Integration refers to differencing the time series to remove non-stationarity. Differencing means subtracting the previous value from the current value. For example, if we have a time series y_t, we can difference it once to get:


delta y_t = y_t - y_(t-1)


where delta y_t is the first difference of y_t. We can difference it again to get:


delta^2 y_t = delta y_t - delta y_(t-1)


where delta^2 y_t is the second difference of y_t. And so on.

The number of times we difference the time series to make it stationary is known as the degree of differencing (d). For instance, if a time series contains a linear trend, the trend can be removed by differencing once (d=1). If a time series exhibits a quadratic trend, differencing twice (d=2) removes it.
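
In pandas, differencing is a one-liner; the tiny made-up series below has an accelerating trend that becomes constant after two differences.

    import pandas as pd

    s = pd.Series([10, 12, 15, 19, 24])    # made-up series with an accelerating trend
    d1 = s.diff()                          # delta y_t = y_t - y_(t-1)
    d2 = s.diff().diff()                   # delta^2 y_t
    print(d1.tolist())                     # [nan, 2.0, 3.0, 4.0, 5.0]
    print(d2.tolist())                     # [nan, nan, 1.0, 1.0, 1.0] -> flat after d=2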

ARIMA combines ARMA and Integration by applying ARMA to the differenced time series. For example, an ARIMA(1,1,1) model can be written as:


delta y_t = c + phi * delta y_(t-1) + e_t + theta * e_(t-1)


where delta y_t is the first difference of y_t, c is a constant, phi and theta are coefficients, e_t is a random error term, delta y_(t-1) is the previous difference of y_t, and e_(t-1) is the previous error term.

The standard notation for ARIMA models is ARIMA(p,d,q), where 'p' denotes the order of the AR part, 'd' is the degree of differencing, and 'q' is the order of the MA part. For instance, ARIMA(0,1,0) has no AR or MA terms (only differencing), ARIMA(0,0,1) has only an MA term (no AR terms and no differencing), and ARIMA(2,0,2) has both AR and MA terms but no differencing.
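
Fitting such a model is straightforward with statsmodels, which performs the differencing internally when you pass d in the order tuple; the simulated trending series here is just a stand-in for real data.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    y = np.cumsum(1.0 + rng.normal(size=200))   # simulated trending series
    res = ARIMA(y, order=(1, 1, 1)).fit()       # order = (p, d, q)
    print(res.params)                           # estimated phi, theta, and noise variance
    print(res.forecast(steps=5))                # next five predicted values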

To use ARIMA for forecasting, we need to follow these steps (a code sketch follows the list):
  1. Examine the time series plot, the autocorrelation function (ACF), and the partial autocorrelation function (PACF) to identify suitable values of p, d, and q.
  2. Estimate the ARIMA model's parameters with a technique such as maximum likelihood estimation (MLE) or least squares estimation (LSE).
  3. Assess the model's goodness of fit with diagnostics such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), or the Ljung-Box test.
  4. Predict future values of the time series using the fitted ARIMA model's parameters.
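
A compact end-to-end sketch of these steps with statsmodels might look like the following. The candidate orders and the simulated input are illustrative choices; in practice, step 1 would start from ACF and PACF plots (plot_acf and plot_pacf in statsmodels.graphics.tsaplots) of the differenced data.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.stats.diagnostic import acorr_ljungbox

    rng = np.random.default_rng(1)
    y = np.cumsum(1.0 + rng.normal(size=300))        # stand-in for a real time series

    # Steps 2-3: fit a few candidate orders and compare by AIC (lower is better)
    candidates = [(1, 1, 0), (0, 1, 1), (1, 1, 1)]
    results = {order: ARIMA(y, order=order).fit() for order in candidates}
    best_order = min(results, key=lambda o: results[o].aic)
    best = results[best_order]

    # Step 3: Ljung-Box test on the residuals; a large p-value means
    # no significant leftover autocorrelation, i.e. an adequate fit
    print(acorr_ljungbox(best.resid, lags=[10]))

    # Step 4: forecast the next twelve values with the chosen model
    print(best_order, best.forecast(steps=12))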

ARIMA is a versatile and widely used model for time series analysis and forecasting. It can handle a range of time series, including those with trends, seasonality, or cycles, and it can be extended with seasonal terms (as in seasonal ARIMA, or SARIMA) or exogenous variables (ARIMAX). ARIMA does have drawbacks, though: it needs a fair amount of data, it assumes a linear relationship between past and future values, and it is sensitive to outliers and structural changes.