Let’s use ARIMA to create a formula to predict tomorrow’s stock return (and why you shouldn’t)

Tony Vuolo
Mar 15, 2023


[Figure: Distribution of AAPL daily returns]

This article differs from many you will see: I show how to extract the equation from the fitted model and apply it to predict tomorrow’s return on Apple stock.

First, we will fit an ARIMA model on Apple stock returns. An ARIMA model may not be the best choice because it only captures linear dependence between time-lagged observations and does not directly capture changing variance or volatility clustering. We cover it here so you understand how to turn a fitted ARIMA into an equation and use it for predictions; at the end of the article, we cover why you shouldn’t rely on it. It helps to see how something works before you can understand why it doesn’t.

I searched ARIMA orders with p from 0 to 5, d from 0 to 2, and q from 0 to 5, which is 6 x 3 x 6 = 108 different ARIMA models. This can be done in any econometrics package, such as EViews or Stata, or in programming languages such as R or Python (among many others). Based on AIC, the best model I found for Apple stock returns over the period 1/1/2020 to 12/31/2022 was an ARIMA(2,0,5).

[Figure: ARIMA(2,0,5) for Apple Stock Returns]
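To make the selection criterion concrete, here is a minimal sketch of how AIC trades fit against model size. The log-likelihood values are invented for illustration and are not the ones from the Apple fit; the helper names `aic` and `n_params` are hypothetical, and the parameter count assumes the convention of counting the AR terms, the MA terms, the constant, and the error variance.

```python
# AIC = 2k - 2*ln(L): lower is better, and each extra parameter costs 2 points.
def aic(log_likelihood: float, k: int) -> float:
    return 2 * k - 2 * log_likelihood


def n_params(p: int, q: int, with_const: bool = True) -> int:
    # AR terms + MA terms + optional constant + error variance sigma^2
    return p + q + int(with_const) + 1


# Hypothetical log-likelihoods, made up for this example only.
candidates = {
    (1, 0, 1): 1800.0,
    (2, 0, 5): 1809.5,
    (5, 0, 5): 1810.0,
}

scores = {
    order: aic(loglik, n_params(order[0], order[2]))
    for order, loglik in candidates.items()
}
best_order = min(scores, key=scores.get)

for order, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(order, round(score, 1))
print("Lowest AIC:", best_order)
```

In this toy comparison the largest model has the highest likelihood but still loses, because its three extra parameters cost more than the fit they buy. All three candidates use d = 0 on purpose: AIC compares likelihoods of the same data, so comparisons across different levels of differencing deserve extra caution.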

Python to search for the best ARIMA over Apple Stock Returns

import itertools
from statsmodels.tsa.arima.model import ARIMA
import warnings
import time

start_time = time.time() # Record the start time

# Suppress all warnings
warnings.filterwarnings("ignore")

# Load and preprocess the data (assuming you've already done this)
# aapl_daily_returns = ...

# Define the range of p, d, and q values to test
p_range = range(0, 6) # AR terms
d_range = range(0, 3) # Differencing
q_range = range(0, 6) # MA terms

# Generate all combinations of p, d, and q values
pdq_combinations = list(itertools.product(p_range, d_range, q_range))

# Iterate over combinations, fit the models, and compare AIC values
best_aic = float("inf")
best_params = None
best_model = None

for pdq in pdq_combinations:
    try:
        model = ARIMA(aapl_daily_returns, order=pdq)
        results = model.fit()

        # Keep the model with the lowest AIC seen so far
        if results.aic < best_aic:
            best_aic = results.aic
            best_params = pdq
            best_model = results

    except ValueError:
        # Skip orders that fail to fit
        continue

print(f"Best ARIMA model parameters: {best_params}, AIC: {best_aic:.4f}")
print(best_model.summary())

end_time = time.time() # Record the end time

elapsed_time = end_time - start_time
print(f"Time taken by the algorithm: {elapsed_time} seconds")

How do we use this? To make predictions, we need the model written out as a formula.
For the ARIMA(2, 0, 5) model, using the fitted coefficients, the formula is:

Y(t) = c + ϕ1 * Y(t-1) + ϕ2 * Y(t-2) + ε(t) + θ1 * ε(t-1) + θ2 * ε(t-2) + θ3 * ε(t-3) + θ4 * ε(t-4) + θ5 * ε(t-5)

Where:

  • Y(t) is the time series value at time t.
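  • c is the intercept. Note that statsmodels reports const (0.0010) as the mean μ of the series rather than the intercept, so c = μ * (1 - ϕ1 - ϕ2), which is about 0.0036 here.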
  • ϕ1 and ϕ2 are the autoregressive coefficients (-1.7125 and -0.8584, respectively).
  • Y(t-1) and Y(t-2) are the time series values at t-1 and t-2, respectively.
  • ε(t) is the error term at time t.
  • θ1, θ2, θ3, θ4, and θ5 are the moving average coefficients (1.5829, 0.6410, -0.0774, -0.0003, and 0.0265, respectively).
  • ε(t-1), ε(t-2), ε(t-3), ε(t-4), and ε(t-5) are the error terms at times t-1, t-2, t-3, t-4, and t-5, respectively.
  • σ² is the variance of the error terms (0.0005).
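As a worked example of the formula, the sketch below plugs in the coefficients listed above. It treats the reported `const` as the process mean and derives the intercept from it, which is how statsmodels parameterizes an ARIMA with d = 0 and a constant. The recent returns and error terms are hypothetical numbers chosen for illustration, not real AAPL data, and `one_step_forecast` is a helper name introduced here.

```python
# Coefficients as reported for the fitted ARIMA(2, 0, 5).
mu = 0.0010                                          # statsmodels `const`: process mean
phi = [-1.7125, -0.8584]                             # AR coefficients phi1, phi2
theta = [1.5829, 0.6410, -0.0774, -0.0003, 0.0265]   # MA coefficients theta1..theta5

# Intercept implied by the mean (assumes d = 0 and a constant-only trend).
c = mu * (1 - sum(phi))

# Hypothetical inputs, most recent first. NOT real AAPL data.
recent_returns = [0.012, -0.008]                       # Y(t-1), Y(t-2)
recent_errors = [0.010, -0.006, 0.004, -0.002, 0.001]  # eps(t-1) .. eps(t-5)


def one_step_forecast(c, phi, theta, returns, errors):
    """Expected Y(t): the unknown eps(t) is replaced by its mean, zero."""
    ar_part = sum(p * y for p, y in zip(phi, returns))
    ma_part = sum(th * e for th, e in zip(theta, errors))
    return c + ar_part + ma_part


y_hat = one_step_forecast(c, phi, theta, recent_returns, recent_errors)
print(f"intercept c = {c:.6f}")
print(f"forecast    = {y_hat:.6f}")
```

The ε(t) term drops out of the forecast because tomorrow's shock is unknown and has an expected value of zero; only past errors, which we can estimate from the residuals, enter the prediction.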

We can use Python to grab the coefficients from the fitted model and predict tomorrow’s stock return and price.

Python to predict a future return.

import datetime

import yfinance as yf

# Define the time range for the stock data
start_date = datetime.datetime(2020, 1, 1)
end_date = datetime.datetime(2022, 3, 14)

# Download AAPL stock data
ticker = "AAPL"
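# auto_adjust=False keeps the 'Adj Close' column, which newer yfinance releases drop by default
aapl_df = yf.download(ticker, start=start_date, end=end_date, auto_adjust=False)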

# Compute daily returns
aapl_df['Returns'] = aapl_df['Adj Close'].pct_change()
aapl_daily_returns = aapl_df['Returns'].dropna()


# Fit the ARIMA(2, 0, 5) model
from statsmodels.tsa.arima.model import ARIMA
model = ARIMA(aapl_daily_returns, order=(2, 0, 5))
results = model.fit()

# Get the last two values of the time series
y_t1 = aapl_daily_returns.iloc[-1]
y_t2 = aapl_daily_returns.iloc[-2]

# Get the residuals (error terms)
residuals = results.resid

# Get the last five error terms
epsilon_t1 = residuals.iloc[-1]
epsilon_t2 = residuals.iloc[-2]
epsilon_t3 = residuals.iloc[-3]
epsilon_t4 = residuals.iloc[-4]
epsilon_t5 = residuals.iloc[-5]

# Define the ARIMA(2, 0, 5) model parameters from the results
# Note: statsmodels reports 'const' as the mean of the series, not the intercept
mu = results.params['const']
phi1 = results.params['ar.L1']
phi2 = results.params['ar.L2']
theta1 = results.params['ma.L1']
theta2 = results.params['ma.L2']
theta3 = results.params['ma.L3']
theta4 = results.params['ma.L4']
theta5 = results.params['ma.L5']

# Convert the mean into the intercept c used in the formula
c = mu * (1 - phi1 - phi2)

# Predict tomorrow's value using the formula
# (tomorrow's own error term is unknown and has expected value 0, so it is omitted)
y_tomorrow = (c + phi1 * y_t1 + phi2 * y_t2
              + theta1 * epsilon_t1 + theta2 * epsilon_t2 + theta3 * epsilon_t3
              + theta4 * epsilon_t4 + theta5 * epsilon_t5)

print("Tomorrow's predicted value:", y_tomorrow)

You can convert the return to a stock price as follows:

price_today = aapl_df["Close"].iloc[-1]  # Use the last closing price from the DataFrame aapl_df
price_tomorrow = price_today * (1 + y_tomorrow)
print("Tomorrow's predicted stock price:", price_tomorrow)
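A point prediction says nothing about its own uncertainty. As a rough sketch, the error variance σ² = 0.0005 from the model summary can be turned into an approximate 95% interval around a one-step forecast. The predicted return and the price below are hypothetical placeholders, and the interval assumes normally distributed errors and ignores uncertainty in the estimated coefficients.

```python
import math

sigma2 = 0.0005        # error variance reported in the model summary
y_hat = 0.002          # hypothetical predicted return, for illustration only
price_today = 150.0    # hypothetical last closing price

sigma = math.sqrt(sigma2)   # one-step-ahead forecast standard error
z = 1.96                    # approximate 95% two-sided normal quantile
low = y_hat - z * sigma
high = y_hat + z * sigma

print(f"forecast std error:  {sigma:.4f}")
print(f"95% return interval: [{low:.4f}, {high:.4f}]")
print(f"95% price interval:  [{price_today * (1 + low):.2f}, {price_today * (1 + high):.2f}]")
```

With a standard error of roughly 2.2% per day, the interval comfortably spans both gains and losses, so a predicted return of a fraction of a percent does not even pin down the direction of the move.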

Running the above code gives the predicted return and stock price for 3/15. Do not expect that number to be stable. The ARIMA model is re-fit every time you execute the code, and the coefficients come from a numerical optimization, so any change in the input (a new trading day in the sample, a revised adjusted close, a different start date) produces different estimates and a different prediction. On identical data, the statsmodels fit should reproduce the same numbers; the drift comes from how sensitive the fitted coefficients are to the sample. The differences in the coefficients are usually minor but can still lead to different predictions.

I do not recommend any trading strategy based on running an ARIMA once and outputting a prediction. Between the drift in predictions from one re-fit to the next and the model’s inability to capture changing variance or volatility clustering in the returns, we just will not get usable results. Why cover this approach, then? Knowing when not to use a model is as valuable as knowing when to use it. Even though we aren’t going to use the ARIMA model directly to predict the stock price, it is a good building block for understanding the pattern in returns. In a future article, we will build upon what we learned here.
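To see what volatility clustering looks like in numbers, here is a small self-contained sketch. It does not use Apple data: it simulates returns from a toy ARCH(1) process (a simple model with changing variance, chosen here as an assumption for illustration) and compares the lag-1 autocorrelation of the returns with that of the squared returns. The function names and the parameters `omega` and `alpha` are hypothetical choices.

```python
import random


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i - 1] - mean) for i in range(1, n))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


def simulate_arch1(n, omega=0.0002, alpha=0.4, seed=0):
    """Simulate r_t = sigma_t * z_t with sigma_t^2 = omega + alpha * r_{t-1}^2."""
    rng = random.Random(seed)
    returns, prev = [], 0.0
    for _ in range(n):
        sigma_t = (omega + alpha * prev ** 2) ** 0.5
        prev = sigma_t * rng.gauss(0.0, 1.0)
        returns.append(prev)
    return returns


returns = simulate_arch1(20_000)
acf_returns = lag1_autocorr(returns)
acf_squared = lag1_autocorr([r * r for r in returns])

print(f"lag-1 autocorrelation of returns:         {acf_returns:+.3f}")
print(f"lag-1 autocorrelation of squared returns: {acf_squared:+.3f}")
```

In a series like this, the returns themselves are close to uncorrelated, so an ARIMA has almost nothing to work with, while the squared returns are clearly correlated. That second pattern is the part an ARIMA on returns does not model.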
