Running Kalman filter on multiple variables in Python

Question

I would like to predict closing price ('Close') of the stock using multiple variables (Open, Low, High, Volume, Close) by plugging into the Kalman filter. However, I get error that 'The shape of all parameters is not consistent. Please re-check their values.' I am not sure what I am doing wrong.

To be clear: I don't want five different stock predictions based on each variable but rather one price based on all 5 variables.

Here is the code:


import numpy as np
import yfinance as yf
from pykalman import KalmanFilter

spy_data = yf.download('SPY', start='2010-01-01', end='2023-03-17')

price = spy_data['Close'].values.reshape(-1, 1)
open_price = spy_data['Open'].values.reshape(-1, 1)
low_price = spy_data['Low'].values.reshape(-1, 1)
high_price = spy_data['High'].values.reshape(-1, 1)
volume = spy_data['Volume'].values.reshape(-1, 1)

initial_state = np.zeros(5)
initial_covariance = np.diag([100, 100, 100, 100, 100])
transition_matrix = np.array([[1, 0, 0, 0, 0], 
                              [0, 1, 0, 0, 0],
                              [0, 0, 1, 0, 0],
                              [0, 0, 0, 1, 0],
                              [0, 0, 0, 0, 1]])
observation_matrix = np.array([[1, 0, 0, 0, 0], 
                               [0, 0, 0, 0, 0],
                               [0, 0, 0, 0, 0],
                               [0, 0, 0, 0, 0],
                               [0, 0, 0, 0, 0]])

process_noise = np.diag([0.001, 0.001, 0.001, 0.001, 0.001])
observation_noise = np.diag([0.1])

kf = KalmanFilter(
    initial_state_mean=initial_state,
    initial_state_covariance=initial_covariance,
    transition_matrices=transition_matrix,
    observation_matrices=observation_matrix,
    observation_covariance=observation_noise,
    transition_covariance=process_noise)

state_means, state_covariances = kf.filter(np.hstack([price, open_price, low_price, high_price, volume]))

# Predict next day's closing price
last_state_mean = state_means[-1]
last_state_covariance = state_covariances[-1]

next_state_mean, next_state_covariance = kf.filter_update(
last_state_mean, last_state_covariance, observation= np.array([spy_data['Adj Close'][-1], spy_data['Open'][-1], spy_data['Low'][-1], spy_data['High'][-1], spy_data['Volume'][-1]])
)
predicted_price = next_state_mean[0]


print(f"Today's SPY closing price: {price[-1][0]}")
print(f"Predicted SPY closing price for tomorrow: {predicted_price}")

I unintentionally left out a few variables (initial_state_covariance, observation_covariance, ...) but I edited my answer and included everything. You can accept my answer if it works for you, thanks! — Ori Yarden PhD, Mar 17 '23 at 22:14
thanks for looking into this. is there any difference between observation_noise = np.diag([0.1, 0.1, 0.1, 0.1, 0.1]) & observation_noise = [[0.1, 0.0, 0.0, 0.0, 0.0], [0.0, 0.1, 0.0, 0.0, 0.0], [0.0, 0.0, 0.1, 0.0, 0.0], [0.0, 0.0, 0.0, 0.1, 0.0], [0.0, 0.0, 0.0, 0.0, 0.1]] — dataman, Mar 18 '23 at 09:55
Nope, but the first is a numpy array and the second is a nested list. — Ori Yarden PhD, Mar 18 '23 at 14:18
Interestingly, I ran the code with the numpy array and nested list and slightly I got two different predictions: 396.4538217822543 (with array) and 396.3105625117946 (with nested list) — dataman, Mar 18 '23 at 21:16
it was because I ran the code before/during market open. I just re-ran both and they generate the same output: 396.3105625117946. — Ori Yarden PhD, Mar 19 '23 at 15:36

Ori Yarden PhD · Answer 1 · 2023-03-29T18:11:26.053

It's the shapes of your inputs, and unfortunately pykalman's KalmanFilter does not specify details like which shapes are raising the error, what the shapes are, etc. The issue is observation_noise's shape, which should be np.diag([0.1, 0.1, 0.1, 0.1, 0.1]):

spy_data = yf.download('SPY', start='2010-01-01', end='2023-03-17')

price = spy_data['Close'].values.reshape(-1, 1)
open_price = spy_data['Open'].values.reshape(-1, 1)
low_price = spy_data['Low'].values.reshape(-1, 1)
high_price = spy_data['High'].values.reshape(-1, 1)
volume = spy_data['Volume'].values.reshape(-1, 1)

initial_state = np.zeros(5)
initial_covariance = np.diag([100, 100, 100, 100, 100])
transition_matrix = np.array([[1, 0, 0, 0, 0], 
                              [0, 1, 0, 0, 0],
                              [0, 0, 1, 0, 0],
                              [0, 0, 0, 1, 0],
                              [0, 0, 0, 0, 1]])
observation_matrix = np.array([[1, 0, 0, 0, 0], 
                               [0, 0, 0, 0, 0],
                               [0, 0, 0, 0, 0],
                               [0, 0, 0, 0, 0],
                               [0, 0, 0, 0, 0]])

process_noise = np.diag([0.001, 0.001, 0.001, 0.001, 0.001])
observation_noise = np.diag([0.1, 0.1, 0.1, 0.1, 0.1])

kf = KalmanFilter(
    initial_state_mean=initial_state,
    initial_state_covariance=initial_covariance,
    transition_matrices=transition_matrix,
    observation_matrices=observation_matrix,
    observation_covariance=observation_noise,
    transition_covariance=process_noise)

state_means, state_covariances = kf.filter(np.hstack([price, open_price, low_price, high_price, volume]))

# Predict next day's closing price
last_state_mean = state_means[-1]
last_state_covariance = state_covariances[-1]

next_state_mean, next_state_covariance = kf.filter_update(
last_state_mean, last_state_covariance, observation= np.array([spy_data['Adj Close'][-1], spy_data['Open'][-1], spy_data['Low'][-1], spy_data['High'][-1], spy_data['Volume'][-1]])
)
predicted_price = next_state_mean[0]


print(f"Today's SPY closing price: {price[-1][0]}")
print(f"Predicted SPY closing price for tomorrow: {predicted_price}")

Outputs:

[*********************100%***********************]  1 of 1 completed
Today's SPY closing price: 396.1099853515625
Predicted SPY closing price for tomorrow: 396.3105625117946

Running Kalman filter on multiple variables in Python

1 Answers1