I have a problem using an RNN to predict multiple steps ahead. The code runs, but the output does not make sense: the t+2 prediction is a lot more accurate than t+1, and the same goes for t+3. It is very counterintuitive that the one-step-ahead output should be significantly less accurate.

The data setup is as follows. We want to predict the total sales (across multiple platforms) for a given hour. The total sales data has a lag, so we do not have it continuously. The sales on the internal platform are real-time, so there is no delay on this data. Lastly, we have an external forecast; it is static and not revised very often, but we have it far into the future.

The forecasting problem is that we want to predict the next 4 hours of total sales. However, because the total sales data is delayed, we already know what our internal sales are for the first hour, and we also have the external forecast for all 4 hours. How do I incorporate this into my model? Is my current setup the right method for this?
The input variables have shape (Samples, TimeSteps, Features), where the first time step, X_array[:,0,:], corresponds to the observations that come first in the sequence, i.e. the oldest part of each window. The targets in Y_train are laid out similarly: Y_train[TargetNames].columns = ['t + 1', 't + 2', 't + 3', 't + 4']
The code below is a simulation that reproduces the problem: when you run it, the second element of AE/MAPE comes out lower than the first element, i.e. t+2 has a lower error than t+1:
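To make the layout described above concrete, here is a minimal sketch on toy data. The helper names `make_windows` and `make_targets` are hypothetical and not part of my real code; `make_targets` is meant to mirror the `shift(-(t-1))` logic of the `LSTM_structure` function further down, and `sliding_window_view` is only used as a compact stand-in for its explicit loops.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(features: pd.DataFrame, timestep: int) -> np.ndarray:
    """Return an array of shape (samples, timestep, features), oldest row first."""
    values = features.to_numpy(dtype=float)
    # sliding_window_view appends the window axis last: (samples, features, timestep)
    windows = sliding_window_view(values, window_shape=timestep, axis=0)
    return windows.transpose(0, 2, 1)


def make_targets(y: pd.Series, timestep: int, horizon: int) -> pd.DataFrame:
    """Build 't + k' columns as y shifted by -(k - 1), starting at the last row of the first window."""
    shifted = {f"t + {k}": y.shift(-(k - 1)) for k in range(1, horizon + 1)}
    return pd.DataFrame(shifted).iloc[timestep - 1:].reset_index(drop=True)


# Toy data: 6 rows, 2 features, window length 3, horizon 2
toy_X = pd.DataFrame({"a": np.arange(6.0), "b": np.arange(6.0) * 10})
toy_y = pd.Series(np.arange(100.0, 106.0))

X_toy = make_windows(toy_X, timestep=3)
Y_toy = make_targets(toy_y, timestep=3, horizon=2)

print(X_toy.shape)           # (samples, timestep, features)
print(X_toy[0, 0, :])        # oldest row of the first window
print(X_toy[0, -1, :])       # newest row of the first window
print(Y_toy.iloc[0].tolist())  # targets paired with the first window
```

With this layout, the 't + 1' value of a sample sits on the same row as the newest row of its window, and 't + 2' is one row later.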
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import math
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
import datetime as dt
pd.options.mode.chained_assignment = None # default='warn'
Make data
df = pd.date_range('2021-01-01', '2021-12-01', freq='H')
df = pd.DataFrame(df[0:len(df)-1], columns=['DateTime'])
df['TotalSales'] = 0.0  # float dtype from the start
np.random.seed(1)
for i in df.index:
    if i == df.index[0]:
        df.loc[i, 'TotalSales'] = 1
    else:
        # previous value plus one standard-normal draw (scalar)
        x = df.loc[i-1, 'TotalSales'] + np.random.normal(0, 1)
        if x < 0:
            df.loc[i, 'TotalSales'] = 1 + 0.1 * math.exp(x)
        else:
            df.loc[i, 'TotalSales'] = 1 + 0.9 * x
df['InternalSales'] = 0.2 * df['TotalSales'] + np.random.normal(0, 0.2, len(df))
df['ExternalForecast'] = df['TotalSales'] + np.random.normal(0, 2, len(df))
df.loc[df['ExternalForecast'] < 0, 'ExternalForecast'] = 0.1
df.loc[df.index[-3:], 'InternalSales'] = np.nan # We do not know these observations
df.loc[df.index[-4:], 'TotalSales'] = np.nan # We do not know these observations
df.set_index('DateTime', inplace=True)
df.tail()
Align data
df['InternalSales_Lead1'] = df['InternalSales'].shift(-1)
df['ExternalForecast_Lead2'] = df['ExternalForecast'].shift(-4) # the column name is a typo: this is a 4-step lead and should read 'ExternalForecast_Lead4'
pd.set_option('display.max_columns', 5)
df.tail()
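The effect of the two `shift` calls above can be checked on a toy frame. This is only an illustration with made-up numbers; the column name `ExternalForecast_Lead4` is the name I intended for the 4-step lead.

```python
import pandas as pd

# Toy data with easily recognisable values
toy = pd.DataFrame({
    "InternalSales": [10, 11, 12, 13, 14, 15],
    "ExternalForecast": [100, 101, 102, 103, 104, 105],
})

# A negative shift pulls future values back onto the current row
toy["InternalSales_Lead1"] = toy["InternalSales"].shift(-1)
toy["ExternalForecast_Lead4"] = toy["ExternalForecast"].shift(-4)

print(toy)
```

Row 0 now carries the internal sales of row 1 and the external forecast of row 4, and the trailing rows of the lead columns become NaN because no future value exists for them.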
Setting
valid_start = '2021-10-01'
test_start = '2021-11-01'
Gran = 60 # minutes
Names = ['InternalSales_Lead1', 'ExternalForecast_Lead2']
Target = 'TotalSales'
AlternativeForecast = 'ExternalForecast'
TimeSteps = 24 # hours
HORIZON = 4 # step ahead
X_array = df.copy()
X_array = X_array[Names]
df.reset_index(inplace=True)
Data = df[df['DateTime'].dt.date.astype(str) < test_start]
scaler = StandardScaler().fit(Data[Names])
yScaler = MinMaxScaler().fit(np.array(Data[Target]).reshape(-1, 1))
df['Scaled_' + Target] = yScaler.transform(np.array(df[Target]).reshape(-1, 1))
X_array = pd.DataFrame(scaler.transform(X_array), index=X_array.index,columns=X_array.columns)
def LSTM_structure(Y, X, timestep, horizon, TargetName):
    if TargetName is None:
        raise ValueError('TargetName must be specified')
    # Array_X has shape (samples, timestep, features); each window runs from oldest to newest
    Array_X = np.zeros(((len(X) - timestep + 1), timestep, len(X.columns)))
    for variable in range(0, len(X.columns)):
        col = X.columns[variable]
        for t in range(timestep, len(X)+1):
            Array_X[t - timestep, :, variable] = X[col].iloc[(t - timestep):t].values
    # Row s of Y_LSTM is the last row of window s; 't + k' is the target k-1 rows after it
    Y_LSTM = Y[(timestep - 1):].copy()
    for t in range(1, horizon+1):
        Y_LSTM['t + ' + str(t)] = Y_LSTM[TargetName].shift(-(t-1))
    return Y_LSTM, Array_X
Y_total, X_array = LSTM_structure(Y=df[['DateTime', Target, 'Scaled_' + Target, AlternativeForecast]], X=X_array, timestep=TimeSteps, horizon=HORIZON, TargetName='Scaled_' + Target)
# X_array.shape = (7993, 24, 2)
Y_total.reset_index(drop=True, inplace=True)
Y_train = Y_total[Y_total['DateTime'].dt.date.astype(str) < valid_start]
X_train_scale = X_array[Y_train.index,:,:]
Y_Val = Y_total[(Y_total['DateTime'].dt.date.astype(str) >= valid_start) & (Y_total['DateTime'].dt.date.astype(str) < test_start)]
X_val_scale = X_array[Y_Val.index,:,:]
Y_test = Y_total[Y_total['DateTime'].dt.date.astype(str) >= test_start]
X_test_scale = X_array[Y_test.index,:,:]
Model
TargetNames = Y_total.filter(like='t + ').columns
LATENT_DIM = 5
BATCH_SIZE = 32
EPOCHS = 10
try:
    del model
except Exception:
    pass
model = keras.Sequential()
model.add(keras.layers.GRU(LATENT_DIM, input_shape=(TimeSteps, X_train_scale.shape[2])))
model.add(keras.layers.RepeatVector(HORIZON))
model.add(keras.layers.GRU(LATENT_DIM, return_sequences=True))
model.add(keras.layers.TimeDistributed(keras.layers.Dense(1)))
model.add(keras.layers.Flatten())
model.compile(optimizer='SGD', loss='mse')
model.summary()
earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=3, restore_best_weights=True)
i = 1
np.random.seed(i)
tf.random.set_seed(i)
hist = model.fit(X_train_scale,
                 Y_train[TargetNames],
                 batch_size=BATCH_SIZE,
                 epochs=EPOCHS,
                 validation_data=(X_val_scale, Y_Val[TargetNames]),
                 callbacks=[earlystop],
                 verbose=1)
y_hat_scaled = model.predict(X_test_scale)
for i in range(1, HORIZON+1):
    Y_test['Predict_t + ' + str(i)] = yScaler.inverse_transform(np.array(y_hat_scaled[:,i-1]).reshape(-1, 1))
Make format correct
for i in range(1, HORIZON + 1):
    if i == 1:
        Performance = Y_test[['DateTime', Target, AlternativeForecast,'Predict_t + '+ str(i)]]
    else:
        Temp = Y_test[['DateTime', 'Predict_t + '+ str(i)]]
        # move each prediction to the timestamp it refers to
        Temp['DateTime'] = Temp['DateTime'] + dt.timedelta(minutes=Gran * (i-1))
        Performance = pd.merge(Performance, Temp[['DateTime', 'Predict_t + '+ str(i)]], how='left', on='DateTime')
Plot
from matplotlib import pyplot as plt
plt.plot(Performance['DateTime'], Performance[Target], label=Target)
for i in range(1, HORIZON + 1):
    plt.plot(Performance['DateTime'], Performance['Predict_t + '+ str(i)], label='Predict_t + '+ str(i))
plt.title('Model Performance')
plt.ylabel('MW')
plt.xlabel('Time')
plt.legend()
plt.show()
Performance
for i in range(1, HORIZON + 1):
    ae = (Performance['Predict_t + '+ str(i)] - Performance[Target]).abs().mean()
    mape = ((Performance['Predict_t + '+ str(i)] - Performance[Target]).abs()/Performance[Target]).mean() * 100
    if i == 1:
        AE = ae
        MAPE = round(mape,2)
    else:
        AE = np.append(AE, ae)
        MAPE = np.append(MAPE, round(mape,2))
# Alternative forecast
ae = (Performance[AlternativeForecast] - Performance[Target]).abs().mean()
mape = ((Performance[AlternativeForecast] - Performance[Target]).abs()/Performance[Target]).mean() * 100
AE = np.append(AE, ae)
MAPE = np.append(MAPE, round(mape, 2))
AE
MAPE
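For reference, the per-horizon error loop above could also be written in vectorised form. This is only a sketch: the function name `per_horizon_errors` is hypothetical, and it assumes that `actual` and `predicted` are already aligned so that column k of both arrays refers to the same timestamp (which is what the merge on DateTime does in my code). Unlike my loop, it divides by the absolute value of the actuals; that makes no difference here because the simulated total sales are positive.

```python
import numpy as np


def per_horizon_errors(actual, predicted):
    """Return (mae, mape) per horizon for arrays of shape (samples, horizon).

    Rows containing NaN in a column are ignored for that column.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    abs_err = np.abs(predicted - actual)
    mae = np.nanmean(abs_err, axis=0)
    mape = np.nanmean(abs_err / np.abs(actual), axis=0) * 100
    return mae, mape


# Toy check with two horizons; the last row contains missing values
actual = [[1.0, 2.0], [2.0, 4.0], [np.nan, 3.0]]
predicted = [[2.0, 2.0], [2.0, 5.0], [1.0, np.nan]]
mae, mape = per_horizon_errors(actual, predicted)
print(mae)   # one mean absolute error per horizon
print(mape)  # one mean absolute percentage error per horizon
```

Element 0 of each result corresponds to t+1, element 1 to t+2, and so on, which is the same ordering as AE and MAPE above (without the alternative forecast appended at the end).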
I hope one of you has time to help me with this problem :-)