Keras RNN, univariate multi-step-ahead forecast: t+2 is more accurate than t+1 (input/timestep/RNN structure problem)

Question

I am using an RNN to predict multiple steps ahead. The code runs, but the output does not make sense: the t+2 prediction is a lot more accurate than t+1, and the same goes for t+3. It is very counterintuitive that the one-step-ahead output should be significantly less accurate.

The data setup is as follows. We want to predict the total sales (across multiple platforms) for a given hour. The total sales data arrives with a lag, so we do not have it up to the current hour. The sales on the internal platform are real-time, so there is no delay on this data. Lastly, we have an external forecast; it is static and not revised very often, but it extends far into the future.

The forecasting problem: we want to predict the next 4 hours of total sales. Because the total sales data is delayed, we already know our internal sales for the first hour, and we also have the external forecast for all 4 hours. How do I incorporate this into my model? Is my current setup the right method for this?

The input variables have shape (Samples, TimeSteps, Features), where X_array[:,0,:] holds the first observation of each sequence, i.e. the oldest part. The targets in Y_train are laid out similarly: Y_train[TargetNames].columns = ['t + 1', 't + 2', 't + 3', 't + 4']
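To make the layout concrete, here is a minimal toy sketch of the (Samples, TimeSteps, Features) windowing with the oldest observation first. The helper `make_windows` and the series `toy` are made up for illustration and are not part of my script.

```python
import numpy as np

def make_windows(values, timesteps):
    """Stack sliding windows into shape (samples, timesteps, features).

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    if values.ndim == 1:
        values = values[:, None]  # single feature -> one column
    n_samples = len(values) - timesteps + 1
    # Window i covers rows i .. i + timesteps - 1, oldest row first
    return np.stack([values[i:i + timesteps] for i in range(n_samples)])

toy = np.arange(10)                  # stand-in series: 0, 1, ..., 9
X_toy = make_windows(toy, timesteps=3)

print(X_toy.shape)        # (8, 3, 1)
print(X_toy[0, 0, 0])     # oldest value in the first window: 0.0
print(X_toy[0, -1, 0])    # newest value in the first window: 2.0
```

So index 0 along the second axis is the oldest observation and index -1 the most recent one.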

The code below simulates the problem. When it is run, the second element of AE/MAPE is lower than the first, i.e. t+2 has a lower error than t+1:

from tensorflow.keras.callbacks import EarlyStopping
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import math
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras


import datetime as dt

pd.options.mode.chained_assignment = None  # default='warn'

Make data

df = pd.date_range('2021-01-01', '2021-12-01', freq='H')
df = pd.DataFrame(df[0:len(df)-1], columns=['DateTime'])  # list, not set: column labels must be ordered
df['TotalSales'] = 0.0  # float column for the simulated values
np.random.seed(1)
for i in df.index:

    if i == df.index[0]:
        df.loc[i, 'TotalSales'] = 1
    else:
        # random walk on the previous value; .loc avoids chained assignment
        x = df.loc[i-1, 'TotalSales'] + np.random.normal(0, 1)
        if x < 0:
            df.loc[i, 'TotalSales'] = 1 + 0.1 * math.exp(x)
        else:
            df.loc[i, 'TotalSales'] = 1 + 0.9 * x

df['InternalSales'] = 0.2 * df['TotalSales'] + np.random.normal(0, 0.2, len(df))
df['ExternalForecast'] = df['TotalSales'] + np.random.normal(0, 2, len(df))
df.loc[df['ExternalForecast'] < 0, 'ExternalForecast'] = 0.1  # floor negative forecasts

df.loc[df.index[-3:], 'InternalSales'] = np.nan # We do not know these observations
df.loc[df.index[-4:], 'TotalSales'] = np.nan # We do not know these observations

df.set_index('DateTime', inplace=True)

df.tail()

Align data

df['InternalSales_Lead1'] = df['InternalSales'].shift(-1)
df['ExternalForecast_Lead2'] = df['ExternalForecast'].shift(-4) # the name is a typo: this is a 4-hour lead and should read 'ExternalForecast_Lead4'

pd.set_option('display.max_columns', 5)
df.tail()
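For reference, this is how I understand `shift(-k)`: row t ends up holding the value observed at t + k, and the last k rows become NaN. A toy check (the frame `toy` and its column names are made up for illustration):

```python
import pandas as pd

# Five hourly observations, values chosen so the shift is easy to read
toy = pd.DataFrame(
    {"value": [10.0, 11.0, 12.0, 13.0, 14.0]},
    index=pd.date_range("2021-01-01", periods=5, freq="60min"),
)

toy["value_lead1"] = toy["value"].shift(-1)  # value one hour later
toy["value_lead4"] = toy["value"].shift(-4)  # value four hours later

print(toy)
```

In the first row, `value_lead1` is 11.0 and `value_lead4` is 14.0, so each row carries information from later timestamps.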

Setting

valid_start = '2021-10-01'
test_start = '2021-11-01'
Gran = 60 # minutes

Names = ['InternalSales_Lead1', 'ExternalForecast_Lead2']
Target = 'TotalSales'
AlternativeForecast = 'ExternalForecast'


TimeSteps = 24 # hours
HORIZON = 4 # steps ahead

X_array = df.copy()
X_array = X_array[Names]

df.reset_index(inplace=True)

Data = df[df['DateTime'].dt.date.astype(str) < test_start]

scaler = StandardScaler().fit(Data[Names])
yScaler = MinMaxScaler().fit(np.array(Data[Target]).reshape(-1, 1))

df['Scaled_' + Target] = yScaler.transform(np.array(df[Target]).reshape(-1, 1))

X_array = pd.DataFrame(scaler.transform(X_array), index=X_array.index,columns=X_array.columns)


def LSTM_structure(Y, X, timestep, horizon, TargetName):

    if TargetName is None:
        raise ValueError('TargetName must be specified')


    Array_X = np.zeros(((len(X) - timestep + 1), timestep, len(X.columns)))

    for variable in range(0,len(X.columns)):
        col = X.columns[variable]

        for t in range(timestep, len(X)+1):
            # Array_X[t - timestep,:,variable] = np.array(X[col].iloc[(t - timestep):t]).T
            Array_X[t - timestep, :, variable] = X[col].iloc[(t - timestep):t].values


    if horizon == 1:
        Y_LSTM = Y[(timestep - 1):].copy()
        Y_LSTM['t + ' + str(horizon)] = Y_LSTM[TargetName]

    else:
        Y_LSTM = Y[(timestep - 1):].copy()
        for t in range(1, horizon+1):
            Y_LSTM['t + ' + str(t)] = Y_LSTM[TargetName].shift(-(t-1))


    return Y_LSTM, Array_X

Y_total, X_array = LSTM_structure(Y=df[['DateTime', Target, 'Scaled_' + Target, AlternativeForecast]], X=X_array, timestep=TimeSteps, horizon=HORIZON, TargetName='Scaled_' + Target)
# X_array.shape = (7993, 24, 2)

Y_total.reset_index(drop=True, inplace=True)

Y_train = Y_total[Y_total['DateTime'].dt.date.astype(str) < valid_start]

X_train_scale = X_array[Y_train.index,:,:]

Y_Val = Y_total[(Y_total['DateTime'].dt.date.astype(str) >= valid_start) & (Y_total['DateTime'].dt.date.astype(str) < test_start)]

X_val_scale = X_array[Y_Val.index,:,:]

Y_test = Y_total[Y_total['DateTime'].dt.date.astype(str) >= test_start]

X_test_scale = X_array[Y_test.index,:,:]
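A toy check of what the target columns hold under the `shift(-(t-1))` construction used in `LSTM_structure`. The frame `toy` and the column `y` are made up for illustration; only the shift logic is copied from the function above.

```python
import pandas as pd

toy = pd.DataFrame({"y": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]})
horizon = 4

# Same construction as in LSTM_structure: column 't + k' is y shifted by -(k-1)
for t in range(1, horizon + 1):
    toy["t + " + str(t)] = toy["y"].shift(-(t - 1))

print(toy)
```

For a given row, 't + 1' equals the value of `y` in that same row, and 't + k' is the value k-1 rows later; the last rows of the later columns are NaN.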

Model

TargetNames = Y_total.filter(like='t + ').columns

LATENT_DIM = 5
BATCH_SIZE = 32
EPOCHS = 10

try:
    del model
except Exception:
    pass

model = keras.Sequential()
model.add(keras.layers.GRU(LATENT_DIM, input_shape=(TimeSteps, X_train_scale.shape[2])))
model.add(keras.layers.RepeatVector(HORIZON))
model.add(keras.layers.GRU(LATENT_DIM, return_sequences=True))
model.add(keras.layers.TimeDistributed(keras.layers.Dense(1)))
model.add(keras.layers.Flatten())

model.compile(optimizer='SGD', loss='mse')

model.summary()

earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=3, restore_best_weights=True)

i = 1
np.random.seed(i)
tf.random.set_seed(i)

hist = model.fit(X_train_scale,
              Y_train[TargetNames],
              batch_size=BATCH_SIZE,
              epochs=EPOCHS,
              validation_data=(X_val_scale, Y_Val[TargetNames]),
              callbacks=[earlystop],
              verbose=1)

y_hat_scaled = model.predict(X_test_scale)


for i in range(1, HORIZON+1):
    Y_test['Predict_t + ' + str(i)] = yScaler.inverse_transform(np.array(y_hat_scaled[:,i-1]).reshape(-1, 1))

Make format correct

for i in range(1, HORIZON + 1):

    if i == 1:
        Performance = Y_test[['DateTime', Target, AlternativeForecast,'Predict_t + '+ str(i)]]

    else:
        Temp = Y_test[['DateTime', 'Predict_t + '+ str(i)]]
        Temp['DateTime'] = Temp['DateTime'] + dt.timedelta(minutes=Gran * (i-1))

        Performance = pd.merge(Performance, Temp[['DateTime', 'Predict_t + '+ str(i)]], how='left', on='DateTime')
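The idea of the loop above is to move each k-step prediction to the timestamp it refers to before comparing it with the actuals. A toy sketch of that re-alignment for one step; the frame `toy`, the column `Pred_step2` and the values are made up for illustration:

```python
import pandas as pd

toy = pd.DataFrame({
    "DateTime": pd.date_range("2021-11-01", periods=4, freq="60min"),
    "Actual": [1.0, 2.0, 3.0, 4.0],
    "Pred_step2": [2.1, 3.1, 4.1, 5.1],  # made at DateTime, meant for one hour later
})

step = 2
shifted = toy[["DateTime", "Pred_step2"]].copy()
# Move the prediction forward by (step - 1) hours, as in the loop above
shifted["DateTime"] = shifted["DateTime"] + pd.Timedelta(minutes=60 * (step - 1))

# Left merge keeps every actual; rows without a matching prediction get NaN
aligned = toy[["DateTime", "Actual"]].merge(shifted, how="left", on="DateTime")
print(aligned)
```

After the merge, the first row has no step-2 prediction (NaN) and the prediction 2.1 sits next to the actual value at 01:00.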

Plot

from matplotlib import pyplot as plt
plt.plot(Performance['DateTime'], Performance[Target], label=Target)
for i in range(1, HORIZON + 1):
    plt.plot(Performance['DateTime'], Performance['Predict_t + '+ str(i)], label='Predict_t + '+ str(i))

plt.title('Model Performance')
plt.ylabel('MW')
plt.xlabel('Time')
plt.legend()
plt.show()

Performance

for i in range(1, HORIZON + 1):
    ae = (Performance['Predict_t + '+ str(i)] - Performance[Target]).abs().mean()
    mape = ((Performance['Predict_t + '+ str(i)] - Performance[Target]).abs()/Performance[Target]).mean() * 100

    if i == 1:
        AE = ae
        MAPE = round(mape,2)
    else:
        AE = np.append(AE, ae)
        MAPE = np.append(MAPE, round(mape,2))

# Alternative forecast
ae = (Performance[AlternativeForecast] - Performance[Target]).abs().mean()
mape = ((Performance[AlternativeForecast] - Performance[Target]).abs()/Performance[Target]).mean() * 100

AE = np.append(AE, ae)
MAPE = np.append(MAPE, round(mape, 2))


AE
MAPE
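To be explicit about the two metrics: AE is the mean absolute error and MAPE the mean absolute percentage error, both ignoring rows where either series is missing. A small sketch on made-up numbers; `mae_mape` is an illustrative helper, not part of the script, and it divides by the absolute actual value, which matches the code above only because the simulated sales are positive.

```python
import numpy as np

def mae_mape(actual, predicted):
    """Return (mean absolute error, mean absolute percentage error in %).

    Hypothetical helper; pairs with a NaN on either side are dropped.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    keep = ~(np.isnan(actual) | np.isnan(predicted))
    abs_err = np.abs(predicted[keep] - actual[keep])
    mae = abs_err.mean()
    mape = (abs_err / np.abs(actual[keep])).mean() * 100
    return mae, mape

toy_actual = [2.0, 4.0, 5.0, np.nan]
toy_pred = [1.0, 5.0, 5.0, 3.0]

mae, mape = mae_mape(toy_actual, toy_pred)
print(round(mae, 4), round(mape, 2))
```

Here the absolute errors are 1, 1 and 0, and the percentage errors are 50%, 25% and 0%; the last pair is dropped because the actual value is missing.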

I hope one of you has time to help me with this problem :-)