Which time-series technique can replace cross-validation in my Keras MLP regression model in Python?

Question

I'm currently working with a time-series dataset of 46 rows of meteorological measurements, taken roughly every 3 hours over one week. My explanatory variables (X) consist of 26 variables, some of which have different units of measurement (degrees, millimeters, g/m3, etc.). My target variable (y) is a single variable: temperature.

My goal is to predict the temperature (y) over a 12h to 24h slot using the full set of variables (X).

For that I used Keras (TensorFlow) and Python with an MLP regression model:

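import numpy
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Legacy scikit-learn wrapper; removed in newer TensorFlow releases (SciKeras provides a replacement)
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor

# X: every column of the weather DataFrame not starting with 'l', minus the target column
X = df_forcast_cap.loc[:, ~df_forcast_cap.columns.str.startswith('l')]
X = X.drop(['temperature_Y'], axis=1)
# y: observed temperature, kept as a one-column DataFrame
y = df_forcast_cap['temperature_Y']
y = pd.DataFrame(data=y)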

# normalize the dataset X
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit_transform(X)
normalized = scaler.transform(X)

# normalize the dataset y
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit_transform(y)
normalized = scaler.transform(y)

# train_test_split shuffles the rows by default (shuffle=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# define base model
def norm_model():
    # create model
    model = Sequential()
    model.add(Dense(26, input_dim=26, kernel_initializer='normal', activation='relu'))  # 26 neurons, 26 input features
    #model.add(Dense(6, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))

    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# evaluate model with standardized dataset
estimator = KerasRegressor(build_fn=norm_model, epochs=100, batch_size=5, verbose=1)
kfold = KFold(n_splits=10)  # contiguous folds; random_state only applies when shuffle=True
results = cross_val_score(estimator, X, y, cv=kfold)

print(results)

[-0.00454741 -0.00323181 -0.00345096 -0.00847261 -0.00390925 -0.00334816
 -0.00239754 -0.00681044 -0.02098541 -0.00140129]


# invert predictions
X_train = scaler.inverse_transform(X_train)
y_train = scaler.inverse_transform(y_train)
X_test = scaler.inverse_transform(X_test)
y_test = scaler.inverse_transform(y_test)
results = scaler.inverse_transform(results)

print("Results: %.2f (%.2f) MSE" % (results.mean(), results.std()))
Results: -0.01 (0.01) MSE

(1) I read that cross-validation is not suited to time-series prediction. Which other techniques exist, and which one is better suited to time series?

(2) Second, I decided to normalize my data because my X dataset contains different units (degrees, millimeters, g/m3, etc.) while my target y is in degrees. I know this makes the MSE harder to interpret, because it will not be in the same unit as y. But for the next step of my study I need to save the predicted y values (from the MLP model), and I need them in degrees. I tried to invert the normalization without success: when I print my results, the values are still normalized (see the code above). Does anyone see my mistake(s)?

score 1 · Accepted Answer · answered Jul 11 '19 at 11:45

The model that you present above looks at a single instance of 26 measurements to make a prediction. From your description, it seems that you would like to make predictions from a sequence of these measurements. I'm not sure I fully understood the description, but I'll assume that you have a sequence of 46 measurements, each with 26 values that you believe should be good predictors of the temperature. If that is the case, the input shape of your model should be (46, 26). Here 46 is the number of time steps (time_steps) and 26 is the number of features.
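With only 46 rows, feeding the whole sequence as one (46, 26) sample gives a single training example. A common alternative is to slice the series into shorter overlapping windows, each paired with a later target. The sketch below assumes the rows are in time order; `make_windows`, `time_steps` and `horizon` are hypothetical names, and horizon=4 (about 12 h at one row per ~3 h) is only an illustration. With windows, the model's input shape becomes (time_steps, 26) rather than (46, 26).

```python
import numpy as np


def make_windows(X, y, time_steps, horizon=1):
    """Turn a time-ordered table into (window, target) training pairs.

    X: array of shape (n_rows, n_features), rows in time order.
    y: array of shape (n_rows,).
    Each sample uses rows [t - time_steps, t) of X to predict y[t + horizon - 1],
    so only values observed before the target are used as inputs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    windows, targets = [], []
    for t in range(time_steps, len(X) - horizon + 1):
        windows.append(X[t - time_steps:t])
        targets.append(y[t + horizon - 1])
    return np.stack(windows), np.array(targets)


# Illustrative random data with the question's dimensions (46 rows, 26 features)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(46, 26))
y_demo = rng.normal(size=46)
X_seq, y_seq = make_windows(X_demo, y_demo, time_steps=8, horizon=4)
print(X_seq.shape, y_seq.shape)  # (35, 8, 26) (35,)
```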

For a time series you need to select a model design. There are two approaches: a recurrent network or a convolutional network (or a mixture of the two). A convolutional network is typically used to detect patterns that may be located anywhere in the input data; for instance, if you want to detect a given shape in an image, a convolutional network is a good starting point. Recurrent networks update their internal state after each time step. They can detect patterns as well as a convolutional network can, but you can think of them as being less position independent.
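For the recurrent option, a minimal sketch in the same Keras style might look like this. It assumes the (46, 26) input shape described above; the 32 LSTM units are an arbitrary illustrative choice, not a recommendation.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

time_steps, n_features = 46, 26  # (time_steps, features) as assumed above

rnn_model = keras.Sequential([
    keras.Input(shape=(time_steps, n_features)),
    layers.LSTM(32),   # returns only the final hidden state, shape (batch, 32)
    layers.Dense(1),   # one temperature value per input sequence
])
rnn_model.compile(optimizer='adam', loss='mse')
rnn_model.summary()

# Shape check on a dummy batch of two sequences
dummy = np.zeros((2, time_steps, n_features), dtype='float32')
pred = rnn_model.predict(dummy, verbose=0)
```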

A simple example of a convolutional approach:

from tensorflow import keras
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense
from tensorflow.keras.models import Sequential

# Initial value for the output bias; set it to the mean of the training temperatures
average_tmp = 0.0

model = Sequential([
    keras.Input(shape=(46, 26)),  # (time_steps, features)
    Conv1D(16, 4, activation='relu'),
    Conv1D(32, 4, activation='relu'),
    Conv1D(64, 2, activation='relu'),
    Conv1D(128, 4, activation='relu'),
    MaxPooling1D(),
    Flatten(),
    Dense(256, activation='relu'),
    Dense(1, bias_initializer=keras.initializers.Constant(average_tmp)),
])

model.compile(optimizer='adam', loss='mse')
model.summary()

A mixed approach would replace the `Flatten` layer above with an LSTM layer. That would probably be a reasonable starting point for experimentation.
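Replacing Flatten with an LSTM layer could look like the following sketch. The Conv1D stack mirrors the convolutional example; LSTM(64) and average_tmp = 0.0 are placeholder assumptions, not tuned values.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

average_tmp = 0.0  # placeholder: mean of the training temperatures

mixed_model = keras.Sequential([
    keras.Input(shape=(46, 26)),
    layers.Conv1D(16, 4, activation='relu'),
    layers.Conv1D(32, 4, activation='relu'),
    layers.Conv1D(64, 2, activation='relu'),
    layers.Conv1D(128, 4, activation='relu'),
    layers.MaxPooling1D(),
    # Instead of Flatten: read the pooled feature sequence in time order
    layers.LSTM(64),
    layers.Dense(256, activation='relu'),
    layers.Dense(1, bias_initializer=keras.initializers.Constant(average_tmp)),
])
mixed_model.compile(optimizer='adam', loss='mse')

# Shape check on a dummy batch of three sequences
dummy = np.zeros((3, 46, 26), dtype='float32')
pred = mixed_model.predict(dummy, verbose=0)
```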

(1) I read that cross-validation is not suited to time-series prediction. Which other techniques exist, and which one is better suited to time series?

Cross-validation is a technique that is very well suited to this problem. If you try the example model above, I can almost guarantee that it will overfit your dataset very significantly. Cross-validation can help you determine the right regularisation parameters for your model in order to avoid overfitting.
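When cross-validating on a time series, a common precaution is to keep the folds in time order so that each model is validated only on rows that come after its training rows (forward-chaining, or "walk-forward", validation). scikit-learn implements this as `TimeSeriesSplit`, and the `cv=` argument of `cross_val_score` accepts it like any other splitter. A minimal sketch on synthetic data, with `Ridge` standing in for the Keras model; the data and `n_splits=5` are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Illustrative stand-in data: 46 time-ordered rows, 26 features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(46, 26))
y_demo = rng.normal(size=46)

tscv = TimeSeriesSplit(n_splits=5)
folds = list(tscv.split(X_demo))
for i, (train_idx, test_idx) in enumerate(folds):
    # Each fold trains on an expanding prefix and tests on the rows right after it
    print(f"fold {i}: train rows 0-{train_idx[-1]}, test rows {test_idx[0]}-{test_idx[-1]}")

# The same splitter plugged into cross_val_score (Ridge stands in for the Keras model)
scores = cross_val_score(Ridge(), X_demo, y_demo, cv=tscv,
                         scoring='neg_mean_squared_error')
print(scores)
```

With the question's legacy `KerasRegressor` wrapper, passing `cv=tscv` instead of `cv=kfold` should work the same way, although with only 46 rows each fold will be small.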

Examples of regularisation techniques that you probably want to consider:

Saving the model weights from the epoch with the lowest validation loss.
Dropout and/or BatchNormalization.
Kernel regularisation.
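The three techniques above can be sketched in Keras roughly as follows, applied to the 26-feature MLP from the question and trained on random stand-in data. The L2 factor 1e-3, dropout rate 0.2 and patience=10 are illustrative guesses, not tuned values. `EarlyStopping(restore_best_weights=True)` approximates "keeping the best epoch's weights" without writing files; `ModelCheckpoint(save_best_only=True)` is the file-based alternative.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers


def build_regularised_mlp(n_features=26, l2=1e-3, dropout=0.2):
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        # Kernel regularisation: L2 penalty on the Dense weights
        layers.Dense(26, activation='relu', kernel_regularizer=regularizers.l2(l2)),
        layers.BatchNormalization(),
        layers.Dropout(dropout),
        layers.Dense(1),
    ])
    model.compile(optimizer='adam', loss='mse')
    return model


# Stop when val_loss stops improving; if training stops early,
# roll back to the weights of the epoch with the lowest val_loss
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=10, restore_best_weights=True)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(46, 26)).astype('float32')
y_demo = rng.normal(size=(46, 1)).astype('float32')

model = build_regularised_mlp()
# validation_split takes the last 30% of rows (before shuffling) as validation data
history = model.fit(X_demo, y_demo, epochs=50, batch_size=5,
                    validation_split=0.3, callbacks=[early_stop], verbose=0)
```

Because Keras takes the validation rows from the end of the arrays, `validation_split` keeps the validation set at the most recent part of the series, which suits time-ordered data.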

(2) Second, I decided to normalize my data because my X dataset contains different units (degrees, millimeters, g/m3, etc.) while my target y is in degrees.

Good call. It avoids wasting training cycles while the model moves its biases from their random initialisation to the very large values that unscaled data would require.

I know this makes the MSE harder to interpret, because it will not be in the same unit as y.

This is orthogonal. The inputs are not assumed to be in the same unit as y: a DNN combines the inputs through weighted linear transformations (plus non-linear activations), which carries no implicit assumption about units.
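On the MSE concern itself: min-max scaling maps each target as y' = (y - y_min) / (y_max - y_min), so every prediction error is divided by (y_max - y_min) and the MSE by its square. A small check with made-up temperatures (illustrative values, not from the question) showing how to convert a scaled MSE back to degrees squared:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

# Illustrative temperatures in degrees (not real data)
y_true = np.array([[12.0], [15.5], [18.0], [21.0], [16.5]])
y_pred = np.array([[13.0], [15.0], [17.0], [20.0], [17.5]])

y_scaler = MinMaxScaler().fit(y_true)
mse_scaled = mean_squared_error(y_scaler.transform(y_true), y_scaler.transform(y_pred))

# Min-max scaling divides every error by (max - min), so the MSE shrinks by (max - min)**2
data_range = y_scaler.data_range_[0]
mse_degrees = mse_scaled * data_range ** 2
rmse_degrees = np.sqrt(mse_degrees)  # back in degrees
print(mse_scaled, mse_degrees, rmse_degrees)
```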

But for the next step of my study I need to save the predicted y values (from the MLP model), and I need them in degrees. I tried to invert the normalization without success: when I print my results, the values are still normalized (see the code above). Does anyone see my mistake(s)?

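Calling inverse_transform on the y scaler should do the trick, but apply it to the model's predictions (the output of predict), not to results: in your code, results holds the cross-validation scores, not predicted temperatures. It doesn't make sense to inverse transform the inputs X_train/X_test or y_train/y_test either. Note also that your code stores the scaled arrays in normalized and never uses them, so the model is actually trained on the unscaled data. Finally, it would help keep your code straight to use different variable names for the X and y scalers.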
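A minimal sketch of that workflow on random stand-in data, with separate scalers for X and y, both fitted on the training rows only, and the inverse transform applied to the predictions. `LinearRegression` stands in for the Keras MLP (the scaler handling would be the same for the output of `model.predict`), and the chronological split at row 32 is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(46, 26))                       # illustrative features, rows in time order
y = rng.normal(loc=15.0, scale=3.0, size=(46, 1))   # illustrative temperatures in degrees

# Chronological split: earlier rows for training, later rows for testing
split = 32
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Separate scalers, fitted on the training rows only
x_scaler = MinMaxScaler().fit(X_train)
y_scaler = MinMaxScaler().fit(y_train)

model = LinearRegression()  # stand-in for the Keras MLP
model.fit(x_scaler.transform(X_train), y_scaler.transform(y_train))

# Predictions come out in scaled units; map them back to degrees
y_pred_scaled = model.predict(x_scaler.transform(X_test))
y_pred_degrees = y_scaler.inverse_transform(y_pred_scaled.reshape(-1, 1))
```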

It is also possible not to scale y. If you choose to do so, I'd suggest initialising the output layer's bias with the mean of the y values.
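If y is left unscaled, the bias initialisation could be sketched like this. It assumes the 26-feature MLP from the question; the random `y_train` below is illustrative stand-in data.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

rng = np.random.default_rng(0)
y_train = rng.normal(loc=15.0, scale=3.0, size=(32, 1))  # illustrative unscaled temperatures

model = keras.Sequential([
    keras.Input(shape=(26,)),
    layers.Dense(26, activation='relu'),
    # Start the output at the mean temperature so training need not learn the offset
    layers.Dense(1, bias_initializer=keras.initializers.Constant(float(y_train.mean()))),
])
model.compile(optimizer='adam', loss='mse')
```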

Thanks for your detailed answer. Could you please explain why my input_dim is not correct? To construct my model I followed the tutorial "Regression Tutorial with the Keras Deep Learning Library in Python" (https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/). In this tutorial X contains 13 variables, so they use "model.add(Dense(13, input_dim=13,........". What could be wrong with that for an MLP regression? — JEG, Jul 12 '19 at 07:04
It really depends on how you formulate the problem. I interpreted your description above as saying that you would like to predict a variable (temperature) based on multiple readings of 26 variables. Perhaps I misunderstood you. It helps if you can clearly define the problem independently of how to implement it, i.e. what does a prediction depend on? If the output depends only on a single reading of 26 variables, then the right input shape is (26,); if the output should depend on multiple readings, then the input shape should be (N, 26), where N is the length of the sequence. — Pedro Marques, Jul 12 '19 at 12:16
In fact, I think that you understood the problem correctly. I would like to train my model to predict the temperature from the "X" matrix, which contains variables like humidity, wind, cloud cover, and the temperatures predicted by several weather forecasters. I also have a "y" array that contains the real observed temperatures. So with all these data my goal is (ideally) to predict a temperature as close as possible to the real temperature, based on the information I have (humidity, etc.). — JEG, Jul 15 '19 at 07:14
If each prediction is based on a matrix, then the shape of your inputs should be a matrix. You may want to follow the code example in my original answer. If you do, you need to make sure that you generate the data correctly, i.e. for each y value that you know to be true, you would generate the matrix X of values known before y that would lead you to that conclusion. — Pedro Marques, Jul 15 '19 at 08:57
