I'm currently working with a time series dataset of 46 lines about meteorological measurements on approximately each 3 hours by day during one week. My explanatory variables (X) is composed of 26 variables and some variable has different units of measurement (degree, minimeters, g/m3 etc.). My variable to explain (y) is composed of only one variable temperature.
My goal is to predict temperature (y) on a slot of 12h-24h with the ensemble of variables (X)
For that I used Keras Tensorflow and Python, with MLP regressor model :
X = df_forcast_cap.loc[:, ~df_forcast_cap.columns.str.startswith('l')]
X = X.drop(['temperature_Y'],axis=1)
y = df_forcast_cap['temperature_Y']
y = pd.DataFrame(data=y)
# normalize the dataset X
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit_transform(X)
normalized = scaler.transform(X)
# normalize the dataset y
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit_transform(y)
normalized = scaler.transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# define base model
def norm_model():
# create model
model = Sequential()
model.add(Dense(26, input_dim=26, kernel_initializer='normal', activation='relu'))# 30 is then number of neurons
#model.add(Dense(6, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
# Compile model
model.compile(loss='mean_squared_error', optimizer='adam')
return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# evaluate model with standardized dataset
estimator = KerasRegressor(build_fn=norm_model, epochs=(100), batch_size=5, verbose=1)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(estimator, X, y, cv=kfold)
print(results)
[-0.00454741 -0.00323181 -0.00345096 -0.00847261 -0.00390925 -0.00334816
-0.00239754 -0.00681044 -0.02098541 -0.00140129]
# invert predictions
X_train = scaler.inverse_transform(X_train)
y_train = scaler.inverse_transform(y_train)
X_test = scaler.inverse_transform(X_test)
y_test = scaler.inverse_transform(y_test)
results = scaler.inverse_transform(results)
print("Results: %.2f (%.2f) MSE" % (results.mean(), results.std()))
Results: -0.01 (0.01) MSE
(1) I read that cross-validation is not adapted for time series prediction. So, I'm wondering which others techniques exist and which one is more adapted to time-series.
(2) In a second place, I decided to normalize my data because my X dataset is composed of different metrics (degree, minimeters, g/m3 etc.) and my variable to explain y is in degree. In this way, I know that have to deal with a more complicated interpretation of the MSE because its result won't be in the same unity that my y variable. But for the next step of my study I need to save the result of the y predicted (made by the MLP model) and I need that these values be in degree. So, I tried to inverse the normalization but without success, when I print my results, the predicted values are still in normalized format (see in my code above). Does anyone see my mistake.s ?