Suppose you have numerical time series data and you have split it into:
X_train, y_train, X_val, y_val, X_test, y_test.
and you properly scaled everything, ending up with:
X_train_scaled, y_train_scaled, X_val_scaled, y_val_scaled, X_test_scaled, y_test_scaled
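For concreteness, the scaling could look something like this (the exact scaler isn't the point; here I'm assuming scikit-learn's MinMaxScaler fitted on the training split only, and that the splits are pandas DataFrames with y as a single-column DataFrame):
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

x_scaler = MinMaxScaler().fit(X_train)   # fit on training data only
y_scaler = MinMaxScaler().fit(y_train)

def scale(df, scaler):
    # keep the DataFrame structure so .values keeps working later on
    return pd.DataFrame(scaler.transform(df), index=df.index, columns=df.columns)

X_train_scaled, X_val_scaled, X_test_scaled = (scale(d, x_scaler) for d in (X_train, X_val, X_test))
y_train_scaled, y_val_scaled, y_test_scaled = (scale(d, y_scaler) for d in (y_train, y_val, y_test))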
And now you run the following code:
import numpy as np
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

linear = Sequential([
    Dense(units=1, activation='linear', input_shape=[X_train_scaled.shape[1]])
])
linear.compile(loss='mse', optimizer='adam')
history = linear.fit(X_train_scaled, y_train_scaled,
                     epochs=50, verbose=1, shuffle=False,
                     validation_data=(X_val_scaled.values, y_val_scaled.values))
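For the comparisons below, y_pred_scaled refers to the model's predictions on the scaled test features, i.e. something like:
y_pred_scaled = linear.predict(X_test_scaled)  # predictions in the scaled space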
If the idea is to calculate the MSE on the scaled test data, there are two "different" ways of doing it:
mse_linear_scaled_1 = linear.evaluate(X_test_scaled,y_test_scaled)
or using the standalone version from https://www.tensorflow.org/api_docs/python/tf/keras/losses/MeanSquaredError
mse = keras.losses.MeanSquaredError()
mse_linear_scaled_2 = mse(y_test_scaled.values,y_pred_scaled).numpy()
If you do this exercise, mse_linear_scaled_1 == mse_linear_scaled_2 (as expected).
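A quick way to confirm this (allowing for small floating-point differences, since .evaluate() averages the loss batch by batch):
print(np.isclose(mse_linear_scaled_1, mse_linear_scaled_2, atol=1e-6))  # should print True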
Now here comes the question (thank you if you read this far...). If you repeat this same last part with the test data in its original scale (the end goal is to get the RMSE so it can be read in the context of the real data), the two results are very different from each other.
mse_linear_unscaled_1 = linear.evaluate(X_test,y_test)
gives a very different number than doing
mse_linear_unscaled_2 = mse(y_test,y_pred).numpy()
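where y_pred is meant to be the prediction mapped back to the original scale, for example by inverting the target scaling (assuming the y_scaler from the sketch above):
y_pred = y_scaler.inverse_transform(y_pred_scaled)  # predictions back in the original units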
If I want to get the correct RMSE in the scale of the original time series, I would guess this is the correct way of doing it:
np.sqrt(mse_linear_unscaled_2)
Maybe .evaluate() wasn't meant for this and is doing something under the hood that I'm not aware of, so it won't return the correct number?