1- If you do not want to read the whole answer: compare the error rates with MASE when the data were scaled in different ways. MSE is scale dependent, whereas MASE is scale free. This explains why you get a lower MSE with min-max normalization even though your predictions seem to be better with standard normalization.
I tested the code below with different error functions (MAE, MSE, MASE). If you check the outputs, the min-max scaler (almost) always gives lower values, but that does not confirm that we did better with the min-max scaler (I explain why below). By the way, I also observed that MAPE is not a good choice with the min-max scaler (min-max scaling maps the smallest true value to 0, and MAPE divides by the true value), so I removed it from the code.
import numpy as np
from sklearn.preprocessing import MinMaxScaler,StandardScaler
from sklearn.metrics import mean_squared_error,mean_absolute_error
np.random.seed(42)
y_true = np.random.randint(0,500,100)
y_pred = np.random.randint(0,500,100)
#scale
min_max_scaler = MinMaxScaler()
standard_scaler = StandardScaler()
y_true_min_max = min_max_scaler.fit_transform(y_true.reshape(-1,1))
y_pred_min_max = min_max_scaler.fit_transform(y_pred.reshape(-1,1))
y_true_standard = standard_scaler.fit_transform(y_true.reshape(-1,1))
y_pred_standard = standard_scaler.fit_transform(y_pred.reshape(-1,1))
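def error_function(y_true, y_pred):
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    # Simplified scaled error used in this answer: each absolute error is divided
    # by the distance of the true value from the mean of the true values.
    # (The standard MASE divides by the in-sample MAE of a naive forecast instead.)
    def mean_absolute_scaled_error(y_true, y_pred):
        return np.mean(np.abs(y_true - y_pred) / np.abs(y_true - np.mean(y_true)))
    mase = mean_absolute_scaled_error(y_true, y_pred)
    mase = round(mase, 2)
    mae = round(mae, 2)
    mse = round(mse, 2)
    outputs = {'mae': mae, 'mse': mse, 'mase': mase}
    return outputs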
#error
min_max_error = error_function(y_true_min_max,y_pred_min_max)
standard_scaler_error = error_function(y_true_standard,y_pred_standard)
print(f"min_max_error: {min_max_error}",f"standard_scaler_error: {standard_scaler_error}",sep='')
Output: min_max_error: {'mae': 0.35, 'mse': 0.19, 'mase': 5.53}standard_scaler_error: {'mae': 1.2, 'mse': 2.16, 'mase': 5.57}
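To make the scale dependence concrete, here is a minimal sketch (not part of the original test). It assumes hypothetical data and, unlike the code above, fits each scaler on the true values only and applies the same transform to both arrays. Under that assumption the scaled MSE is just the raw MSE divided by the square of the scaling factor: the range for min-max, the standard deviation for standard scaling. Since the range is always larger than the standard deviation, the min-max MSE comes out smaller for the very same predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: true values plus noisy predictions (illustration only).
rng = np.random.default_rng(0)
y_true = rng.uniform(0, 500, size=200).reshape(-1, 1)
y_pred = y_true + rng.normal(0, 25, size=y_true.shape)

mse_raw = mean_squared_error(y_true, y_pred)

# Fit each scaler on the true values, then apply the SAME transform to both arrays.
mm = MinMaxScaler().fit(y_true)
ss = StandardScaler().fit(y_true)
mse_mm = mean_squared_error(mm.transform(y_true), mm.transform(y_pred))
mse_ss = mean_squared_error(ss.transform(y_true), ss.transform(y_pred))

value_range = y_true.max() - y_true.min()
std = y_true.std()  # population std (ddof=0), which is what StandardScaler uses

print(np.isclose(mse_mm, mse_raw / value_range**2))  # min-max divides MSE by range^2
print(np.isclose(mse_ss, mse_raw / std**2))          # standard divides MSE by std^2
print(mse_mm < mse_ss)                               # same predictions, smaller number
```

So a lower MSE after min-max scaling says nothing about prediction quality by itself; it mostly reflects the larger divisor.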
Now let's try with stock prices. Assume we have two sets of predictions, one made with the min-max scaler and one with the standard scaler, and assume the standard-scaler predictions are the better ones. The question is: is checking the MSE value enough?
stock_price = np.array([50,123,100,213,123,22])
predicted_stock_price_min_max = np.array([123,100,213,123,22,55]) # assume these predictions are worse
predicted_stock_price_standard = np.array([40,113,90,113,129,33]) # assume these predictions are good
#scale
min_max_scaler = MinMaxScaler()
standard_scaler = StandardScaler()
stock_price_min_max = min_max_scaler.fit_transform(stock_price.reshape(-1,1))
predicted_stock_price_min_max = min_max_scaler.fit_transform(predicted_stock_price_min_max.reshape(-1,1))
stock_price_standard = standard_scaler.fit_transform(stock_price.reshape(-1,1))
predicted_stock_price_standard = standard_scaler.fit_transform(predicted_stock_price_standard.reshape(-1,1))
##error
min_max_error = error_function(stock_price_min_max,predicted_stock_price_min_max)
standard_scaler_error = error_function(stock_price_standard,predicted_stock_price_standard)
print(f"min_max_error: {min_max_error}",f"standard_scaler_error: {standard_scaler_error}",sep='')
Output:min_max_error: {'mae': 0.38, 'mse': 0.17, 'mase': 5.23} standard_scaler_error: {'mae': 0.49, 'mse': 0.36, 'mase': 1.26}
If you check the errors, the standard-scaled errors are supposed to be lower, but the min-max MAE and MSE are still lower (in your case, check the MSE). Only MASE is lower for the standard scaler, and it is the only metric that agrees with what we can observe directly: the predictions made with the standard scaler are better.
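One way to cross-check this, sketched here as a suggestion rather than as part of the original test, is to compute the errors in the original units, before any scaling. The arrays are the ones from the stock price example; the helper name `errors_in_original_units` is made up for illustration. If a model outputs scaled values, `inverse_transform` on the scaler fitted to the targets maps them back first.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import StandardScaler

stock_price = np.array([50, 123, 100, 213, 123, 22])
pred_worse = np.array([123, 100, 213, 123, 22, 55])   # assumed worse predictions
pred_better = np.array([40, 113, 90, 113, 129, 33])   # assumed better predictions

def errors_in_original_units(y_true, y_pred):
    # Hypothetical helper: MAE and MSE on unscaled values.
    return {"mae": round(mean_absolute_error(y_true, y_pred), 2),
            "mse": round(mean_squared_error(y_true, y_pred), 2)}

errors_worse = errors_in_original_units(stock_price, pred_worse)
errors_better = errors_in_original_units(stock_price, pred_better)
print(f"worse: {errors_worse}", f"better: {errors_better}", sep="\n")

# If predictions live in scaled space, map them back with the scaler
# that was fitted on the targets before computing the errors.
scaler = StandardScaler().fit(stock_price.reshape(-1, 1))
scaled_pred = scaler.transform(pred_better.reshape(-1, 1))  # stand-in for a model's scaled output
restored = scaler.inverse_transform(scaled_pred).ravel()
print(np.allclose(restored, pred_better))
```

On the original scale both MAE and MSE are lower for the predictions assumed to be better, which matches the intuition.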
As a result, MSE may not be a good accuracy measure for comparing time series forecasts made on differently scaled data. From my point of view, MASE is more robust for time series forecasting.
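For reference, the `mean_absolute_scaled_error` in the code above is a simplified variant. The usual definition of MASE divides the forecast MAE by the in-sample MAE of a one-step naive forecast. A minimal sketch of that definition follows; the function name `mase`, the `y_train` argument, and the forecast values are assumptions made for illustration. It also checks that the value does not change when the same linear rescaling is applied to every series.

```python
import numpy as np

def mase(y_true, y_pred, y_train):
    """Forecast MAE divided by the in-sample MAE of the one-step naive forecast."""
    y_true, y_pred, y_train = (np.asarray(a, dtype=float) for a in (y_true, y_pred, y_train))
    naive_mae = np.mean(np.abs(np.diff(y_train)))  # error of predicting y[t] with y[t-1]
    return np.mean(np.abs(y_true - y_pred)) / naive_mae

y_train = np.array([50, 123, 100, 213, 123, 22])  # stock_price from above, used as history
y_true = np.array([60, 90, 140])                  # hypothetical future values
y_pred = np.array([55, 100, 120])                 # hypothetical forecasts

raw = mase(y_true, y_pred, y_train)

# Apply the same rescaling (a * y + b) to history, targets, and forecasts.
a, b = 0.01, -3.0
rescaled = mase(a * y_true + b, a * y_pred + b, a * y_train + b)

print(np.isclose(raw, rescaled))  # the scale factor cancels out
print(raw < 1)                    # below 1 means better than the naive forecast
```

Note that this invariance only holds when all three series go through the same transform; fitting a separate scaler on the predictions breaks it.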
ref: https://towardsdatascience.com/time-series-forecast-error-metrics-you-should-know-cc88b8c67f27
2- This is explained here: https://stackoverflow.com/a/58850139/12906920 and https://www.atoti.io/articles/when-to-perform-a-feature-scaling/#:~:text=What%20is%20Feature%20Scaling%3F,during%20the%20data%20preprocessing%20step. Long story short: if your data follow a normal distribution, use StandardScaler. If the data have lots of outliers, use RobustScaler. Otherwise, you can use MinMaxScaler (for example, for image pixel values/arrays).
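A small sketch of why outliers matter for this choice, using a made-up array with a single extreme value: MinMaxScaler squeezes the ordinary values into a tiny interval near 0, while RobustScaler (which centers on the median and divides by the interquartile range) keeps them spread out.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Hypothetical data: five ordinary values and one outlier.
x = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 500.0]).reshape(-1, 1)

inlier_spread = {}
for scaler in (MinMaxScaler(), StandardScaler(), RobustScaler()):
    name = type(scaler).__name__
    scaled = scaler.fit_transform(x).ravel()
    # Spread of the five ordinary values after scaling (the outlier is the last entry).
    inlier_spread[name] = scaled[:-1].max() - scaled[:-1].min()
    print(name, np.round(scaled, 2))

print(inlier_spread)
```

With min-max scaling the five ordinary values become almost indistinguishable, which is the situation RobustScaler is meant to avoid.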