Suppose I have this dataframe (in a regression problem) with numerical and categorical data:
df_example
Var1_numerical  Var2_categorical  Var3_numerical  Var4_categorical  Var_to_predict
20              red               1               BK                352352
10              blue              4               BL                345341
5               orange            6               BA                423423
1               red               3               BK                342342
90              orange            2               BK                456456
So, in one part of the process I will use RobustScaler() on the numeric variables and OneHotEncoder() on the categorical variables so that the model can learn from them. After training, the model will be able to predict with a certain error.
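For concreteness, here is a minimal sketch of the preprocessing I have in mind. The ColumnTransformer, the Pipeline wrapper, and LinearRegression are assumptions made only for illustration; the actual model and wiring may differ.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# The example dataframe from above
df_example = pd.DataFrame({
    "Var1_numerical": [20, 10, 5, 1, 90],
    "Var2_categorical": ["red", "blue", "orange", "red", "orange"],
    "Var3_numerical": [1, 4, 6, 3, 2],
    "Var4_categorical": ["BK", "BL", "BA", "BK", "BK"],
    "Var_to_predict": [352352, 345341, 423423, 342342, 456456],
})

numeric_cols = ["Var1_numerical", "Var3_numerical"]
categorical_cols = ["Var2_categorical", "Var4_categorical"]

# Scale the numeric columns, one-hot encode the categorical ones
preprocess = ColumnTransformer([
    ("num", RobustScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Assumption: a plain linear regressor stands in for the real model
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", LinearRegression()),
])

X = df_example.drop(columns="Var_to_predict")
y = df_example["Var_to_predict"]
model.fit(X, y)

# What the regressor actually sees: 2 scaled columns + 3 + 3 one-hot columns
X_transformed = model.named_steps["preprocess"].transform(X)
print(X_transformed.shape)
```

In this sketch only the feature columns pass through RobustScaler() and OneHotEncoder(); Var_to_predict is handed to fit() unchanged.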
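What I really want is to predict on new data using model.predict():
import numpy as np
pred_list_example = [15, "red", 1, "BK"]
a = np.array(pred_list_example)  # mixed types: NumPy coerces every element to str
a = np.expand_dims(a, 0)         # shape (1, 4): a single sample
model.predict(a)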
Question 1: Do I need to apply RobustScaler() and OneHotEncoder() to pred_list_example before calling model.predict(a)?
Question 2: If the answer to the previous question is "yes", Var_to_predict will be scaled by RobustScaler(). Do I need to use RobustScaler().inverse_transform to get the prediction back in its original numeric scale?
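To make Question 2 concrete, here is a rough sketch of the round trip I am asking about. It assumes the target was scaled with its own fitted RobustScaler (the name y_scaler is hypothetical), uses only the two numeric features to stay short, and uses LinearRegression as a stand-in model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import RobustScaler

# Numeric features only (Var1_numerical, Var3_numerical), to keep the sketch short
X = np.array([[20, 1], [10, 4], [5, 6], [1, 3], [90, 2]], dtype=float)
y = np.array([352352, 345341, 423423, 342342, 456456], dtype=float)

# Assumption: the target is scaled with a separate scaler fitted on y alone
y_scaler = RobustScaler()
y_scaled = y_scaler.fit_transform(y.reshape(-1, 1)).ravel()

# Stand-in model trained on the scaled target
reg = LinearRegression().fit(X, y_scaled)

# The prediction comes out in the scaled units...
pred_scaled = reg.predict(np.array([[15.0, 1.0]]))

# ...and the same fitted scaler maps it back to the original units
pred_original = y_scaler.inverse_transform(pred_scaled.reshape(-1, 1)).ravel()
print(pred_scaled, pred_original)
```

Note that the inverse transform here is called on the fitted y_scaler instance, since a freshly constructed RobustScaler() has no learned center or scale to invert.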