This is more a question about theory than a problem with the code itself. I have the following Pipeline, which will then be used in a GridSearchCV:
my_model = Pipeline([('scaler', MinMaxScaler()), ('model', model())])
cv = GridSearchCV(my_model, parameters, cv=5).fit(X_train, Y_train)
Then I will use the fitted cv object, which holds the best hyperparameters, to predict on the test set:
cv.predict(X_test)
My questions are as follows:
- Will GridSearchCV automatically fit the scaler only on the training portion of each fold? That is, will it follow this logic for each fold:
fit and transform the scaler on train_set_fold (using only that fold's training data) -> train the model -> apply the fitted scaler's transform to test_set_fold
- When I call cv.predict, will GridSearchCV automatically apply scaler.transform (with the parameters learned previously on the training data of one of the folds) to X_test before making the prediction?
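
Both points can be checked empirically. The sketch below is only an illustration under assumptions that are not part of my real setup: a synthetic dataset from make_classification, LogisticRegression standing in for model(), and a hypothetical grid over model__C. It first redoes by hand the per-fold logic described in the first question and compares the scores with those stored in cv_results_. It then inspects the scaler inside best_estimator_ to see which data it was fit on (with the default refit=True), and compares cv.predict with a manual transform followed by predict.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: synthetic data and LogisticRegression stand in for the real data and model().
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, random_state=0)

my_model = Pipeline([
    ("scaler", MinMaxScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
parameters = {"model__C": [0.1, 1.0, 10.0]}  # hypothetical grid

# Pass the splitter explicitly so the same folds can be reused by hand below.
folds = StratifiedKFold(n_splits=5)
cv = GridSearchCV(my_model, parameters, cv=folds).fit(X_train, Y_train)

# Question 1: redo the per-fold logic manually for the best C.
best_C = cv.best_params_["model__C"]
manual_scores = []
for train_idx, val_idx in folds.split(X_train, Y_train):
    # Scaler is fit on the training part of the fold only.
    scaler = MinMaxScaler().fit(X_train[train_idx])
    clf = LogisticRegression(C=best_C, max_iter=1000)
    clf.fit(scaler.transform(X_train[train_idx]), Y_train[train_idx])
    # Held-out part is only transformed, never used for fitting the scaler.
    manual_scores.append(
        clf.score(scaler.transform(X_train[val_idx]), Y_train[val_idx])
    )

grid_scores = [
    cv.cv_results_[f"split{i}_test_score"][cv.best_index_] for i in range(5)
]
scores_match = np.allclose(manual_scores, grid_scores)

# Question 2: which data was the final scaler fit on?
refit_scaler = cv.best_estimator_.named_steps["scaler"]
scaler_uses_full_train = (
    np.allclose(refit_scaler.data_min_, X_train.min(axis=0))
    and np.allclose(refit_scaler.data_max_, X_train.max(axis=0))
)

# cv.predict versus scaling X_test by hand and calling the final model directly.
final_model = cv.best_estimator_.named_steps["model"]
manual_pred = final_model.predict(refit_scaler.transform(X_test))
predict_matches = np.array_equal(cv.predict(X_test), manual_pred)

print(scores_match, scaler_uses_full_train, predict_matches)
```

If the pipeline behaves the way the scikit-learn documentation describes, all three flags should be True. In that case the scaler used by cv.predict would be the one refit on the whole of X_train rather than one left over from a single fold, which is worth keeping in mind when reading the second question.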