I'm using GridSearchCV to tune hyperparameters, and I now want to normalize the features (StandardScaler()) in both the training and validation steps. But I don't think I can do this correctly.
My questions are:
- If I apply the preprocessing step to the whole training set and then pass it to GridSearchCV for 10-fold CV, this will lead to data leakage, right? In each of the 10 iterations, 9 folds are used for training and 1 fold for validation, so the normalization should be fit only on the 9 training folds and not on the validation fold, right?
- If I use sklearn's Pipeline, it won't solve this problem, right? My understanding is that it runs only once, which would lead to data leakage again.
- Is there another way to do this while still using GridSearchCV to tune the parameters?
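
To make the two setups concrete, here is a minimal sketch of what I am comparing. The SVC classifier, the parameter grid, and the synthetic data from make_classification are placeholders I picked for illustration, not my real model or dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic placeholder data standing in for my real training set.
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)

# Variant A (first bullet): scale the whole training set once,
# then hand the already-scaled data to GridSearchCV for 10-fold CV.
X_scaled = StandardScaler().fit_transform(X_train)
search_a = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=10)
search_a.fit(X_scaled, y_train)

# Variant B (second bullet): put the scaler inside a Pipeline
# and pass the Pipeline itself to GridSearchCV as the estimator.
pipe = Pipeline([("scaler", StandardScaler()), ("clf", SVC())])
search_b = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=10)
search_b.fit(X_train, y_train)

print(search_a.best_params_, search_b.best_params_)
```

In variant B the `clf__` prefix in the grid routes the parameter to the pipeline step named "clf".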