I am dealing with a binary classification problem.
I have 2 lists of indexes listTrain
and listTest
, which are partitions of the training set (the actual test set will be used only later). I would like to use the samples associated with listTrain
to estimate the parameters and the samples associated with listTest
to evaluate the error in a cross validation process (hold out set approach).
However, I am not be able to find the correct way to pass this to the sklearn GridSearchCV
.
The documentation says that I should create "An iterable yielding (train, test) splits as arrays of indices". However, I do not know how to create this.
grid_search = GridSearchCV(estimator = model, param_grid = param_grid,cv = custom_cv, n_jobs = -1, verbose = 0,scoring=errorType)
So, my question is how to create custom_cv
based on these indexes to be used in this method?
X
and y
are respectivelly the features matrix and y is the vector of labels.
Example: Supose that I only have one hyperparameter alpha that belongs to the set{1,2,3}. I would like to set alpha=1, estimate the parameters of the model (for instance the coefficients os a regression) using the samples associated with listTrain and evaluate the error using the samples associated with listTest. Then I repeat the process for alpha=2 and finally for alpha=3. Then I choose the alpha that minimizes the error.