I have a question about hyperparameter tuning and finding the best fitted model for a particular dataset. It was recommended to me that I split my data into three sets rather than just two (training and testing):
- Training
- Validation
- Testing
and run a grid search (with cross validation) on the training set. After the grid search I can use the validation set to check the generalization power of the model (its performance on unseen data), and possibly adjust some parameters afterwards. However, I do not know how to actually use the validation set to check this generalization power.
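For context, this is roughly how I produce the three sets (the 60/20/20 ratio and the variable names x, y, x_val, x_test are just my own choices):

from sklearn.model_selection import train_test_split

# first split off 40% of the data, then split that part half-and-half
# into validation and test sets (60/20/20 overall)
x_train, x_rest, y_train, y_rest = train_test_split(x, y, test_size=0.4, random_state=12)
x_val, x_test, y_val, y_test = train_test_split(x_rest, y_rest, test_size=0.5, random_state=12)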
My Code:
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

dt = DecisionTreeClassifier(random_state=12)
# candidate hyperparameter values for the grid search
max_depth = [int(d) for d in np.linspace(1, 20, 20)]
max_features = ['log2', 'sqrt', 'auto']  # note: 'auto' is removed in newer scikit-learn versions
criterion = ['gini', 'entropy']
min_samples_split = [2, 3, 50, 100]
min_samples_leaf = [1, 5, 8, 10]
grid_param_dt = dict(max_depth=max_depth, max_features=max_features,
                     min_samples_split=min_samples_split,
                     min_samples_leaf=min_samples_leaf, criterion=criterion)
# 10-fold cross-validated grid search on the training set only
gd_sr_dt = GridSearchCV(estimator=dt, param_grid=grid_param_dt,
                        scoring='accuracy', cv=10)
gd_sr_dt.fit(x_train, y_train)
best_parameters_dt = gd_sr_dt.best_params_
print(best_parameters_dt)
The grid search returns the following best hyperparameters:
{'criterion': 'gini', 'max_depth': 9, 'max_features': 'log2', 'min_samples_leaf': 10, 'min_samples_split': 50}
How do I use the validation set to test the generalization power of the model with these hyperparameters?
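My current guess is to refit the tree with these parameters on the training data and then score it on the validation set, roughly as in the sketch below (x_val and y_val are the validation split mentioned above), but I am not sure this is the correct way to use the validation set:

# refit with the selected hyperparameters on the training data ...
best_dt = DecisionTreeClassifier(random_state=12, **gd_sr_dt.best_params_)
best_dt.fit(x_train, y_train)
# ... and measure accuracy on the held-out validation set
print(best_dt.score(x_val, y_val))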