I am working through a scikit-learn machine learning tutorial. I used three different scikit-learn classifiers: a decision tree, logistic regression, and k-nearest neighbors. The individual classifiers worked fine, and I combined them into an ensemble using majority voting, which is represented as mv_clf in the code.
These are the results of the classifiers:
10-fold cross validation:
ROC AUC: 0.92 (+/- 0.15) [Logistic Regression]
ROC AUC: 0.87 (+/- 0.18) [Decision tree]
ROC AUC: 0.85 (+/- 0.13) [KNN]
Accuracy: 0.92 (+/- 0.15) [Logistic Regression]
Accuracy: 0.87 (+/- 0.18) [Decision tree]
Accuracy: 0.85 (+/- 0.13) [KNN]
Accuracy: 0.98 (+/- 0.05) [Majority voting]
However, when I tried GridSearchCV to tune the parameters, as the tutorial does, grid.fit() raised an error. I searched the GridSearchCV documentation but could not work out why the fit fails, because the printed GridSearchCV object looks fine.
params = {
    'pipeline-1__clf__C': [0.001, 0.1, 100.0],
    'decisiontreeclassifier__max_depth': [1, 2],
    'pipeline-2__n_neighbors': [1, 2],
}
grid = GridSearchCV(estimator=mv_clf, param_grid=params,
                    scoring='roc_auc', cv=10)
print(grid)
grid.fit(X_train, y_train)
Output of print(grid):
GridSearchCV(cv=10,
             estimator=VotingClassifier(estimators=[('lr',
                                                     Pipeline(steps=[['sc',
                                                                      StandardScaler()],
                                                                     ['clf',
                                                                      LogisticRegression(C=0.001,
                                                                                         random_state=1)]])),
                                                    ('dt',
                                                     DecisionTreeClassifier(criterion='entropy',
                                                                            max_depth=1,
                                                                            random_state=0)),
                                                    ('KNN',
                                                     Pipeline(steps=[['sc',
                                                                      StandardScaler()],
                                                                     ['clf',
                                                                      KNeighborsClassifier(n_neighbors=1)]]))],
                                        voting='soft'),
             param_grid={'decisiontreeclassifier__max_depth': [1, 2],
                         'pipeline-1__clf__C': [0.001, 0.1, 100.0],
                         'pipeline-2__n_neighbors': [1, 2]},
             scoring='roc_auc')
print(grid) produces normal output, but grid.fit() raises an error and I am not sure why.
This is the error shown after grid.fit() is called:
Traceback (most recent call last):
File "/Users/cheokjiaheng/Documents/Coding Projects/Tutorials/Python Machine Learning Book/Combining Diff Models/MajorityVoting.py", line 115, in <module>
grid.fit(X_train, y_train)
...
...
...
File "/Users/cheokjiaheng/miniforge3/envs/tensorflowenv/lib/python3.8/site-packages/sklearn/base.py", line 230, in set_params
raise ValueError('Invalid parameter %s for estimator %s. '
ValueError: Invalid parameter decisiontreeclassifier for estimator VotingClassifier(estimators=[('lr',
Pipeline(steps=[['sc', StandardScaler()],
['clf',
LogisticRegression(C=0.001,
random_state=1)]])),
('dt',
DecisionTreeClassifier(criterion='entropy',
max_depth=1,
random_state=0)),
('KNN',
Pipeline(steps=[['sc', StandardScaler()],
['clf',
KNeighborsClassifier(n_neighbors=1)]]))],
voting='soft'). Check the list of available parameters with `estimator.get_params().keys()`.
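Following the hint at the end of the error message, here is a minimal sketch of listing the parameter names the ensemble actually accepts. It assumes mv_clf is a VotingClassifier built as in the printed output above, where the estimators are named 'lr', 'dt' and 'KNN'. The data from make_classification is only a stand-in, since the original X_train and y_train are not shown. With those names, the grid keys would probably need to be prefixed with 'lr', 'dt' and 'KNN' rather than 'pipeline-1', 'decisiontreeclassifier' and 'pipeline-2'.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): the original X_train / y_train are not shown.
X_train, y_train = make_classification(n_samples=100, n_features=4,
                                       random_state=1)

# Ensemble rebuilt from the printed output; the names 'lr', 'dt', 'KNN'
# are the ones that appear in the estimators list of that output.
mv_clf = VotingClassifier(
    estimators=[
        ('lr', Pipeline([('sc', StandardScaler()),
                         ('clf', LogisticRegression(C=0.001, random_state=1))])),
        ('dt', DecisionTreeClassifier(criterion='entropy', max_depth=1,
                                      random_state=0)),
        ('KNN', Pipeline([('sc', StandardScaler()),
                          ('clf', KNeighborsClassifier(n_neighbors=1))])),
    ],
    voting='soft',
)

# List every parameter name that set_params (and so GridSearchCV) accepts,
# then keep only the three hyperparameters being tuned.
valid_keys = sorted(mv_clf.get_params().keys())
tunable = [k for k in valid_keys
           if k.endswith(('__C', '__max_depth', '__n_neighbors'))]
print(tunable)

# Grid keys follow the pattern <estimator name>__<pipeline step>__<parameter>.
params = {
    'lr__clf__C': [0.001, 0.1, 100.0],
    'dt__max_depth': [1, 2],
    'KNN__clf__n_neighbors': [1, 2],
}
grid = GridSearchCV(estimator=mv_clf, param_grid=params,
                    scoring='roc_auc', cv=10)
grid.fit(X_train, y_train)
print(grid.best_params_)
```

Every key in param_grid has to appear in mv_clf.get_params().keys(); a key such as 'decisiontreeclassifier__max_depth' would only be valid if an estimator in the ensemble were actually named 'decisiontreeclassifier'.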