
I am trying to do the following:

from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

vc = VotingClassifier(estimators=[('gbc', GradientBoostingClassifier()),
                                  ('rf', RandomForestClassifier()),
                                  ('svc', SVC(probability=True))],
                      voting='soft', n_jobs=-1)

params = {'weights': [[1, 2, 3], [2, 1, 3], [3, 2, 1]]}
grid_search = GridSearchCV(estimator=vc, param_grid=params)
grid_search.fit(X_new, y)
print(grid_search.best_score_)

Here I want to tune the `weights` parameter. With GridSearchCV this takes a lot of time, since it re-fits all the estimators for every weight combination (and every CV fold), which I don't think should be necessary. Something like the `prefit` option used by `SelectFromModel` in `sklearn.feature_selection` would be better.

Is there any other option, or am I misinterpreting something?

Abhinav Gupta
  • GridSearchCV will split the data into train and test according to the supplied `cv` and then score on the test data. Since you do not want to re-fit the estimators, which data would you want them to score on: train, test or all data? – Vivek Kumar Oct 17 '17 at 01:31
  • If I use GridSearchCV, it will create models for each `weight_list` I have specified. But what I want to achieve is to use the same model for all the weights I am giving. I want to use `prefit`, but there is no `prefit` option in GridSearchCV – Abhinav Gupta Oct 17 '17 at 06:36
  • @VivekKumar I have edited the problem code for a better explanation. Kindly see. – Abhinav Gupta Oct 17 '17 at 06:42
  • No, you are not understanding what I am saying. Please look at the `cv` parameter of GridSearchCV. If you don't specify it, a default 3-fold CV is used, which means two folds of the data are used to train the estimators and the third one is used for scoring. What I am asking is: on what data do you want to get the score? – Vivek Kumar Oct 17 '17 at 06:58
  • I would advise you to write custom code for this. You can use [ParameterGrid](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.ParameterGrid.html) to expand the parameters and then use them accordingly. – Vivek Kumar Oct 17 '17 at 07:18
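
For reference, a minimal sketch of the custom-code route the last comment suggests, assuming the classifiers have already been fitted once and that scoring weight combinations on a held-out split is acceptable; the names `fitted_estimators`, `X_val`, and `y_val` are assumptions for illustration, not anything from the thread:

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import ParameterGrid

# fitted_estimators: (name, classifier) pairs, each fitted once up front;
# only their predict_proba on a held-out split (X_val, y_val) is reused.
probas = [clf.predict_proba(X_val) for _, clf in fitted_estimators]
classes = fitted_estimators[0][1].classes_

best_score, best_params = -np.inf, None
for params in ParameterGrid({'weights': [[1, 2, 3], [2, 1, 3], [3, 2, 1]]}):
    # weighted soft vote using the pre-computed probabilities
    avg = np.average(probas, axis=0, weights=params['weights'])
    score = accuracy_score(y_val, classes[avg.argmax(axis=1)])
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)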

2 Answers


The following code (in my repo) does this.

It contains a class `VotingClassifierCV`. It first makes cross-validated predictions for all the classifiers, then loops over all candidate weights and chooses the best combination using the pre-calculated predictions.
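
The class itself lives in the linked repo; as a rough sketch of the same idea (the names `clfs`, `X`, `y`, the 5-fold `cross_val_predict`, and the accuracy metric are assumptions here, not necessarily how the repo implements it):

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict

# clfs: the same (name, estimator) pairs as in the question, e.g.
# [('gbc', GradientBoostingClassifier()), ('rf', RandomForestClassifier()),
#  ('svc', SVC(probability=True))]; X, y: the training data.
cv_probas = [cross_val_predict(clf, X, y, cv=5, method='predict_proba')
             for _, clf in clfs]
classes = np.unique(y)  # predict_proba columns follow the sorted class labels

best_score, best_weights = -np.inf, None
for weights in [[1, 2, 3], [2, 1, 3], [3, 2, 1]]:
    # weighted soft vote over the pre-computed out-of-fold probabilities
    avg_proba = np.average(cv_probas, axis=0, weights=weights)
    score = accuracy_score(y, classes[avg_proba.argmax(axis=1)])
    if score > best_score:
        best_score, best_weights = score, weights

print(best_weights, best_score)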

David Dale

A more compute-friendly way is to first tune each classifier's parameters separately on your training data, then weight each classifier in proportion to its score on your target metric (say, `accuracy_score`) on your validation data.

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
import sklearn.metrics

# parameter tune each classifier separately on the training data
# (rf_params and svc_params are the parameter grids for each classifier)
models = {
    'rf': GridSearchCV(RandomForestClassifier(), rf_params).fit(X_train, y_train),
    'svc': GridSearchCV(SVC(), svc_params).fit(X_train, y_train),
}

# relative weights: each classifier's accuracy on the validation data
model_scores = {
    name: sklearn.metrics.accuracy_score(
        y_validate,
        model.predict(X_validate)
    )
    for name, model in models.items()
}
total_score = sum(model_scores.values())

# combine the parts, weighting each classifier by its validation accuracy
combined_model = VotingClassifier(
    [(name, model.best_estimator_) for name, model in models.items()],
    weights=[
        model_scores[name] / total_score
        for name in models
    ]
).fit(X_learn, y_learn)

Finally, fit the combined model on your learning data (train + validate) and evaluate it on your test data.
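
As a usage note (hedged: the split variable names and the use of `np.concatenate` are assumptions about how the train/validate/test split was made), that last step might look like:

import numpy as np

# learning data = training + validation folds (define before the .fit above,
# or simply refit here)
X_learn = np.concatenate([X_train, X_validate])
y_learn = np.concatenate([y_train, y_validate])
combined_model.fit(X_learn, y_learn)

# final evaluation on the untouched test split
print(combined_model.score(X_test, y_test))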

eliangius