
I was trying to use GridSearch to iterate over different bandwidth values for the MeanShift algorithm, and it raises the error shown below. Does anyone know how I can fix this? Thanks a lot!

# Using GridSearch for Algorithm Tuning
from sklearn.cluster import MeanShift
from sklearn.model_selection import GridSearchCV

meanshift = MeanShift()
param_grid = {"bandwidth": range(48, 69)}  # candidate MeanShift bandwidths

mean_grid = GridSearchCV(estimator=meanshift, param_grid=param_grid, scoring=None)

mean_grid.fit(X)

And this is the error I get:

TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator MeanShift(bandwidth=None, bin_seeding=False, cluster_all=True, min_bin_freq=1,
     n_jobs=1, seeds=None) does not.
desertnaut
dezzaz
  • Do you have the actual cluster labels? Since you don't pass any ground-truth data (the actual cluster labels, `y`) to `fit()`, how would the score be calculated? On what basis would GridSearchCV decide that a specific `bandwidth` value is better than the others? – Vivek Kumar Aug 08 '18 at 11:56
  • No class in [`sklearn.cluster`](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.cluster) supports `score()`, except `KMeans`. Do you want to use that `score()` function? – Vivek Kumar Aug 08 '18 at 11:57
  • @VivekKumar; thank you so much for your response. I actually don't have any idea about the clusters I should end up with, which means I can't calculate a score for MeanShift. However, I was wondering whether there are other ways, so that we can set a score based on the number of clusters we want? – dezzaz Aug 08 '18 at 13:46

2 Answers


GridSearch does not work well with unsupervised methods.

The concept of grid search is to choose the parameters that give the best score when predicting on held-out data. But since most clustering algorithms cannot predict on unseen data, and there are usually no ground-truth labels to score against, this does not work.

It's not that straightforward to choose "optimal" parameters in unsupervised learning. That is why there isn't an easy automation like grid search available.
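One common workaround is to loop over candidate bandwidths by hand and compare an internal validity measure. The sketch below uses the silhouette coefficient as that measure; this is an assumption on my part, not something the question asks for, and a high silhouette does not guarantee that the clusters are meaningful for your data. The synthetic `X` from `make_blobs` and the bandwidth values are stand-ins for the asker's data.

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Stand-in data (assumption): three well-separated 2-D blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

results = {}  # bandwidth -> (number of clusters, silhouette score)
for bandwidth in [1.0, 2.0, 3.0, 4.0, 6.0]:
    labels = MeanShift(bandwidth=bandwidth).fit_predict(X)
    n_clusters = len(np.unique(labels))
    # The silhouette is only defined for 2 <= n_clusters <= n_samples - 1,
    # so bandwidths that produce anything else are skipped.
    if 2 <= n_clusters < len(X):
        results[bandwidth] = (n_clusters, silhouette_score(X, labels))

# Pick the bandwidth with the highest silhouette among the valid ones.
best_bandwidth = max(results, key=lambda b: results[b][1])
print(best_bandwidth, results[best_bandwidth])
```

Printing the number of clusters next to each score is deliberate: it lets you check whether the "best" bandwidth also gives a cluster count that makes sense for your problem.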

Has QUIT--Anony-Mousse

It's because the MeanShift algorithm does not have a score method. In this case you have to specify scoring in GridSearchCV. Here is a complete list.

From the documentation of GridSearchCV:

Parameters:

estimator : estimator object.

This is assumed to implement the scikit-learn estimator interface. Either estimator needs to provide a score function, or scoring must be passed.
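As a rough illustration of passing `scoring`, here is a sketch with a custom scorer callable. GridSearchCV accepts a callable with the signature `scorer(estimator, X, y)`; the `silhouette_scorer` function, the synthetic `X`, and the bandwidth grid below are all my own assumptions, and the silhouette is only one possible label-free criterion. Most of the predefined scoring strings compare predictions against `y`, so they would not help when no labels are available.

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import GridSearchCV


def silhouette_scorer(estimator, X, y=None):
    """Hypothetical scorer: silhouette of the fitted model's predictions on X."""
    labels = estimator.predict(X)
    n_clusters = len(np.unique(labels))
    # The silhouette is undefined for a single cluster or one cluster per
    # sample; return the worst possible value instead of raising.
    if n_clusters < 2 or n_clusters >= len(X):
        return -1.0
    return silhouette_score(X, labels)


# Stand-in data (assumption): three well-separated 2-D blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

param_grid = {"bandwidth": [2.0, 3.0, 4.0, 6.0]}
mean_grid = GridSearchCV(
    estimator=MeanShift(),
    param_grid=param_grid,
    scoring=silhouette_scorer,
    cv=3,
)
mean_grid.fit(X)  # no y is needed because the scorer ignores it
print(mean_grid.best_params_, mean_grid.best_score_)
```

This works for MeanShift because it implements `predict`, which assigns new points to the nearest learned cluster center. The score is computed on each held-out fold, so it measures how well the centers found on the training fold separate unseen points.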

desertnaut
RobJan