
I am running this:

# Hyperparameter tuning - Random Forest #

# Hyperparameters' grid
import numpy as np
parameters = {'n_estimators': list(range(100, 250, 25)),
              'criterion': ['gini', 'entropy'],
              'max_depth': list(range(2, 11, 2)),
              'max_features': [0.1, 0.2, 0.3, 0.4, 0.5],
              'class_weight': [{0: 1, 1: i} for i in np.arange(1, 4, 0.2).tolist()],
              'min_samples_split': list(range(2, 7))}


# Instantiate random forest
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(random_state=0)


# Execute grid search and retrieve the best classifier
from sklearn.model_selection import GridSearchCV
classifiers_grid = GridSearchCV(estimator=classifier, param_grid=parameters, scoring='balanced_accuracy',
                                   cv=5, refit=True, n_jobs=-1)
classifiers_grid.fit(X, y)

and I am receiving this warning:

.../anaconda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py:536: 
FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
TypeError: '<' not supported between instances of 'str' and 'int'

Why is this and how can I fix it?

desertnaut
Outcast
  • I had a similar error when performing feature selection with a random forest. I changed the type of the int variable to string (because I had one int variable and all the others were strings) and the error was fixed. – vasili111 Jun 06 '20 at 03:10

4 Answers


I had a similar FitFailedWarning issue with different details. After many runs I found that the error was in the parameter values being passed; try:

parameters = {'n_estimators': [100,125,150,175,200,225,250], 
              'criterion': ['gini', 'entropy'], 
              'max_depth': [2,4,6,8,10], 
              'max_features': [0.1, 0.2, 0.3, 0.4, 0.5], 
              'class_weight': [0.2,0.4,0.6,0.8,1.0],               
              'min_samples_split': [2,3,4,5,6,7]}

This will pass for sure. For me it happened with XGBClassifier; somehow the value datatypes were getting mixed up.
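The underlying TypeError is easy to reproduce outside scikit-learn: once a candidate list mixes strings and ints, any ordering comparison between the two types fails. A minimal sketch:

```python
# Mixing str and int: comparing the two types raises the same
# TypeError that surfaces inside the FitFailedWarning.
try:
    2 < 'None'  # int vs str comparison
except TypeError as e:
    print(e)  # '<' not supported between instances of 'int' and 'str'
```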

One more cause is a value exceeding the allowed range: for example, in XGBClassifier the 'subsample' parameter's max value is 1.0, and if it is set to 1.1, a FitFailedWarning will occur.
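A quick, hypothetical sanity check for the mixed-datatype case can be run before launching the search (it may also flag benign int/float mixes, so treat its output as a hint, not an error):

```python
# Hypothetical helper: flag parameter lists that mix value types,
# a common cause of FitFailedWarning in grid searches.
def check_grid(parameters):
    problems = []
    for name, values in parameters.items():
        kinds = {type(v).__name__ for v in values}
        if len(kinds) > 1:
            problems.append(f"{name} mixes types: {sorted(kinds)}")
    return problems

grid = {'max_depth': ['None', 5, 10], 'subsample': [0.5, 1.0]}
print(check_grid(grid))  # ["max_depth mixes types: ['int', 'str']"]
```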

hanzgs

For me this was giving the same error, but after removing the string 'None' from max_depth it fits properly.

param_grid={'n_estimators':[100,200,300,400,500],
            'criterion':['gini', 'entropy'],
            'max_depth':['None',5,10,20,30,40,50,60,70],
            'min_samples_split':[5,10,20,25,30,40,50],
            'max_features':[ 'sqrt', 'log2'],
            'max_leaf_nodes':[5,10,20,25,30,40,50],
            'min_samples_leaf':[1,100,200,300,400,500]
            }

Code which runs properly:

param_grid={'n_estimators':[100,200,300,400,500],
            'criterion':['gini', 'entropy'],
            'max_depth':[5,10,20,30,40,50,60,70],
            'min_samples_split':[5,10,20,25,30,40,50],
            'max_features':[ 'sqrt', 'log2'],
            'max_leaf_nodes':[5,10,20,25,30,40,50],
            'min_samples_leaf':[1,100,200,300,400,500]
            }
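The quoted 'None' is a string, which the tree-building code ends up comparing against integer depths; Python's unquoted None is what RandomForestClassifier actually accepts for unlimited depth, so it can stay in the grid. A small sketch on synthetic data (the exact exception raised for the string varies by scikit-learn version):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=60, random_state=0)

# The string 'None' is rejected (older versions raise the TypeError
# above; newer versions raise a parameter-validation error):
try:
    RandomForestClassifier(max_depth='None', n_estimators=5).fit(X, y)
except (TypeError, ValueError) as e:
    print(type(e).__name__)

# Python's None means unlimited depth and fits fine:
RandomForestClassifier(max_depth=None, n_estimators=5).fit(X, y)
```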

Surbhi Jain

I too got the same error, and when I passed the hyperparameters as in MachineLearningMastery, I got output without the warning.

Try it this way if you run into similar issues:

# grid search logistic regression model on the sonar dataset
from pandas import read_csv
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
dataframe = read_csv(url, header=None)
# split into input and output elements
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model
model = LogisticRegression()
# define evaluation
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define search space
space = dict()
space['solver'] = ['newton-cg', 'lbfgs', 'liblinear']
space['penalty'] = ['none', 'l1', 'l2', 'elasticnet']
space['C'] = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100]
# define search
search = GridSearchCV(model, space, scoring='accuracy', n_jobs=-1, cv=cv)
# execute search
result = search.fit(X, y)
# summarize result
print('Best Score: %s' % result.best_score_)
print('Best Hyperparameters: %s' % result.best_params_)
DOT

Make sure the y variable is an int, not a bool or str.

Change your last line of code so that the y series contains 0s and 1s, for example:

classifiers_grid.fit(X, list(map(int, y)))
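If the labels are strings rather than bools, a comparable conversion can be done explicitly before fitting. A sketch with illustrative label values (replace 'yes'/'no' with whatever your y actually contains):

```python
import numpy as np

# Illustrative binary string labels; substitute your own y here
y_raw = np.array(['yes', 'no', 'yes', 'no'])

# Map the positive class to 1 and everything else to 0
y_int = (y_raw == 'yes').astype(int)
print(y_int.tolist())  # [1, 0, 1, 0]
```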
eitrheim