2

Trying to get class_weight going. I know the rest of the code works; it's just the class_weight that gives me the error:

    parameters_to_tune = ['min_samples_split':[2,4,6,10,15,25], 'min_samples_leaf':[1,2,4,10],'max_depth':[None,4,10,15],
                                             ^
SyntaxError: invalid syntax

Here is my code:

clf1 = tree.DecisionTreeClassifier()
parameters_to_tune = ['min_samples_split':[2,4,6,10,15,25], 'min_samples_leaf':[1,2,4,10],'max_depth':[None,4,10,15],
                      'splitter' : ('best','random'),'max_features':[None,2,4,6,8,10,12,14],'class_weight':{1:10}]
clf=grid_search.GridSearchCV(clf1,parameters_to_tune)
clf.fit(features,labels)
print clf.best_params_

Does anyone spot the mistake I am making?

hmmmbob
  • Can you give an example of what your features and labels look like? – yangjie Aug 05 '15 at 15:24
  • features is basically an array of numbers(floats), where as labels, is ( dont know if you call that also an array or simply a vector) [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0..... – hmmmbob Aug 05 '15 at 15:35
  • `parameters_to_tune` should be a dict or list of dicts. Your initial syntax is right. You only need to change the 'class_weight' key-value pair in the dict. (Sorry I didn't see your updates just now but you'd better preserve your original post and append your updates otherwise people will not know the original question.) – yangjie Aug 06 '15 at 00:59
  • And your `class_weight` should be a list of dicts; you made the mistake again... – yangjie Aug 06 '15 at 01:06

2 Answers

6

I assume you want to grid search over different class_weight for the 'salary' class.

The value of class_weight should be a list:

'class_weight':[{'salary':1}, {'salary':2}, {'salary':4}, {'salary':6}, {'salary':10}]

And you can simplify it with list comprehension:

'class_weight':[{'salary': w} for w in [1, 2, 4, 6, 10]]

The first problem is that each parameter value in the dict parameters_to_tune should be a list of candidate settings, while you passed a single dict for class_weight. It can be fixed by passing a list of dicts as the value of class_weight instead, where each dict is one class_weight setting for DecisionTreeClassifier.

But the more serious problem is that class_weight assigns weights to classes, and in your case 'salary' is the name of a feature. You cannot assign weights to features. I misunderstood your intention at first, and now I am confused about what you want.

The form of class_weight is {class_label: weight}. If you really mean to set class_weight in your case, class_label should be values like 0.0, 1.0, etc., and the syntax would be:

'class_weight':[{0: w} for w in [1, 2, 4, 6, 10]]

If the weight for a class is large, the classifier is more likely to predict data to be in that class. One typical case for using class_weight is when the data is imbalanced.

Here is an example, although the classifier is SVM.
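
For a quick illustration with a decision tree instead, here is a minimal sketch on made-up, imbalanced data (it assumes scikit-learn >= 0.16, where DecisionTreeClassifier accepts class_weight):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(1000, 4)                        # synthetic features, for illustration only
y = (rng.rand(1000) < 0.05).astype(float)    # ~5% of samples belong to class 1.0

plain = DecisionTreeClassifier(max_depth=3).fit(X, y)
weighted = DecisionTreeClassifier(max_depth=3, class_weight={0: 1, 1: 10}).fit(X, y)

# Mistakes on class 1 cost ten times as much for the weighted tree,
# so it tends to predict the rare class more often than the plain one.
print("plain:    %d predicted positives" % (plain.predict(X) == 1).sum())
print("weighted: %d predicted positives" % (weighted.predict(X) == 1).sum())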

Update:

The full parameters_to_tune should be like:

parameters_to_tune = {'min_samples_split': [2, 4, 6, 10, 15, 25],
                      'min_samples_leaf': [1, 2, 4, 10],
                      'max_depth': [None, 4, 10, 15],
                      'splitter' : ('best', 'random'),
                      'max_features':[None, 2, 4, 6, 8, 10, 12, 14],
                      'class_weight':[{0: w} for w in [1, 2, 4, 6, 10]]}
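
For completeness, here is a runnable sketch of the whole thing, with synthetic data standing in for your features/labels and a trimmed grid so it finishes quickly (it keeps the grid_search module from your code; in scikit-learn 0.18+ GridSearchCV lives in sklearn.model_selection instead):

import numpy as np
from sklearn import tree, grid_search

rng = np.random.RandomState(0)
features = rng.rand(200, 14)                   # 14 float feature columns
labels = (rng.rand(200) < 0.2).astype(float)   # imbalanced 0.0/1.0 labels, as in the question

# Trimmed version of the grid above, just to keep the example fast.
parameters_to_tune = {'max_depth': [None, 4, 10],
                      'class_weight': [{0: w} for w in [1, 2, 4, 6, 10]]}

clf = grid_search.GridSearchCV(tree.DecisionTreeClassifier(), parameters_to_tune)
clf.fit(features, labels)
print(clf.best_params_)    # reports the winning combination, including class_weight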
yangjie
  • Thank you, that looks very nice, but unfortunately when I tried both, I always get the error: ValueError: Invalid parameter class_weight for estimator DecisionTreeClassifier – hmmmbob Aug 05 '15 at 11:02
  • `DecisionTreeClassifier` does not have `class_weight` before scikit-learn 0.16. And it is likely that you didn't upgrade from 0.15 to 0.16 properly considering the new error. (See http://stackoverflow.com/questions/29596237/import-check-arrays-from-sklearn) – yangjie Aug 05 '15 at 11:50
  • Thanks a lot, it has to be something like that. I used the shell now to install it with "conda install...." Unfortunately it is getting more and more cryptic http://pastebin.com/SuQVbuBu :( wanna give up – hmmmbob Aug 05 '15 at 11:57
  • I just did that and get the same error as previously :( – hmmmbob Aug 05 '15 at 14:13
  • Thanks for your update, now I am thoroughly confused :( I thought class_weight could give me factors for how to weigh the different features in my classifier, otherwise I don't understand the purpose of it at all :( Could you maybe give me an example where the weight is appropriate if it is not for features? – hmmmbob Aug 05 '15 at 16:30
  • updated my first post according to your info, syntax error? – hmmmbob Aug 05 '15 at 20:14
0

The link below is about using different class_weight values. Just Ctrl+F "class_weight" to jump to the relevant section. It uses GridSearchCV to find the best class_weight for different optimization goals.

Optimizing a classifier using different evaluation metrics
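
Roughly, the idea looks like this (a sketch on made-up imbalanced data; the weights and metrics are only placeholders): run the same class_weight grid under different scoring settings and compare what each metric picks. It uses the old grid_search module to match the question; newer scikit-learn has the same class in sklearn.model_selection.

import numpy as np
from sklearn import tree, grid_search

rng = np.random.RandomState(0)
X = rng.rand(300, 5)
y = (rng.rand(300) < 0.1).astype(int)    # heavily imbalanced labels

param_grid = {'class_weight': [{0: 1, 1: w} for w in [1, 5, 10, 50]]}

for metric in ['accuracy', 'f1', 'recall']:
    search = grid_search.GridSearchCV(tree.DecisionTreeClassifier(max_depth=3),
                                      param_grid, scoring=metric)
    search.fit(X, y)
    # Different metrics can prefer different weights: accuracy tends to favor
    # small weights on the rare class, recall tends to favor large ones.
    print("%s -> %s" % (metric, search.best_params_))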

Stephen Rauch
Layla