
I am working on a heavily imbalanced multi-class classification dataset. I want to use the class_weight option offered by many scikit-learn models. What is the best and proper way to do that inside a pipeline?

As I have seen in the documentation, scale_pos_weight is for binary classification only. This answer with 15 upvotes by "Firas Omrane" gave me some idea, so I used:

import numpy as np
from sklearn.utils import class_weight
from xgboost import XGBClassifier

# One 'balanced' weight per class, ordered as in np.unique(y_train)
classes = np.unique(y_train)
classes_weights = class_weight.compute_class_weight('balanced',
                                                    classes=classes,
                                                    y=y_train)

# Expand to one weight per sample by looking up each sample's class
class_to_weight = dict(zip(classes, classes_weights))
weights = np.array([class_to_weight[label] for label in y_train])

XGBClassifier().fit(x_train, y_train, sample_weight=weights)
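
(As far as I can tell from the scikit-learn docs, compute_sample_weight does the same label-to-weight expansion in a single call; this snippet is my assumption of the equivalent one-liner:)

from sklearn.utils.class_weight import compute_sample_weight

# Should give the same per-sample weights as the manual loop above
weights = compute_sample_weight(class_weight='balanced', y=y_train)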

This works fine with a plain fit call, but when I put it in the last step of a pipeline as:

('clf', XGBClassifier(class_weight='balanced', n_jobs=-1, objective='multi:softprob', sample_weight=classes_weights))  # last step of the pipeline

it gives this warning:

WARNING: /tmp/build/80754af9/xgboost-split_1619724447847/work/src/learner.cc:541: 
Parameters: { class_weight, sample_weight } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.
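
Is the right fix to pass the weights at fit time through the pipeline instead? My understanding (an assumption on my part) is that Pipeline.fit forwards keyword arguments of the form <step>__<param> to that step's fit method, something like:

from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

pipe = Pipeline([
    # ... preprocessing steps go here ...
    ('clf', XGBClassifier(n_jobs=-1, objective='multi:softprob')),
])

# Pipeline.fit routes clf__sample_weight to XGBClassifier.fit(sample_weight=...),
# instead of (incorrectly) passing it to the constructor
pipe.fit(x_train, y_train, clf__sample_weight=weights)

Is that the proper way, or is there a better pattern for class weights in a pipeline?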
