I am working on a heavily imbalanced multi-class classification problem. I want to use class weighting, as exposed through the class_weight parameter in many scikit-learn models. What is the proper way to do that inside a pipeline? From what I have seen in the XGBoost documentation, scale_pos_weight is for binary classification only.
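For reference, my understanding is that in the binary case you would set that ratio directly, which is why it cannot express per-class weights for more than two classes. A sketch, assuming 0/1 labels:

import numpy as np
from xgboost import XGBClassifier

# scale_pos_weight = (count of negatives) / (count of positives);
# a single ratio only makes sense with exactly two classes
clf = XGBClassifier(scale_pos_weight=np.sum(y_train == 0) / np.sum(y_train == 1))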
This answer by Firas Omrane (15 upvotes) gave me some idea, so I used:
import numpy as np
from sklearn.utils import class_weight
from xgboost import XGBClassifier

# Per-class 'balanced' weights, ordered to match np.unique(y_train)
classes = np.unique(y_train)
classes_weights = class_weight.compute_class_weight('balanced', classes=classes, y=y_train)

# Map each sample's label to its class weight; indexing with val-1
# only works when the labels happen to be 1..K
weight_map = dict(zip(classes, classes_weights))
weights = np.array([weight_map[val] for val in y_train])

XGBClassifier().fit(x_train, y_train, sample_weight=weights)
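Side note: sklearn.utils.class_weight.compute_sample_weight seems to do this label-to-weight mapping in one call, which I believe is equivalent for 'balanced':

from sklearn.utils import class_weight

# One-call equivalent (as far as I can tell) of the manual mapping above
weights = class_weight.compute_sample_weight('balanced', y_train)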
It works fine with a plain fit call, but when it is used inside a pipeline as:
('clf', XGBClassifier(class_weight='balanced', n_jobs=-1, objective='multi:softprob', sample_weight=classes_weights))  # last step of the pipeline
it gives this warning:
WARNING: /tmp/build/80754af9/xgboost-split_1619724447847/work/src/learner.cc:541:
Parameters: { class_weight, sample_weight } might not be used.
This may not be accurate due to some parameters are only used in language bindings but
passed down to XGBoost core. Or some parameters are not used but slip through this
verification. Please open an issue if you find above cases.
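From what I understand of how Pipeline.fit forwards fit parameters (step name, double underscore, parameter name), the sample weights need to go into fit rather than the estimator's constructor. A sketch of what I mean, with placeholder step names and a placeholder scaler:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import class_weight
from xgboost import XGBClassifier

pipe = Pipeline([
    ('scaler', StandardScaler()),  # placeholder preprocessing step
    ('clf', XGBClassifier(n_jobs=-1, objective='multi:softprob')),
])

# Per-sample weights computed from the training labels
weights = class_weight.compute_sample_weight('balanced', y_train)

# 'clf__sample_weight' routes the keyword to the fit() of the 'clf' step
pipe.fit(x_train, y_train, clf__sample_weight=weights)

Is this the correct and intended way to pass sample weights through a pipeline, or is there a cleaner approach?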