I have a very unbalanced dataset on which I need to build a classification model. The dataset has around 30,000 samples, of which around 1,000 are labelled 1 and the rest are labelled 0. I build the model with the following lines:
from sklearn.ensemble import GradientBoostingClassifier

X_train = training_set
y_train = target_value
my_classifier = GradientBoostingClassifier(loss='deviance', learning_rate=0.005)
my_model = my_classifier.fit(X_train, y_train)
Since this is unbalanced data, it is not correct to simply build the model as above, so I tried to use class weights as follows:
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

class_weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
Now I have no idea how to use class_weights (which is basically an array holding the values 0.5 and 9.10) to train and build the model with GradientBoostingClassifier.
Any ideas on how I can handle this unbalanced data with class weights or other techniques?
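For context, here is a minimal sketch of the approach I was considering: GradientBoostingClassifier has no class_weight parameter, but its fit method does accept a per-sample sample_weight array, so the per-class weights could be mapped onto each sample by its label. The toy data below is a stand-in for my real training set, and the variable names are my own:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.utils.class_weight import compute_class_weight

# Toy imbalanced data standing in for the real 30,000-sample set
rng = np.random.RandomState(0)
X_train = rng.randn(300, 4)
y_train = np.array([0] * 270 + [1] * 30)

# Per-class weights, one entry per class in np.unique(y_train)
classes = np.unique(y_train)
class_weights = compute_class_weight('balanced', classes=classes, y=y_train)

# Map each sample's label to its class weight
weight_map = dict(zip(classes, class_weights))
sample_weight = np.array([weight_map[label] for label in y_train])

# Pass the per-sample weights to fit() instead of a class_weight argument
clf = GradientBoostingClassifier(learning_rate=0.005)
clf.fit(X_train, y_train, sample_weight=sample_weight)
```

I am not sure whether this is the idiomatic way to do it, or whether resampling (e.g. over/under-sampling) would work better here.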