
Can someone help me write a custom F1-score eval metric for multiclass classification in Python?

Edit: I'm editing the question to give a better picture of what I want to do

This is my custom eval F1-score metric function for a multiclass problem with 5 classes.

from sklearn.metrics import f1_score

def evalerror(preds, dtrain):
    labels = dtrain.get_label()
    preds = preds.reshape(-1, 5)
    preds = preds.argmax(axis = 1)
    f_score = f1_score(preds, labels, average = 'weighted')
    return 'f1_score', f_score, True

Note: The reason I'm reshaping is that the validation true values have length 252705, whereas preds is an array of length 1263525, which is 5 times as long. This is because LGB outputs the probability of each class for every prediction.
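As a quick sanity check on those sizes, here is a minimal sketch (using the counts quoted above; not part of the actual pipeline):

import numpy as np

n_samples, n_classes = 252705, 5               # sizes reported above
preds_flat = np.zeros(n_samples * n_classes)   # stand-in for the flat preds passed to feval

print(preds_flat.shape[0])               # 1263525 -- 5x the 252705 labels
print(preds_flat.reshape(-1, 5).shape)   # (252705, 5) after reshaping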

Below I'm converting the train and validation data to the format that LGB will accept.

dtrain = lgb.Dataset(train_X, label= train_Y, free_raw_data = False)
dvalid = lgb.Dataset(valid_X, label= valid_Y, free_raw_data = False, 
                     reference= dtrain)

Below is the LGB model I'm fitting to the training data. As you can see, I have passed the custom evalerror function to the model via feval, along with the validation data dvalid for which I want to see the F1 score during training. I'm training the model for 10 iterations.

evals_result = {}
num_round = 10
lgb_model = lgb.train(params, 
                      dtrain, 
                      num_round, 
                      valid_sets = dvalid, 
                      feval = evalerror,
                      evals_result = evals_result)
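For reference, the `params` dict isn't shown in the question; a minimal sketch of what it might contain for this 5-class setup (the values below are assumptions, not the configuration actually used):

params = {
    'objective': 'multiclass',   # multiclass training objective
    'num_class': 5,              # number of classes, matching the reshape(-1, 5) above
    'metric': 'multi_logloss',   # built-in metric reported in the training log below
    'learning_rate': 0.1,        # assumed value
}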

As the model trains for 10 rounds, the F1 score on the validation set at each iteration is displayed below. This doesn't look right, as I'm only getting around 0.18.

[1]     valid_0's multi_logloss: 1.46839        valid_0's f1_score: 0.183719
[2]     valid_0's multi_logloss: 1.35684        valid_0's f1_score: 0.183842
[3]     valid_0's multi_logloss: 1.26527        valid_0's f1_score: 0.183853
[4]     valid_0's multi_logloss: 1.18799        valid_0's f1_score: 0.183909
[5]     valid_0's multi_logloss: 1.12187        valid_0's f1_score: 0.187206
[6]     valid_0's multi_logloss: 1.06452        valid_0's f1_score: 0.187503
[7]     valid_0's multi_logloss: 1.01437        valid_0's f1_score: 0.187327
[8]     valid_0's multi_logloss: 0.97037        valid_0's f1_score: 0.187511
[9]     valid_0's multi_logloss: 0.931498       valid_0's f1_score: 0.186957
[10]    valid_0's multi_logloss: 0.896877       valid_0's f1_score: 0.18751

But once the model has been trained for 10 iterations, I run the code below to predict on the same validation set.

lgb_prediction = lgb_model.predict(valid_X)
lgb_prediction = lgb_prediction.argmax(axis = 1)
lgb_F1 = f1_score(lgb_prediction, valid_Y, average = 'weighted')
print("The Light GBM F1 is", lgb_F1)

The Light GBM F1 is 0.743250263548

Note: I have not reshaped here as I did in the custom function, because lgb_model.predict() outputs a numpy array of shape (252705, 5). Also note that I'm passing valid_X and not dvalid, because for prediction we have to pass the data in its original format, not the lgb.Dataset format we pass to lgb.train().

When I predict on the same validation dataset, I get an F1 score of 0.743250263548, which is good enough. So I would expect the validation F1 score at the 10th iteration during training to be the same as the one I get when predicting after training the model.

Can someone help me figure out what I'm doing wrong? Thanks.

Thanish
  • Need more details. What is the shape of `preds`, what is `dtrain`, and why are you reshaping `preds`? How do you calculate the F1 score when you say you get 0.74? You are using the y_true and y_pred arguments the wrong way around, as said in the answer. – Vivek Kumar Jul 03 '18 at 05:12
  • The shape of Y_true is 252705; Y_pred is 1263525 (252705 * 5) because it's a 5-class problem and the output for each data point is the probability of each of the 5 classes. I reshape with preds.reshape(-1, 5), which gives a numpy array of shape (252705, 5), then take the argmax to get the most probable class for each record. The final Y_pred has the same shape as Y_true (252705). When I say I get 0.74, I execute `pred = lgb_model.predict(valid_X); pred = pred.argmax(axis = 1)`. The first line predicts the probabilities with output shape (252705, 5); the second takes the most probable class. I then pass the truth and pred to f1_score(). – Thanish Jul 04 '18 at 03:55
  • Then why not just send that `pred` (which you calculate from `lgb_model.predict(valid_X)`) to the function? As you said, the output of `model.predict()` is already `(252705, 5)`, so where does the new shape `1263525` come from? – Vivek Kumar Jul 04 '18 at 06:06
  • model.predict() is used after I build the model for a few iterations; predicting on the test set gives shape (252705, 5). The function I'm writing is to see the F1 score while the model is being built, and there preds is an array of length 1263525 at each iteration. So I had to reshape it to (252705, 5), as explained above. – Thanish Jul 04 '18 at 11:16
  • Then how are you getting the outputs while the model is being built? – Vivek Kumar Jul 04 '18 at 12:35
  • I have edited the question with more details. I hope it helps someone figure out what is wrong. – Thanish Jul 05 '18 at 04:11
  • There was an answer pointed out in a GitHub issue: https://github.com/Microsoft/LightGBM/issues/1483#issuecomment-401697767 – Mischa Lisovyi Jul 20 '18 at 14:29

2 Answers


I had the same issue.

LightGBM predictions are output as a flattened array.

By inspecting it, I figured out that the flat array is grouped by class: all samples' probabilities for class 0 come first, then all for class 1, and so on. So the probability of sample a for class i is located at position

num_samples*i + a

As for your code, it should be like this:

def evalerror(preds, dtrain):
    labels = dtrain.get_label()
    # preds arrives flat, grouped by class: reshape to (num_class, num_samples), then transpose
    preds = preds.reshape(5, -1).T
    preds = preds.argmax(axis=1)
    f_score = f1_score(labels, preds, average='weighted')
    return 'f1_score', f_score, True
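To see why this works, here is a toy sketch (made-up numbers, 2 samples and 3 classes, assuming the class-grouped ordering described above):

import numpy as np

# flat predictions grouped by class: class 0 for all samples, then class 1, then class 2
preds_flat = np.array([0.7, 0.1,    # class 0: sample 0, sample 1
                       0.2, 0.1,    # class 1: sample 0, sample 1
                       0.1, 0.8])   # class 2: sample 0, sample 1

per_sample = preds_flat.reshape(3, -1).T   # shape (2, 3): one row of class probabilities per sample
print(per_sample.argmax(axis=1))           # [0 2] -> predicted class for each sample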
Igor
sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

So according to this, you should correct it to:

def evalerror(preds, dtrain):
    labels = dtrain.get_label()
    preds = preds.reshape(-1, 5)
    preds = preds.argmax(axis=1)
    # f1_score expects (y_true, y_pred), i.e. f1_score(labels, preds)
    f_score = f1_score(labels, preds, average='weighted')
    return 'f1_score', f_score, True
H.Bukhari
  • I don't think swapping the argument positions would make a big difference. But just to reconfirm, I did as you suggested, and the per-iteration results are almost the same: `[2] valid_0's multi_logloss: 1.35684 valid_0's f1_score: 0.200796 [3] valid_0's multi_logloss: 1.26527 valid_0's f1_score: 0.200777`, while the prediction result gave 0.7436534383. – Thanish Jul 03 '18 at 03:13