
My problem is sentence classification, a two-class prediction problem: Does the sentence belong to class "1" or class "0"? In order to tackle the problem, I use a CNN.

However, I want to force the model to punish errors on class "1" more, because it is much more important to me that the model predicts class "1" reliably; in other words, false positives (predicting "1" when the true class is "0") are tolerable.

For this, I changed the cost function. Whenever there is an error and the true class is "1", a value of 2 is returned; for an error on class "0", 1 is returned; if there is no error, 0 is returned. This should have the effect that predicting "0" when the label is "1" costs twice as much as the inverse case.
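For example, with gold labels [1, 0, 1, 0] and predictions [0, 0, 1, 1], the per-example costs would be [2, 0, 0, 1]: the missed "1" costs 2, the spurious "1" costs 1, and the correct predictions cost 0.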

This is my code (sorry for the mess).

import theano
import theano.tensor as T

def class_imbalanced_errors(self, y_pred, y):
    # Per-example cost: 0 if correct, 2 if a true "1" was missed, 1 for any other error.
    expr = lambda a, b: T.switch(T.neq(a, b), T.switch(T.neq(a, T.constant(1)), T.constant(1), T.constant(2)), T.constant(0))
    costs, updates = theano.scan(expr, sequences=[y, y_pred])  # scan returns (outputs, updates); don't shadow y
    return costs

The function returns a vector of values 2, 1, or 0, depending on the error type. The mean of this vector is then the final cost.
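As an aside, T.switch is already elementwise, so the same per-example cost can be computed without scan. A minimal, self-contained sketch (the variable names here are just illustrative):

import theano
import theano.tensor as T

y = T.lvector('y')            # gold labels, 0 or 1
y_pred = T.lvector('y_pred')  # predicted labels, 0 or 1

# Same rule as above: 2 for a missed "1", 1 for any other error, 0 if correct.
per_example = T.switch(T.neq(y, y_pred), T.switch(T.neq(y, 1), 1, 2), 0)
cost = per_example.mean()     # final scalar cost

f = theano.function([y, y_pred], [per_example, cost])
print(f([1, 0, 1, 0], [0, 0, 1, 1]))  # per-example costs [2, 0, 0, 1], mean cost 0.75

Note that because this compares hard labels, the cost is piecewise constant, which is part of why the comments below suggest weighting the probabilistic errors instead.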

My question(s):

Is this the right approach to give different weights to the two classes? Does my implementation seem to be correct?

Alex
  • I don't know about Theano specifically, but if you're able to return a predicted probability (e.g., 0.51 or 0.99), then that's not the way to go about it because you're ignoring information about the errors. One way to go about giving more weight to a particular class is to undersample the other class - I have found this tends to work well in highly-imbalanced situations (which I inferred is what you are facing). – Tchotchke May 04 '16 at 12:53
  • I undersample already and you are right, this improves results a lot. But I'm searching for a way to shift per-class performance: I want more true positives on class "1", even if that means more false negatives on class "0". I thought making errors cost differently per class would do the trick. – Alex May 04 '16 at 13:09
  • It can, but I think the way you are going about it is wrong. If you want to take that approach, I'd give more weight to the errors, so that you are taking into account the quality of the prediction. For example, if you get one prediction for the positive class of 0.49 and another of 0.01, you want to know that the first was much closer to being correct than the latter - with your method you lose that information. To retain that information, you could multiply the errors by class (see the sketch after these comments). – Tchotchke May 04 '16 at 13:13
  • Seems reasonable, I'll have to think about this a bit. Thank you! – Alex May 04 '16 at 13:16
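To make the last comment concrete, here is a minimal sketch of a class-weighted cross-entropy in Theano; the weight value w1 and the variable names are assumptions for illustration, not something from the thread:

import theano
import theano.tensor as T

y = T.lvector('y')   # gold labels, 0 or 1
p = T.dvector('p')   # predicted P(class = 1), strictly in (0, 1)
w1 = 2.0             # assumed weight for class "1"; tune on validation data

# Weighted negative log-likelihood: unlike the hard 0/1/2 costs, this keeps
# how confident each prediction was, so a near miss costs less than a confident one.
nll = -(w1 * y * T.log(p) + (1 - y) * T.log(1 - p))
cost = nll.mean()

f = theano.function([y, p], cost)
print(f([1, 1], [0.49, 0.01]))  # the confident miss (0.01) dominates the near miss (0.49)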

0 Answers