In TensorFlow, I have a classifier network with unbalanced training classes. For various reasons I cannot use resampling to compensate for the unbalanced data, so I am forced to compensate for the imbalance by other means: specifically, multiplying the logits by weights based on the number of examples in each class. I know this is not the preferred approach, but resampling is not an option. My training loss op is tf.nn.softmax_cross_entropy_with_logits (I might also try tf.nn.sparse_softmax_cross_entropy_with_logits). A sketch of what I'm doing is below.
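Here is a minimal sketch of the setup (TensorFlow 2 eager mode assumed; the class counts, shapes, and dummy tensors are placeholders standing in for my real pipeline):

```python
import numpy as np
import tensorflow as tf

# Hypothetical class counts; the weights are inverse-frequency based.
num_classes = 3
class_counts = np.array([1000.0, 100.0, 10.0])
class_weights = tf.constant(class_counts.sum() / class_counts,
                            dtype=tf.float32)   # shape [num_classes]

# Stand-ins for the network output and one-hot labels.
logits = tf.random.normal([8, num_classes])
labels = tf.one_hot(
    tf.random.uniform([8], maxval=num_classes, dtype=tf.int32),
    depth=num_classes)

# The scaling in question: multiply the logits by per-class weights
# before handing them to the loss op.
scaled_logits = logits * class_weights

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels,
                                            logits=scaled_logits))
```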
The TensorFlow docs include the following warning in the description of these ops:
WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
My question: Is the warning above referring only to passing in softmax outputs (i.e., already-normalized probabilities), or does it mean that any scaling of the logits, of any type, is forbidden? If the latter, is my class-rebalancing logit scaling causing erroneous results?
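To make the question concrete, here is a toy comparison (the logits and weights are made up):

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])
weights = tf.constant([4.0, 2.0, 1.0])   # made-up per-class weights

# Softmax of the raw logits vs. softmax of the scaled logits.
print(tf.nn.softmax(logits).numpy())
print(tf.nn.softmax(logits * weights).numpy())
```

The scaling obviously changes the resulting distribution; what I can't tell from the docs is whether that is the kind of "incorrect results" the warning has in mind.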
Thanks,
Ron