
I have been trying to evaluate language models, and I need to keep track of the perplexity metric.

What I tried: since perplexity is 2^J, where J is the cross-entropy in bits, I wrote:

def perplexity(y_true, y_pred):
    oneoverlog2 = 1.442695  # 1 / ln(2), converts the natural log to log base 2
    return K.pow(2.0, K.mean(-K.log(y_pred) * oneoverlog2))

But this curiously goes to infinity during training within a few batches.

Is there something wrong with the implementation, or is there another way to implement perplexity?

Rafael

3 Answers


You're computing the cross-entropy with a formula that is undefined for y_pred = 0, and it is also numerically unstable.

I suggest you use tf.nn.sparse_softmax_cross_entropy_with_logits instead of writing your own formula. That function handles the numerical-instability problem and the zero-input case for you.

If you really want to write the formula yourself, add a small amount to y_pred so that it can never be exactly zero, or clip y_pred to lie between some very small value and 1.
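
As a rough sketch of that clipping idea (assuming `K` is the Keras backend, `y_pred` holds softmax probabilities, and the targets are one-hot; the function name and the use of K.categorical_crossentropy are my own choices, not part of this answer):

import keras.backend as K

def perplexity_clipped(y_true, y_pred):
    # Keep probabilities strictly away from 0 so the log never blows up.
    y_pred = K.clip(y_pred, K.epsilon(), 1.0)
    # Mean cross-entropy of the true class only, in nats.
    cross_entropy = K.mean(K.categorical_crossentropy(y_true, y_pred))
    # Perplexity is e raised to the mean cross-entropy.
    return K.exp(cross_entropy)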

nessuno
  • Thanks for the answer. So, finally, the perplexity function would be K.pow(2.0, K.mean(K.nn.softmax_cross_entropy_with_logits(y_true, y_pred, name=None))). Could you check whether this is correct? Many thanks :) – Rafael Jun 22 '17 at 12:12
  • I've never used Keras, but if `K` is the same as `tf`, then yes, it makes sense. Just be sure that `y_pred` is unscaled. Unscaled = it's the output of a set of linear neurons, not the output of the softmax function applied to those neurons – nessuno Jun 22 '17 at 13:23
  • Thanks a lot for the reply. Yes, K = tf. Why should y_pred be unscaled? I thought logits meant the output of softmax, so I have a softmax layer that produces the final prediction (in my case, a softmax over vocabulary words). Could you explain a bit here? Many thanks. – Rafael Jun 22 '17 at 15:29
  • Softmax is an activation function for your output layer (it produces a "probability" for each class). Remove it and then use the TensorFlow method to compute softmax + cross-entropy. Yes, logits usually means "logistic regression output", while unscaled logits in this context means the output neurons without any activation function. The TensorFlow method computes the softmax for you in a better (and numerically stable) way, just like for logistic regression (a sketch of this setup follows these comments). However, if I solved your problem, remember to mark my answer as accepted! – nessuno Jun 22 '17 at 16:04
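
Putting the comment thread above together, a minimal sketch of that logits-based setup might look like the following (assuming TensorFlow imported as `tf`, one-hot targets, and a final layer with no softmax activation; the name `perplexity_from_logits` is my own, not from the discussion):

import tensorflow as tf

def perplexity_from_logits(labels, logits):
    # Softmax and cross-entropy computed together, in a numerically stable way.
    # `labels` are one-hot targets; the result is the per-example cross-entropy in nats.
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=labels,
                                                            logits=logits)
    # Perplexity is e raised to the mean cross-entropy.
    return tf.exp(tf.reduce_mean(cross_entropy))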

I have been researching this topic a bit, and I think I can shed some light on it.

If you want to calculate perplexity using Keras, and according to your definition, it would be something like this:

def ppl_2(y_true, y_pred):
    return K.pow(2.0, K.mean(K.categorical_crossentropy(y_true, y_pred)))

However, K.categorical_crossentropy uses the natural logarithm, so the base should be e instead of 2. Then the perplexity would be:

def ppl_e(y_true, y_pred):
    return K.exp(K.mean(K.categorical_crossentropy(y_true, y_pred)))
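
As a usage sketch (the toy model, vocabulary size, and input shape below are placeholders of my own, just to show where the metric plugs in):

from keras.models import Sequential
from keras.layers import Dense

vocab_size = 10000  # placeholder vocabulary size, for illustration only

model = Sequential([Dense(vocab_size, activation='softmax', input_shape=(128,))])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # natural-log cross-entropy, matching ppl_e
              metrics=[ppl_e])                  # perplexity is then reported alongside the loss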
Guillem

I've come up with two versions and attached their corresponding sources; please feel free to check out the links.

def perplexity_raw(y_true, y_pred):
    """
    The perplexity metric. Why isn't this part of Keras yet?!
    https://stackoverflow.com/questions/41881308/how-to-calculate-perplexity-of-rnn-in-tensorflow
    https://github.com/keras-team/keras/issues/8267
    """
    # Note: this version does not compute a true cross-entropy. It checks whether
    # the true label equals the argmax of the prediction (1.0 or 0.0) and
    # exponentiates that, so it behaves more like an accuracy-based proxy.
    # cross_entropy = K.sparse_categorical_crossentropy(y_true, y_pred)
    cross_entropy = K.cast(K.equal(K.max(y_true, axis=-1),
                                   K.cast(K.argmax(y_pred, axis=-1), K.floatx())),
                           K.floatx())
    perplexity = K.exp(cross_entropy)
    return perplexity

def perplexity(y_true, y_pred):
    """
    The perplexity metric. Why isn't this part of Keras yet?!
    https://stackoverflow.com/questions/41881308/how-to-calculate-perplexity-of-rnn-in-tensorflow
    https://github.com/keras-team/keras/issues/8267
    """
    cross_entropy = K.sparse_categorical_crossentropy(y_true, y_pred)
    perplexity = K.exp(cross_entropy)
    return perplexity

Copied from my answer at Check perplexity of a Language Model

Ayush