I'm trying to calculate a perplexity value for a language model, and the calculation involves a lot of large powers. I have tried converting my calculation to log space using BigDecimal, but I'm not having any luck.

var sum=0.0
for(ngram<-testNGrams)
{
  var prob = Math.log(lm.prob(ngram.last, ngram.slice(0,ngram.size-1)))
  if (prob==0.0) sum = sum
  else sum = sum + prob
}
Math.pow(Math.log(Math.exp(sum)),-1.0/wordSize.toDouble)

How can I perform such a calculation in Scala without losing my large/small values to zero/Infinity? It seems like a trivial question but I haven't managed to do it.

In the above, you can assume that the method lm.prob returns correct probabilities between 0 and 1; this has been amply tested.

user3297367
1 Answer

Write everything in terms of log probabilities, not probabilities.

For instance, things like log(exp(sum)) just warm up your CPU while throwing away useful information. Avoid!

If you must convert to actual probabilities, do so at the very last step you can.
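
As a minimal sketch of that advice (not the asker's actual code): the sum of natural-log probabilities is kept as-is and exponentiated exactly once, at the end. The `ProbFn` type is a hypothetical stand-in for `lm.prob`, and `wordSize` is assumed to be the number of test words N, as in the question.

```scala
object PerplexitySketch {
  // Hypothetical stand-in for the question's `lm.prob(word, history)`.
  type ProbFn = (String, Seq[String]) => Double

  /** Perplexity as exp(-(1/N) * sum of ln p_i), with N = wordSize.
    * The raw product of probabilities is never formed, so it cannot underflow.
    */
  def perplexity(testNGrams: Seq[Seq[String]], prob: ProbFn, wordSize: Int): Double = {
    // Sum of natural-log probabilities; a zero probability yields -Infinity
    // here, which makes the resulting perplexity Infinity.
    val logProbSum = testNGrams.iterator
      .map(ngram => math.log(prob(ngram.last, ngram.init)))
      .sum

    // Convert out of log space only at the very last step.
    math.exp(-logProbSum / wordSize)
  }
}
```

Note that the `prob==0.0` check in the question tests the log probability, so it only skips events with probability 1, which add nothing to the sum anyway.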

Rex Kerr
  • As you can see, I'm trying to use the log probs up until the last line, where I need to get the value for the perplexity. – user3297367 Nov 09 '14 at 10:38
  • @user3297367 - I'm not quite sure how that is supposed to be the perplexity. There's no `log(exp(sum))` term (that's just the identity function anyway). Are you trying to use `pow` to do the `-1/N` exponent? How about just `-sum/N`? But then you need to raise something to that power (and it's not the sum that you raise to that power typically?). It's also more typical to use log base 2 than base e. – Rex Kerr Nov 10 '14 at 20:46
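
For reference, the usual definition of perplexity over N test words can be written so that the `-sum/N` term appears directly. The symbols here (p_i for the model probability of the i-th word given its history, N for the word count) are notation introduced for this note, not taken from the question.

```latex
\mathrm{PP}
  = \Bigl(\prod_{i=1}^{N} p_i\Bigr)^{-1/N}
  = \exp\Bigl(-\frac{1}{N}\sum_{i=1}^{N} \ln p_i\Bigr)
  = 2^{-\frac{1}{N}\sum_{i=1}^{N} \log_2 p_i}
```

The value is the same for base e and base 2, provided the base used for the logarithm matches the base that is raised to the power.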