I have a large Markov chain and a sample for which I want to calculate the likelihood. The problem is that some observations or transitions in the sample don't occur in the Markov chain, which makes the total likelihood 0 (and the log-likelihood negative infinity). It is not possible to use more data to construct the Markov chain. I was wondering if there's a way to still get a meaningful likelihood.

I already tried filtering out these "unknown" observations in the sample and reporting them separately. The problem is that I want to compare the likelihood of the sample with the likelihood of the same sample after a transformation. The transformed sample has a different number of "unknown" observations, so I don't think I can compare the two likelihoods, since they have been calculated from different numbers of observations.

Is there a way to still calculate a meaningful likelihood that can be compared? I was thinking about averaging the probabilities of the observations in the sample, but I can't find anything that says this is correct.

Thanks in advance!

Thomas Pattyn
  • I guess the term you want to google is `pseudocounts`. The idea is to add a very small number of additional observations equally to all your classes (typically you just add 1). At first this may sound like a horrible heuristic, but there's a very nice mathematical motivation for it in Bayesian statistics. – cel Aug 14 '15 at 11:23

1 Answer

Put simply, the crucial part of a probabilistic model is the estimator of its probability distributions. It seems that you are using the most trivial one possible, the empirical estimator, of the form

p(event) = count(event) / [count(event) + count(not-event)]

which estimates a probability of 0 for any unseen event, and that leads to the obvious problems. There are dozens of estimators that do not have this problem. One of the simplest is Laplace (additive) smoothing, where you assume that some probability mass is reserved for unseen events:

p(event) = [count(event) + alpha] / [count(event) + count(not-event) + alpha * #event-types]

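Here alpha > 0 is the pseudocount (alpha = 1 is the usual choice) and #event-types is the number of distinct event types. This way even a non-occurring event will have a non-zero probability.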
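As a rough illustration, the smoothed estimator could be applied to the transitions of a first-order Markov chain along the following lines. This is only a sketch under some assumptions that are not in the question: the chain is first-order, `states` lists every state that can appear in the training data or in either sample, and the names `fit_counts` and `smoothed_log_likelihood` are made up for this example.

```python
import math
from collections import defaultdict


def fit_counts(sequence):
    """Count transitions prev -> curr in a training sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, curr in zip(sequence, sequence[1:]):
        counts[prev][curr] += 1
    return counts


def smoothed_log_likelihood(sample, counts, states, alpha=1.0):
    """Log-likelihood of the transitions in `sample` with additive smoothing.

    Assumption: `states` contains every state that may occur, including
    states never seen in training, so each row is normalised over the
    same set of event types.
    """
    n_states = len(states)
    total = 0.0
    for prev, curr in zip(sample, sample[1:]):
        row = counts.get(prev, {})       # empty row if prev was never seen
        row_total = sum(row.values())    # count(event) + count(not-event)
        p = (row.get(curr, 0) + alpha) / (row_total + alpha * n_states)
        total += math.log(p)
    return total


# Hypothetical toy data: "D" never occurs in the training sequence.
training = list("ABABAC")
states = {"A", "B", "C", "D"}
counts = fit_counts(training)
print(smoothed_log_likelihood(list("ABD"), counts, states))
```

Because every transition now has a non-zero probability, the log-likelihood stays finite, and the original and transformed samples are scored over the same number of transitions as long as the same `states` and `alpha` are used for both.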

lejlot