
Why do I get different keywords and a different LL/token every time I run topic models in Mallet? Is this normal?

Please help. Thank you.

1 Answer


Yes, this is normal and expected. Mallet implements a randomized algorithm. Finding the single best topic model for a collection is computationally intractable, but it's much easier to find one of countless "pretty good" solutions.

As an intuition, imagine shaking a box of sand. The smaller particles will sift towards one side, and the larger particles towards the other. That's way easier than trying to sort them by hand. You won't get the exact order, but each time you'll get one of a large number of equally good approximate sortings.

If you want to have a stronger guarantee of local optimality, add --num-icm-iterations 100 to switch from sampling to choosing the single best allocation for each token, given all the others.
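As a sketch, a full train-topics invocation with this option might look like the following. The input and output file names here are placeholders, and the other options shown are just common choices, not part of the answer above:

```shell
# Hypothetical file names. --num-icm-iterations adds ICM passes that
# deterministically pick the most probable topic for each token in turn.
bin/mallet train-topics \
  --input corpus.mallet \
  --num-topics 50 \
  --num-iterations 1000 \
  --num-icm-iterations 100 \
  --output-topic-keys topic-keys.txt
```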

David Mimno
  • Thanks. May I know what ICM is? What is the difference between --num-icm-iterations 100 and --num-iterations 100? –  Nov 14 '21 at 00:56
  • It stands for "iterated conditional modes", which probably doesn't help! The difference is as I described it: given all other tokens' current topic assignments, you can get a distribution over topics for one token. Usually we pick a random topic according to that distribution; in ICM we pick the single most probable one. – David Mimno Nov 14 '21 at 20:50
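The distinction in the comment above can be sketched in a few lines of Python. This is an illustration of the sampling-vs-argmax choice only, not Mallet's actual implementation; the function name and the toy weights are made up:

```python
import random

def resample_token(topic_weights, icm=False, rng=random):
    """Choose a topic for one token given its conditional distribution.

    topic_weights: unnormalized weight of each topic for this token,
    computed from all other tokens' current assignments.
    icm=False samples from the distribution (Gibbs-style);
    icm=True takes the single most probable topic (iterated
    conditional modes).
    """
    if icm:
        # Deterministic: pick the argmax topic.
        return max(range(len(topic_weights)), key=lambda k: topic_weights[k])
    # Stochastic: sample a topic index proportionally to its weight.
    return rng.choices(range(len(topic_weights)), weights=topic_weights)[0]

weights = [0.1, 5.0, 0.3]
print(resample_token(weights, icm=True))   # always topic 1, the argmax
print(resample_token(weights))             # usually 1, but sometimes 0 or 2
```

This is why sampling runs give slightly different results each time, while adding ICM passes drives the model to a nearby local optimum.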