Why MALLET LDA need to keep-sequence?

Question

In the MALLET documentation, it requires --keep-sequence tag for Topic model training (Detail is at : http://mallet.cs.umass.edu/topics.php)

However, in my knowledge, regular LDA modeling use documents as bag of words, since including bigram will increase the feature space by a lot. I wonder why MALLET requires keep-sequence in LDA training, and how did MALLET actually use that sequential information?

Thank you for reading this post.

score 1 · Answer 1 · answered Oct 26 '15 at 07:34

It doesn't "need" to keep sequence.

That option is merely so that the words per topic when you do "--output-topic-keys" are in the same sequence as they appear in the notes.

It is also useful when you are looking to find phrases in topic models (http://www.mimno.org/articles/phrases/)

Why MALLET LDA need to keep-sequence?

1 Answers1