My question concerns how MALLET assigns topics and how this affects the interpretation of the results.
The doc-topics file gives the proportion of each topic in each document. However, the document ranked first for topic X (58%) does not contain a single one of the words that make up topic X according to the topic-keys file. To understand this, I checked the output-state file and found that many words have been assigned to topic X that do not appear in the topic-keys list for that topic.
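For reference, the check against the state file can be sketched roughly as follows. This is only a minimal sketch: it assumes each data line of the state file has six whitespace-separated fields (doc, source, pos, typeindex, type, topic) and that header lines start with `#`. The function names and the sample lines are made up for illustration. The share it computes is a raw token count, so it may differ slightly from the value in the doc-topics file if that value is smoothed.

```python
from collections import Counter, defaultdict


def read_state(lines):
    """Tally, per document and topic, which words were assigned to that topic.

    Assumed layout of a data line: doc source pos typeindex type topic.
    Header lines (starting with '#') and malformed lines are skipped.
    """
    words = defaultdict(lambda: defaultdict(Counter))
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 6:
            continue  # e.g. a source path containing spaces; ignored in this sketch
        doc, _source, _pos, _typeindex, word, topic = fields
        words[int(doc)][int(topic)][word] += 1
    return words


def topic_share(words, doc, topic):
    """Fraction of the document's tokens that were assigned to the given topic."""
    total = sum(sum(counter.values()) for counter in words[doc].values())
    return sum(words[doc][topic].values()) / total if total else 0.0


# Hypothetical sample lines, not taken from my actual model.
sample = [
    "#doc source pos typeindex type topic",
    "#alpha : 0.5 0.5",
    "#beta : 0.01",
    "0 a.txt 0 0 river 1",
    "0 a.txt 1 1 bank 1",
    "0 a.txt 2 2 money 0",
    "0 a.txt 3 3 water 1",
]

state = read_state(sample)
print(topic_share(state, 0, 1))    # 0.75
print(state[0][1].most_common())   # words in document 0 assigned to topic 1
```

For a real, gzip-compressed state file, the lines could be read with `gzip.open(path, "rt", encoding="utf-8")` and passed to `read_state`. Comparing the words listed for topic X in a document with the topic-keys list shows which assigned words are missing from the keys.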
Why doesn't MALLET calculate the proportion of a topic in the doc-topics file solely from the words that appear in the topic-keys file, that is, the words listed as most important for that topic?