Using MALLET I can get a specific number of topics and their words. How can I make sure the topic words form a probability distribution (i.e. sum to one)?

For example, if I run it as below, how can I use the outputs given by MALLET to make sure the probabilities of the topic words for topic 0 add up to 1?

mallet train-topics --input text.vectors --output-topic-keys topics.txt --output-doc-topics doc_comp.txt --topic-word-weights-file weights.txt --num-top-words 50 --word-topic-counts-file counts.txt --num-topics 3 --output-state topicstate.gz --alpha 1
samsamara
  • I'm assuming your weights.txt file contains the weighting given to each word in a topic? It's been a while since I used Mallet but you should be able to just open this file in something like Excel and sum the topic word weights? – SJB Oct 22 '15 at 00:29
  • Yes, but it doesn't sum to one. Is it OK if I just L1- or L2-normalize the weights, or is there a particular way of doing this? – samsamara Oct 22 '15 at 01:43
  • Could you post the weight of the top 10 terms in a topic? I think you may need to normalise. If I remember correctly when I used Mallet some words would have a weight greater than 1. – SJB Oct 22 '15 at 11:06
  • I _think_ the probabilities would wind up being p(word|topic) = `(count[topic, word] + alpha / num_word_types) / (sum(count[topic, w] for w in words) + alpha)`. – senderle Dec 20 '16 at 20:55
  • Possible duplicate of [how to get word-topic probability using mallet](http://stackoverflow.com/questions/19661094/how-to-get-word-topic-probability-using-mallet) – senderle Dec 20 '16 at 21:00
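
The normalisation discussed in the comments can be sketched as follows. This is a minimal illustration, not MALLET's own API: it assumes weights.txt holds one tab-separated `topic`, `word`, `weight` row per line, with every word type listed for every topic and the weights being smoothed counts. If that assumption holds, dividing each weight by its topic's total (L1 normalisation) gives values that sum to one per topic. The function name `normalize_topic_word_weights` is hypothetical.

```python
from collections import defaultdict


def normalize_topic_word_weights(lines):
    """Convert 'topic<TAB>word<TAB>weight' rows into per-topic distributions.

    Assumed input format (not confirmed in the question): one row per
    (topic, word) pair, tab-separated, with a non-negative numeric weight.
    """
    weights = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        topic, word, weight = line.split("\t")
        weights[int(topic)][word] = float(weight)

    probs = {}
    for topic, word_weights in weights.items():
        total = sum(word_weights.values())
        # L1 normalisation: each weight divided by the topic's total weight
        probs[topic] = {w: v / total for w, v in word_weights.items()}
    return probs


# Made-up rows for illustration only; real use would pass the open file:
#     with open("weights.txt", encoding="utf-8") as f:
#         probs = normalize_topic_word_weights(f)
sample_rows = ["0\tapple\t3.0", "0\tpear\t1.0", "1\tapple\t2.0", "1\tpear\t2.0"]
probs = normalize_topic_word_weights(sample_rows)
print(sum(probs[0].values()))
```

L2 normalisation would make the squared weights sum to one rather than the weights themselves, so it would not yield a probability distribution. Normalising only the top 50 words from topics.txt would also give a distribution over just those words, not over the whole vocabulary.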

0 Answers