
I want to use Mallet as part of an expert-finding project. I'm fairly new to Mallet, but I know that it trains topics from a set of documents. Let's say that I have 50 topics trained by Mallet. I want to calculate one of these probabilities: p(topic|q) or p(q|topic).

q is the query: a single word (such as "algorithm" or "android") naming the area in which I want to find experts.

As I read in the post "how to get word-topic probability using mallet", one of the users said the probability can be calculated using the --word-topic-counts-file option. Let's say that I have generated this file with Mallet. It has the following structure:

0 android 2:21
1 is 3:3
.
.
.

I understand the semantics of this structure, but I don't know how to calculate the probability of a topic given the query (i.e. p(topic|q)) or of the query given a topic (p(q|topic)).

P.S.: I mention both because I'm not sure which of them Mallet computes.

Any help would be appreciated.

inverted_index

1 Answer

Take this example line from GlieBrt's answer to the linked question:

1 needham 19:2 17:1

Here p(topic|q) is each topic's count divided by the word's total count over all topics (2 + 1 = 3):

p(19|needham) = 2/3 ≈ 0.67

and

p(17|needham) = 1/3 ≈ 0.33

With your own example, it is even simpler, because the word occurs in only one topic:

0 android 2:21

p(2|android) = 21/21 = 1.0
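
The same normalization can be scripted. Below is a minimal Python sketch, assuming each line of the file has the form `index word topic:count ...` as in the examples above; the function names are made up for illustration and are not part of Mallet. It also shows one way to estimate p(q|topic) from the same counts, by dividing a word's count in a topic by the total count of all words in that topic. Both are plain relative frequencies and ignore any smoothing the model itself may apply.

```python
def parse_word_topic_counts(lines):
    """Parse lines like '1 needham 19:2 17:1' into {word: {topic: count}}."""
    counts = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        word = parts[1]
        topic_counts = {}
        for pair in parts[2:]:
            topic, count = pair.split(":")
            topic_counts[int(topic)] = int(count)
        counts[word] = topic_counts
    return counts


def p_topic_given_word(counts, word):
    """Relative frequency of each topic among the occurrences of `word`."""
    topic_counts = counts[word]
    total = sum(topic_counts.values())
    return {topic: c / total for topic, c in topic_counts.items()}


def p_word_given_topic(counts, word, topic):
    """Relative frequency of `word` among all words assigned to `topic`."""
    topic_total = sum(tc.get(topic, 0) for tc in counts.values())
    if topic_total == 0:
        return 0.0
    return counts.get(word, {}).get(topic, 0) / topic_total


# The two example lines from this thread; a real file has one line per word.
sample = ["0 android 2:21", "1 needham 19:2 17:1"]
counts = parse_word_topic_counts(sample)
print(p_topic_given_word(counts, "needham"))
print(p_word_given_topic(counts, "android", 2))
```

To run it on a real file, pass the open file object instead of `sample`, since iterating over a file yields its lines. Note that p(q|topic) is only meaningful when computed over the full file, because the topic totals depend on every word's counts.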

Sir Cornflakes