On the Apache Mahout website (https://cwiki.apache.org/MAHOUT/latent-dirichlet-allocation.html) I can see the procedure for fitting an LDA model and outputting the computed topics in the form P("word"|"topic number"). However, there is no information on how the trained model can be applied to test data to predict the topic distribution. Or should we write our own program that uses the output conditional probabilities to find the topics over a test data set?
- There is an example in the [cluster-reuters.sh](http://svn.apache.org/repos/asf/mahout/trunk/examples/bin/cluster-reuters.sh) file of LDA topic clustering. You can find it in the examples directory. – Calavoow Sep 30 '12 at 21:54
- @Calavoow, the example you refer to does the training part. I think Rkz wants to get the topic distribution for a new set of documents using the trained model. – Sam Nov 18 '13 at 18:35
1 Answer
Please have a look at the 2009 paper by Wallach et al. titled 'Evaluation Methods for Topic Models'. Section 4 mentions three methods to calculate P(z|w): one based on importance sampling, and two others called the 'Chib-style estimator' and the 'left-to-right estimator'.
Mallet has an implementation of the left-to-right estimator.
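If you do end up writing your own program on top of the P(word|topic) output, one simple option is to hold those probabilities fixed and estimate only the topic mixture of each new document. The sketch below is a plain EM "fold-in" with a smoothing pseudo-count, not the left-to-right estimator from the paper and not anything Mahout or Mallet ships. The function name `infer_topic_mixture`, the list-of-dicts layout for the model, and the `alpha` default are all assumptions made for illustration; you would need to load Mahout's output into that shape yourself.

```python
def infer_topic_mixture(doc_words, word_given_topic, n_iter=50, alpha=0.1):
    """Estimate P(topic | document) with P(word | topic) held fixed.

    word_given_topic: one dict per topic mapping word -> P(word | topic)
                      (hypothetical layout, not Mahout's native format).
    alpha:            smoothing pseudo-count per topic; must be > 0.
    """
    n_topics = len(word_given_topic)
    # Ignore words the trained model has never seen.
    words = [w for w in doc_words if any(w in t for t in word_given_topic)]
    theta = [1.0 / n_topics] * n_topics
    if not words:
        return theta  # nothing to learn from, fall back to uniform

    for _ in range(n_iter):
        counts = [alpha] * n_topics
        for w in words:
            # E-step: responsibility of each topic for this word occurrence.
            weights = [theta[k] * word_given_topic[k].get(w, 1e-12)
                       for k in range(n_topics)]
            total = sum(weights)
            for k in range(n_topics):
                counts[k] += weights[k] / total
        # M-step: renormalise the expected counts into a distribution.
        norm = sum(counts)
        theta = [c / norm for c in counts]
    return theta


# Toy model with two topics (made-up numbers, for illustration only).
topics = [
    {"ball": 0.5, "team": 0.4, "vote": 0.1},
    {"ball": 0.1, "team": 0.1, "vote": 0.8},
]
theta = infer_topic_mixture(["ball", "team", "team", "ball"], topics)
print(theta)
```

The returned list sums to 1, and for the toy document the first topic should get most of the weight. For anything beyond a quick check, the estimators in the paper above are the better-founded choice.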

abhinavkulkarni