I'm estimating an LDA model with Mallet in Java and am looking for the term-topic matrix.

Calculating the model and getting the topic-document-matrix goes well:

ParallelTopicModel model = ...;     //... estimating the model
int numTopics = model.getNumTopics();
int numDocs = model.getData().size();

// Getting the topic probabilities for each document
double[][] tmDist = new double[numDocs][];
for (int i = 0; i < numDocs; i++) {   // iterate over documents, not topics
    tmDist[i] = model.getTopicProbabilities(i);
}

For the terms, however, I'm only able to get the top n words per topic:

Object[][] topWords = model.getTopWords(5);
for(int i = 0; i < topWords.length; i++){
    for(int j = 0; j < topWords[i].length; j++){
        System.out.print(topWords[i][j] + " ");
    }
    System.out.println();
}

The only questions/answers I found regarding this problem are about the command line version of Mallet.

Ben Baker

1 Answer

This piece of code prints, for each topic, its probability in a particular document, followed by the topic's five highest-weighted words and their weights.

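// Assumes an estimated ParallelTopicModel `model` and a document index `docID`.
// Needs java.util.* and cc.mallet.types.{Alphabet, IDSorter}.
Alphabet dataAlphabet = model.getAlphabet();
ArrayList<TreeSet<IDSorter>> topicSortedWords = model.getSortedWords();
int numTopics = model.getNumTopics();
double[] topicDistribution = model.getTopicProbabilities(docID);

for (int topic = 0; topic < numTopics; topic++) {
    Iterator<IDSorter> iterator = topicSortedWords.get(topic).iterator();
    Formatter out = new Formatter(new StringBuilder(), Locale.US);
    // Topic index and its probability in document docID
    out.format("%d\t%.3f\t", topic, topicDistribution[topic]);
    int rank = 0;
    // Top 5 words of this topic with their weights
    while (iterator.hasNext() && rank < 5) {
        IDSorter idCountPair = iterator.next();
        out.format("%s (%.3f) ", dataAlphabet.lookupObject(idCountPair.getID()), idCountPair.getWeight());
        rank++;
    }
    System.out.println(out);
}

System.out.println("\n");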
London guy
  • Thanks Abhishek, but I already knew this example (http://mallet.cs.umass.edu/topics-devel.php). I was looking for an array/matrix consisting of the alphabet x term - relation. – Ben Baker Jan 19 '15 at 18:22
  • Isn't that just arranging the output of the above piece of code in form of a matrix? Sorry if I did not understand your question properly. – London guy Jan 20 '15 at 11:33
  • Yes, you are right, it's just a re-arrangement as a matrix: filling the empty cells (as not every topic contains the full alphabet) and then normalizing them over the relative counts. – Ben Baker Jan 22 '15 at 20:07
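
The re-arrangement described in the last comment could be sketched roughly as follows. This is a minimal, Mallet-independent sketch, not a definitive implementation: the class `TopicWordMatrix`, the method `build`, and the parameter names are hypothetical. It assumes the sparse per-topic counts (word id to weight) have already been collected, for example from the `IDSorter` entries returned by `model.getSortedWords()` via `getID()` and `getWeight()`, and that `vocabSize` is the alphabet size. The `beta` argument is an assumed smoothing value added to every cell so that words absent from a topic do not end up with probability zero; pass 0.0 to get plain relative counts.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mallet.
class TopicWordMatrix {

    /**
     * Builds a dense numTopics x vocabSize matrix of word probabilities.
     *
     * @param sparseCounts one map per topic: word id -> count/weight in that topic
     * @param vocabSize    number of distinct words (size of the alphabet)
     * @param beta         smoothing added to every cell (assumption; 0.0 disables it)
     */
    static double[][] build(List<Map<Integer, Double>> sparseCounts,
                            int vocabSize, double beta) {
        int numTopics = sparseCounts.size();
        double[][] matrix = new double[numTopics][vocabSize];

        for (int t = 0; t < numTopics; t++) {
            // Fill every cell, including words that never occur in this topic.
            for (int w = 0; w < vocabSize; w++) {
                matrix[t][w] = beta;
            }
            // Add the observed counts on top of the smoothing value.
            for (Map.Entry<Integer, Double> e : sparseCounts.get(t).entrySet()) {
                matrix[t][e.getKey()] += e.getValue();
            }
            // Normalize the row so that it sums to 1.
            double total = 0.0;
            for (int w = 0; w < vocabSize; w++) {
                total += matrix[t][w];
            }
            if (total > 0.0) {
                for (int w = 0; w < vocabSize; w++) {
                    matrix[t][w] /= total;
                }
            }
            // A topic with no counts and beta == 0.0 is left as an all-zero row.
        }
        return matrix;
    }
}
```

Calling `TopicWordMatrix.build(counts, vocabSize, 0.0)` would give one row per topic and one column per word id, so `matrix[t][w]` reads as the relative weight of word `w` in topic `t`; transpose it if a term x topic layout is preferred.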