Unable to understand the HLDA Output in MALLET

Question

Below is a snippet of my code:

    HierarchicalLDA hlda = new HierarchicalLDA();
    hlda.initialize(instances, instances, 5, new Randoms());
    hlda.estimate(1000);
    hlda.printState(new PrintWriter(new File("Data.txt")));

I am unable to understand the meaning of both the console output and what is printed in the "Data.txt" file. I have already scoured the MALLET site but haven't found anything helpful. Any help or suggestion would be greatly appreciated. Thanks in advance!

score 2 · Answer 1 · answered Jun 30 '16 at 13:11


In hLDA, each document samples a single path through a tree of topics, and each token in the document is assigned to one "level" of that path. The printState method writes one line per token: first the IDs of the tree nodes on the document's path (starting at the leaf and ending at the root, as the loop below shows), then information about the word: its numeric ID, the string for that ID, and its level in the path.

    node = documentLeaves[doc];
    for (level = numLevels - 1; level >= 0; level--) {
        path.append(node.nodeID + " ");
        node = node.parent;
    }

    for (token = 0; token < seqLen; token++) {
        type = fs.getIndexAtPosition(token);
        level = docLevels[token];

        // The "" just tells java we're not trying to add a string and an int
        out.println(path + "" + type + " " + alphabet.lookupObject(type) + " " + level + " ");
    }
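
To recover the tree and the words at each node from the state file, one option is to parse each line back into its fields. The following is only a sketch, not part of MALLET: HldaStateReader, countWordsPerNode, and parentOf are hypothetical names, and it assumes every line has the shape produced by the loop above (the path's node IDs, leaf first, then type ID, word, and level), that level 0 is the root, and that words contain no whitespace.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not a MALLET class): regroups a printState dump by tree node.
class HldaStateReader {

    // Splits one state line into whitespace-separated fields.
    // Assumed layout: <leaf id> ... <root id> <type id> <word> <level>
    private static String[] fields(String line) {
        return line.trim().split("\\s+");
    }

    // Counts how often each word was assigned to each node.
    static Map<Integer, Map<String, Integer>> countWordsPerNode(List<String> lines) {
        Map<Integer, Map<String, Integer>> counts = new TreeMap<>();
        for (String line : lines) {
            String[] f = fields(line);
            if (f.length < 4) {
                continue; // skip blank or malformed lines
            }
            int numLevels = f.length - 3; // everything before the three token fields
            String word = f[numLevels + 1];
            int level = Integer.parseInt(f[numLevels + 2]);
            // The path is written leaf first, so level 0 (assumed root) is the last ID.
            int nodeId = Integer.parseInt(f[numLevels - 1 - level]);
            counts.computeIfAbsent(nodeId, k -> new TreeMap<>())
                  .merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Maps each non-root node ID to its parent's ID, read off the printed paths.
    static Map<Integer, Integer> parentOf(List<String> lines) {
        Map<Integer, Integer> parents = new TreeMap<>();
        for (String line : lines) {
            String[] f = fields(line);
            if (f.length < 4) {
                continue;
            }
            int numLevels = f.length - 3;
            for (int i = 0; i + 1 < numLevels; i++) {
                // Each ID is followed by the ID of the node one level closer to the root.
                parents.put(Integer.parseInt(f[i]), Integer.parseInt(f[i + 1]));
            }
        }
        return parents;
    }
}
```

The lines could be read with Files.readAllLines(Paths.get("Data.txt")) and passed to both methods. Sorting each node's word counts in descending order then gives a rough picture of the topic at that node, and the parent map gives the shape of the tree.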

answered Jun 30 '16 at 13:11

David Mimno


Thanks a lot, Professor! The output makes a lot of sense to me now. I was wondering, though, how I can identify the hierarchy of the topics from this output, and also the words that belong to each of those topics. – Anish Kanchan Jul 08 '16 at 00:30
1. So does each path represent a single document? Also, when I ran hLDA on 10 documents, I could see only 5 paths in the output state file (4 distinct ones, with 1 repeated at the end), with all the words at different levels. 2. In my case, did the 10 documents share the 5 paths? If so, how do I know which documents sampled which paths? – Anish Kanchan Jul 08 '16 at 20:02
