I am experimenting with the Pachinko topic model in Mallet and am having trouble getting it to work. Each time it prints the topics at an update, they are all the same. This happens both with the default alpha and beta values and with my own. I haven't been able to find much written about Mallet's Pachinko model, and its documentation is pretty sparse. Here is my code with my parameters:

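    import cc.mallet.topics.PAM4L;
    import cc.mallet.util.Randoms;

    // `instances` is the InstanceList of pre-processed documents, built elsewhere.
    PAM4L model = new PAM4L(25, 5);   // superTopics = 25, subTopics = 5
    String output = "output";
    Randoms rand = new Randoms(4);    // fixed seed
    model.estimate(instances, 500,    // documents, iterations
        5, 50,                        // optimizeInterval, showTopicsInterval
        50, output, rand);            // outputModelInterval, outputFile, r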

Thank you

Harry Baker
  • Can you say more about the data? You might also try reversing the `superTopics` and `subTopics` values. The expectation is that there will be more sub-topics. – David Mimno Jul 20 '17 at 11:55
  • The data is about 70,000 biomedical texts that have been pre-processed. I used the same InstanceList with `ParallelTopicModel` and the topics made sense. – Harry Baker Jul 20 '17 at 13:20
  • When I run it with more subtopics than super topics in PAM4L, the individual subtopics make sense. However, each super topic is still composed of the same exact subtopics. This is with both the default alpha and beta values, as well as alphaSum = 1 and beta = 1/NumSubTopics. – Harry Baker Jul 20 '17 at 13:27
  • Wait, I'm lying. I think I just wasn't using enough iterations. After letting it run for about an hour, the super-topics are starting to get different distributions. I do have a quick question about alpha and beta values, though (while I have you!). It is my understanding that in LDA, the alphas and betas should each sum to 1. However, the default values for PAM4L do not. Does it not follow LDA's convention? – Harry Baker Jul 20 '17 at 14:05
  • They should be Dirichlet parameters, which must be positive but need not sum to one. They're probably being updated at regular intervals, not at every iteration. – David Mimno Jul 20 '17 at 14:56
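
To illustrate the last comment, here is a minimal sketch of what "positive but need not sum to one" means for a symmetric Dirichlet prior. It assumes the common convention of spreading a total mass evenly across components. The class `DirichletParams`, its methods, and the numbers used are hypothetical illustrations; they are not part of Mallet and are not PAM4L's actual defaults.

```java
import java.util.Arrays;

public class DirichletParams {
    // Hypothetical helper: split a total concentration evenly over k components.
    static double[] symmetric(double sum, int k) {
        double[] params = new double[k];
        Arrays.fill(params, sum / k);
        return params;
    }

    // A Dirichlet parameter vector is valid when it is non-empty and
    // every entry is strictly positive. No constraint on the sum.
    static boolean isValid(double[] params) {
        for (double p : params) {
            if (!(p > 0.0)) {
                return false;
            }
        }
        return params.length > 0;
    }

    // Sum of the parameters, shown only to make clear it need not equal 1.
    static double total(double[] params) {
        return Arrays.stream(params).sum();
    }

    public static void main(String[] args) {
        // Illustrative values only: 25 components sharing a total mass of 50.
        double[] alpha = symmetric(50.0, 25);
        System.out.println("valid: " + isValid(alpha));
        System.out.println("sum:   " + total(alpha));
    }
}
```

A vector whose entries sum to well above one still passes `isValid`, while any vector containing a zero or negative entry does not. The sum controls how concentrated the prior is, not whether it is well-formed.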

0 Answers