I'm new to topic modeling and I'm trying to use the Mallet library, but I have a question.

I'm using the simple parallel threaded implementation of LDA (ParallelTopicModel) to find topics for some instances. My question is: what does the estimate function in ParallelTopicModel do?

I searched the API documentation, but the function has no description there. I have also read this tutorial.

Can someone explain what this function does?

EDIT

This is an example of my code:

    import java.util.ArrayList;
    import java.util.regex.Pattern;

    import cc.mallet.pipe.*;
    import cc.mallet.pipe.iterator.StringArrayIterator;
    import cc.mallet.topics.ParallelTopicModel;
    import cc.mallet.types.InstanceList;

    // numTopics, THREADS, optimizeation, burninInterval and numIterations
    // are defined elsewhere (not shown).
    public void runModel(String[] str) {
        ParallelTopicModel model = new ParallelTopicModel(numTopics);
        ArrayList<Pipe> pipeList = new ArrayList<Pipe>();
        // Pipes: lowercase, tokenize, map to features (no stopword filter is added here)
        pipeList.add(new CharSequenceLowercase());
        pipeList.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")));
        pipeList.add(new TokenSequence2FeatureSequence());
        InstanceList instances = new InstanceList(new SerialPipes(pipeList));
        instances.addThruPipe(new StringArrayIterator(str));

        model.addInstances(instances);
        model.setNumThreads(THREADS);
        model.setOptimizeInterval(optimizeation);
        model.setBurninPeriod(burninInterval);
        model.setNumIterations(numIterations);
        // model.estimate();
    }
Jimmysnn
  • Your regex for tokens is a little odd. In Java, `\p{L}` matches any Unicode letter and `\p{P}` matches punctuation, so `\p{L}[\p{L}\p{P}]+\p{L}` matches runs of at least three characters that start and end with a letter, with letters or punctuation in between. That keeps tokens such as "don't" or "well-known" but drops digits and one- or two-letter words. – drevicko Nov 19 '14 at 00:43
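
To see what the tokenizer pattern from the question actually matches, here is a small sketch that uses only `java.util.regex`, not Mallet. The class name `TokenPatternDemo`, the `tokenize` helper and the sample sentence are made up for illustration; only the pattern itself comes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for inspecting the token pattern outside Mallet. */
final class TokenPatternDemo {
    // Same pattern as in the question: a letter, then one or more
    // letters or punctuation characters, then a closing letter.
    static final Pattern TOKEN = Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}");

    /** Returns every non-overlapping match of TOKEN in the text, in order. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [don't, stop, well-known]: "at", "it", "is" and "42" are dropped.
        System.out.println(tokenize("don't stop at 42, it is well-known"));
    }
}
```

Short words and numbers vanish before the model ever sees them, which is worth knowing when the resulting topics look sparse.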

1 Answer

estimate() runs LDA, attempting to estimate the topic model given the data and settings you've already set up.

Have a look at the main() function of the ParallelTopicModel source for inspiration about what's needed to estimate a model.

drevicko
  • I have created a model from the data and settings. When I call the estimate function, I get some topics in 8 seconds. When I don't call it, I get some topics in 3 seconds. What difference does calling estimate make? – Jimmysnn Nov 14 '14 at 13:41
  • How did you "create a model given data and settings"? What exactly did you do? – drevicko Nov 15 '14 at 05:17
  • The estimate function estimates the topic model. The topics you get before calling it are random initial allocations. – drevicko Nov 19 '14 at 00:40
  • Yes, it runs Gibbs sampling as the inference algorithm, which requires an initial step that randomly assigns words to topics. Later iterations refine those assignments by resampling each word's topic given the current assignments of all the other words. – London guy Jan 15 '15 at 13:31
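
The two comments above describe a random initial allocation followed by iterations that refine it. A toy collapsed Gibbs sampler makes that concrete. This is only a teaching sketch under simplifying assumptions (fixed symmetric `alpha` and `beta`, a single thread, no hyperparameter optimization); it is not Mallet's implementation, and every name in it (`ToyGibbsLda`, `sweep`, `runSweeps`) is hypothetical.

```java
import java.util.Random;

/** Toy collapsed Gibbs sampler for LDA; a teaching sketch, not Mallet's code. */
final class ToyGibbsLda {
    final int[][] docs;       // docs[d][i]: word id of token i in document d
    final int[][] z;          // z[d][i]: topic currently assigned to that token
    final int[][] docTopic;   // docTopic[d][k]: tokens of document d in topic k
    final int[][] topicWord;  // topicWord[k][w]: tokens of word w in topic k
    final int[] topicTotal;   // topicTotal[k]: all tokens in topic k
    final int numTopics;
    final int vocabSize;
    final double alpha;       // document-topic smoothing (assumed fixed)
    final double beta;        // topic-word smoothing (assumed fixed)
    final Random rng;

    ToyGibbsLda(int[][] docs, int numTopics, int vocabSize,
                double alpha, double beta, long seed) {
        this.docs = docs;
        this.numTopics = numTopics;
        this.vocabSize = vocabSize;
        this.alpha = alpha;
        this.beta = beta;
        this.rng = new Random(seed);
        z = new int[docs.length][];
        docTopic = new int[docs.length][numTopics];
        topicWord = new int[numTopics][vocabSize];
        topicTotal = new int[numTopics];
        // Random initial allocation: this is all you have before any sweep runs.
        for (int d = 0; d < docs.length; d++) {
            z[d] = new int[docs[d].length];
            for (int i = 0; i < docs[d].length; i++) {
                z[d][i] = rng.nextInt(numTopics);
                addCounts(d, docs[d][i], z[d][i], +1);
            }
        }
    }

    private void addCounts(int d, int w, int k, int delta) {
        docTopic[d][k] += delta;
        topicWord[k][w] += delta;
        topicTotal[k] += delta;
    }

    /** One Gibbs sweep: resample the topic of every token once. */
    void sweep() {
        double[] p = new double[numTopics];
        for (int d = 0; d < docs.length; d++) {
            for (int i = 0; i < docs[d].length; i++) {
                int w = docs[d][i];
                addCounts(d, w, z[d][i], -1);      // take the token out of the counts
                double sum = 0.0;
                for (int k = 0; k < numTopics; k++) {
                    p[k] = (docTopic[d][k] + alpha) * (topicWord[k][w] + beta)
                            / (topicTotal[k] + vocabSize * beta);
                    sum += p[k];
                }
                double u = rng.nextDouble() * sum; // draw from the unnormalised weights
                int k = 0;
                double acc = p[0];
                while (acc < u && k < numTopics - 1) {
                    k++;
                    acc += p[k];
                }
                z[d][i] = k;
                addCounts(d, w, k, +1);            // put it back under the new topic
            }
        }
    }

    /** Runs the given number of sweeps over the whole corpus. */
    void runSweeps(int iterations) {
        for (int it = 0; it < iterations; it++) {
            sweep();
        }
    }

    public static void main(String[] args) {
        int[][] docs = {{0, 1, 0, 1}, {2, 3, 2, 3}, {0, 1, 2, 3}};
        ToyGibbsLda model = new ToyGibbsLda(docs, 2, 4, 0.1, 0.01, 42L);
        model.runSweeps(200);
        for (int d = 0; d < docs.length; d++) {
            System.out.println(java.util.Arrays.toString(model.z[d]));
        }
    }
}
```

In this sketch the constructor plays the role of the state you have before calling estimate, and `runSweeps` stands in for the sampling work that estimate adds, which is consistent with the extra running time reported in the first comment.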