Evaluating the model as you train with scikit's LatentDirichletAllocation class

Question

I am experimenting with the LatentDirichletAllocation() class in scikit-learn, and the evaluate_every parameter has the following description.

How often to evaluate perplexity. Only used in fit method. Set it to 0 or negative number to not evaluate perplexity in training at all. Evaluating perplexity can help you check convergence in training process, but it will also increase total training time. Evaluating perplexity in every iteration might increase training time up to two-fold.

I set this parameter to 2 (default is 0) and saw an increased training time, but I can't seem to find the perplexity values anywhere. Are these results saved, or are they only used by the model to determine when to stop? I was hoping to use the perplexity values to measure the progress and learning curve of my model.

score 1 · Accepted Answer · answered Jan 12 '17 at 03:00

It's used in conjunction with the perp_tol parameter to assess convergence, and is not saved between iterations, per the source:

for i in xrange(max_iter):

    # ...

    # check perplexity
    if evaluate_every > 0 and (i + 1) % evaluate_every == 0:
        doc_topics_distr, _ = self._e_step(X, cal_sstats=False,
                                            random_init=False,
                                            parallel=parallel)
        bound = self.perplexity(X, doc_topics_distr,
                                sub_sampling=False)
        if self.verbose:
            print('iteration: %d, perplexity: %.4f'
                    % (i + 1, bound))

        if last_bound and abs(last_bound - bound) < self.perp_tol:
            break
        last_bound = bound
    self.n_iter_ += 1
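
Since the excerpt prints the value whenever verbose is set, one way to get at the numbers without touching the library is to capture standard output during fit and parse it. The following is only a rough sketch under a few assumptions: the data is a synthetic count matrix, the variable names are made up for illustration, and the regular expression relies on the "perplexity: %.4f" text shown in the excerpt, which may differ between scikit-learn versions.

```python
import contextlib
import io
import re

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Synthetic document-term counts (assumption: stands in for a real corpus).
rng = np.random.RandomState(0)
X = rng.poisson(1.0, size=(60, 30))

lda = LatentDirichletAllocation(
    n_components=5,
    learning_method="batch",
    max_iter=10,
    evaluate_every=2,  # evaluate perplexity on every second iteration
    verbose=1,         # needed so that fit prints the perplexity values
    random_state=0,
)

# Capture whatever fit prints instead of letting it reach the console.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    lda.fit(X)

# Pull the numbers out of lines containing "perplexity: <value>".
logged = [float(v) for v in re.findall(r"perplexity: ([0-9.]+)", buffer.getvalue())]
print(logged)
```

Training can stop early once the change falls below perp_tol, so the list may hold fewer entries than max_iter / evaluate_every.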

Note, though, that you could adapt the existing source to save the values by (1) adding the line self.bounds = [] to the __init__ method and (2) adding self.bounds.append(bound) to the loop above, like so:

# record the value before the convergence check so the final one is kept too
self.bounds.append(bound)
if last_bound and abs(last_bound - bound) < self.perp_tol:
    break
last_bound = bound

Depending on where you save the modified class, you would also have to change the imports at the top of the file to reference full module paths in scikit-learn, since the library source uses relative imports.
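
If patching the library is not appealing, a possible alternative is to drive training yourself with partial_fit and call perplexity after each pass. This is a minimal sketch, not a drop-in replacement for fit: partial_fit performs online updates, so the numbers will not match a batch fit exactly. The synthetic matrix, the number of passes, and the perplexities list are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Synthetic document-term counts (assumption: stands in for a real corpus).
rng = np.random.RandomState(0)
X = rng.poisson(1.0, size=(60, 30))

lda = LatentDirichletAllocation(
    n_components=5,
    total_samples=X.shape[0],  # partial_fit scales its updates by this value
    random_state=0,
)

perplexities = []
for n_pass in range(10):
    # One online variational Bayes pass over the data.
    lda.partial_fit(X)
    # Evaluate on the training data and keep the value ourselves.
    perplexities.append(lda.perplexity(X))

for n_pass, value in enumerate(perplexities, start=1):
    print("pass %d, perplexity: %.4f" % (n_pass, value))
```

The resulting list can be plotted directly as a learning curve; evaluating on a held-out matrix instead of X would give a less optimistic picture.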
