Latent Dirichlet Allocation using Gensim on more than one corpus

Question

I have two questions related to the usage of gensim for LDA.

1) How can I create a model using one corpus, save it, and later extend it by training it on another corpus? Is this possible?

2) Can LDA be used to classify an unseen document, or does the model need to be rebuilt with that document included in the corpus? Is there an online way to do this and see the changes on the fly?

I have a fairly basic understanding of LDA and have used it for topic modeling on simple corpora with the lda and gensim libraries. Please point out any conceptual inconsistencies in the question. Thanks!

To subquestion 2: Yes, you can classify new documents using topics generated from a training corpus (but I don't know how to achieve this with gensim). — Sir Cornflakes, Jun 07 '15 at 11:23
@jknappen - I got it! I have mentioned it in my answer. Thanks! — Utsav T, Jun 07 '15 at 22:17

score 2 · Accepted Answer · answered Jun 05 '15 at 22:11

I found the gensim documentation helpful here. Gensim does allow an existing LDA model to be updated with an additional corpus. Its ldamodel module supports both estimating an LDA model from a training corpus and inferring the topic distribution of new, unseen documents. This is described here:

https://radimrehurek.com/gensim/models/ldamodel.html

Additionally, the algorithm is streamed, so it can process corpora larger than RAM. Gensim also provides a multicore implementation (LdaMulticore) to speed up training.

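from gensim.models import LdaModel

lda = LdaModel(corpus, num_topics=10)  # train on the first corpus

lda.update(other_corpus)  # continue training on a second corpus

This is how the model can be updated. Note that other_corpus should be in the same bag-of-words format and use the same word-to-id mapping as the original corpus, since the model's vocabulary is fixed when it is created.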
