2

After training an LDA model on gensim LDA model i converted the model to a with the gensim mallet via the malletmodel2ldamodel function provided with the wrapper. Before and after the conversion the topic word distributions are quite different. The mallet version returns very rare topic word distribution after conversion.

ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=13, id2word=dictionary)
model = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(ldamallet)
model.save('ldamallet.gensim')

dictionary = gensim.corpora.Dictionary.load('dictionary.gensim')
corpus = pickle.load(open('corpus.pkl', 'rb'))
lda_mallet = gensim.models.wrappers.LdaMallet.load('ldamallet.gensim')
import pyLDAvis.gensim
lda_display = pyLDAvis.gensim.prepare(lda_mallet, corpus, dictionary, sort_topics=False)
pyLDAvis.display(lda_display)

Mallet Implementation after using <code>malletmodel2ldamodel</code>

Here is the output from gensim original implementation:

Here is the output from gensim original implementation

I can see there was a bug around this issue which has been fixed with the previous versions of gensim. I am using gensim=3.7.1

Shivam Agrawal
  • 2,053
  • 4
  • 26
  • 42

1 Answers1

1

Here is an optional function to use instead of malletmodel2ldamodel (reported to have bugs):

from gensim.models.ldamodel import LdaModel
import numpy

def ldaMalletConvertToldaGen(mallet_model):
    model_gensim = LdaModel(id2word=mallet_model.id2word, num_topics=mallet_model.num_topics, alpha=mallet_model.alpha, eta=0, iterations=1000, gamma_threshold=0.001, dtype=numpy.float32)
    model_gensim.state.sstats[...] = mallet_model.wordtopics
    model_gensim.sync_state()
    return model_gensim

converted_model = ldaMalletConvertToldaGen(mallet_model)

I used it and it worked perfectly.

norpa
  • 127
  • 2
  • 16