Correct way to load LdaMallet model with gensim and classify unseen documents

Question

In my project, I use the Python library gensim for topic modeling of text. I am trying to load my trained LdaMallet model to classify new, unseen texts.

The first part is loading the model.

import os

import gensim  # gensim.models.wrappers exists only in gensim < 4.0

dirname = os.path.dirname(__file__)
filename = os.path.join(dirname, 'mallet-2.0.8/bin/mallet')

# Download file: http://mallet.cs.umass.edu/dist/mallet-2.0.8.zip
os.environ['MALLET_HOME'] = '/path/to/mallet'  # placeholder: path to mallet

# Load the saved wrapper model, then convert it to a native gensim LdaModel
ldaMallet = gensim.models.wrappers.LdaMallet.load('lda_malletoutputCommentsAndMethods.model')
ldaModel = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(ldaMallet)

I am not sure about the last line, which converts the LdaMallet model to an LdaModel. It was the only way I could get any result.

The second part is preparing the new data and classifying it.

from gensim.test.utils import common_dictionary
other_texts = [['new', 'document', 'to', 'classify', 'as', 'array']]
other_corpus = [common_dictionary.doc2bow(text) for text in other_texts]
vector = ldaModel[other_corpus[0]]

# sorts the result by probability and not by topic ID
print(sorted(vector, key=lambda x: x[1], reverse=True))

Then the result looks something like this:

[(16, 0.143), (17, 0.08), (9, 0.0653),...]

No matter which text I use in the other_texts array, this result isn't changing, but it should.
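One possible cause of this symptom, offered as a guess rather than a confirmed diagnosis: common_dictionary is built from gensim's small bundled test corpus, not from the data the model was trained on, and doc2bow silently drops tokens that are not in the dictionary. The sketch below is a simplified stand-in for that behaviour written with the standard library only; doc2bow_sketch, TOY_VOCAB and the ids it assigns are hypothetical and are not gensim's actual implementation or id mapping.

```python
from collections import Counter

# The twelve words of gensim's bundled test corpus (common_texts).
# The integer ids assigned here are illustrative, not gensim's real ids.
TOY_VOCAB = ['computer', 'eps', 'graph', 'human', 'interface', 'minors',
             'response', 'survey', 'system', 'time', 'trees', 'user']
token2id = {tok: i for i, tok in enumerate(TOY_VOCAB)}


def doc2bow_sketch(tokens, token2id):
    """Simplified stand-in for Dictionary.doc2bow: unknown tokens are dropped."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


doc_a = ['new', 'document', 'to', 'classify', 'as', 'array']
doc_b = ['completely', 'different', 'text']

bow_a = doc2bow_sketch(doc_a, token2id)
bow_b = doc2bow_sketch(doc_b, token2id)

# Both documents contain only out-of-vocabulary tokens, so both
# bag-of-words vectors come out empty and therefore identical.
print(bow_a, bow_b)
```

If this is what is happening, every new text reaches the model as the same empty vector, which would explain an unchanging result. Assuming the model was trained with a gensim Dictionary passed as id2word, calling doc2bow on that training dictionary (for example ldaMallet.id2word) instead of common_dictionary may be worth trying.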

Please run print(ldaModel[other_corpus[4]]) and print(ldaModel[other_corpus[:]]) and post the output. – Sara, Apr 24 '19 at 00:07
I have several posts on applying a pre-trained LDA model to a new data set, but never one loaded from a separate file location. I imagine the iterative steps are similar. – Sara, Apr 24 '19 at 00:08
