
In my project, I use the Python library gensim for topic modeling/extraction of text. I am trying to load my trained LdaMallet model to classify new, unseen texts.

The first part is loading the model.

import os

import gensim

dirname = os.path.dirname(__file__)
filename = os.path.join(dirname, 'mallet-2.0.8/bin/mallet')

# Download File: http://mallet.cs.umass.edu/dist/mallet-2.0.8.zip
os.environ['MALLET_HOME'] = # path to mallet

ldaMallet = gensim.models.wrappers.LdaMallet.load('lda_malletoutputCommentsAndMethods.model')
ldaModel = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(ldaMallet)

I am not sure about the last line, which converts the LdaMallet model to an LdaModel. It was the only way I could get any result.

The second part prepares the new data and classifies it.

from gensim.test.utils import common_dictionary
other_texts = [['new', 'document', 'to', 'classify', 'as', 'array']]
other_corpus = [common_dictionary.doc2bow(text) for text in other_texts]
vector = ldaModel[other_corpus[0]]

# sorts the result by probability and not by topic ID
print(sorted(vector, key=lambda x: x[1], reverse=True))

Then the result looks something like this:

[(16, 0.143), (17, 0.08), (9, 0.0653),...]

No matter which text I put in the other_texts array, this result does not change, but it should.
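One thing worth checking here is whether the bag-of-words vector is empty. A dictionary only counts tokens it already knows, so if `common_dictionary` (a small test dictionary shipped with gensim) does not share its vocabulary with the trained model, every new text may map to the same empty vector. The sketch below illustrates that effect in plain Python. `doc2bow_sketch` and `unrelated_vocab` are hypothetical stand-ins made up for this illustration, not gensim's implementation or its actual vocabulary.

```python
from collections import Counter


def doc2bow_sketch(tokens, token2id):
    """Simplified stand-in for a dictionary's doc2bow.

    Tokens missing from token2id are silently dropped; the rest are
    returned as sorted (token_id, count) pairs.
    """
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# Hypothetical tiny vocabulary, standing in for a dictionary that was
# NOT built from the model's training corpus.
unrelated_vocab = {'human': 0, 'interface': 1, 'computer': 2}

text_a = ['new', 'document', 'to', 'classify', 'as', 'array']
text_b = ['a', 'completely', 'different', 'text']

bow_a = doc2bow_sketch(text_a, unrelated_vocab)
bow_b = doc2bow_sketch(text_b, unrelated_vocab)

# Neither text shares a token with the vocabulary, so both vectors are
# empty and a model would see identical input for both texts.
print(bow_a, bow_b)
```

If printing `other_corpus[0]` shows an empty list, building the bag-of-words with the dictionary the model was trained with (gensim models expose it as the `id2word` attribute) instead of `common_dictionary` would be the first thing to try.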

Freshchris
  • print(ldaModel[other_corpus[0]]) – Sara Apr 24 '19 at 00:06
  • print(ldaModel[other_corpus[4]]) and print(ldaModel[other_corpus[:]]) for me please – Sara Apr 24 '19 at 00:07
  • I have several posts applying a pre-trained LDA to a new data set, but never from a separate file location. I imagine the iterative steps are similar. – Sara Apr 24 '19 at 00:08
