I am trying to load a saved gensim LdaMallet model. The model is trained and saved like this:
ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=n_topics,id2word=id2word)
ldamallet.save('ldamallet')
When I test the model on a new query (using the original corpus and dictionary), everything works the first time I load it.
ques_vec = [dictionary.doc2bow(words) for words in data_words_list]
for i, row in enumerate(lda[ques_vec]):
    row = sorted(row, key=lambda x: (x[1]), reverse=True)
When I run the same code again afterward, this error appears:
java.io.FileNotFoundException: /tmp/9f371_corpus.mallet (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at cc.mallet.types.InstanceList.load(InstanceList.java:787)
    at cc.mallet.classify.tui.Csv2Vectors.main(Csv2Vectors.java:131)
Exception in thread "main" java.lang.IllegalArgumentException: Couldn't read InstanceList from file /tmp/9f371_corpus.mallet
    at cc.mallet.types.InstanceList.load(InstanceList.java:794)
    at cc.mallet.classify.tui.Csv2Vectors.main(Csv2Vectors.java:131)
Traceback (most recent call last):
  File "topic_modeling1.py", line 406, in <module>
    topic = get_label(text, id2word, first, ldamallet)
  File "topic_modeling1.py", line 237, in get_label
    for i, row in enumerate(lda[ques_vec]):
  File "/home/user/sjha/anaconda3/envs/conda_env/lib/python3.6/site-packages/gensim/models/wrappers/ldamallet.py", line 308, in __getitem__
    self.convert_input(bow, infer=True)
  File "/home/user/sjha/anaconda3/envs/conda_env/lib/python3.6/site-packages/gensim/models/wrappers/ldamallet.py", line 256, in convert_input
    check_output(args=cmd, shell=True)
  File "/home/user/sjha/anaconda3/envs/conda_env/lib/python3.6/site-packages/gensim/utils.py", line 1806, in check_output
    raise error
subprocess.CalledProcessError: Command '/home/user/sjha/projects/topic_modeling/mallet-2.0.8/bin/mallet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input /tmp/9f371_corpus.txt --output /tmp/9f371_corpus.mallet.infer --use-pipe-from /tmp/9f371_corpus.mallet' returned non-zero exit status 1.
Contents of my /tmp/ directory:
/tmp/9f371_corpus.txt
/tmp/9f371_doctopics.txt
/tmp/9f371_doctopics.txt.infer
/tmp/9f371_inferencer.mallet
/tmp/9f371_state.mallet.gz
/tmp/9f371_topickeys.txt
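A quick way to confirm which files are missing for a given prefix is a check along these lines. This is only a sketch: the helper name `missing_mallet_files` is hypothetical, and the suffix list is taken from the file names in the error message and the listing above, not from gensim itself.

```python
import os

# Suffixes copied from the file names in the error message and the /tmp/
# listing above (an assumption; not an official list from gensim or MALLET).
EXPECTED_SUFFIXES = [
    "corpus.txt",
    "corpus.mallet",
    "doctopics.txt",
    "inferencer.mallet",
    "state.mallet.gz",
    "topickeys.txt",
]


def missing_mallet_files(prefix, suffixes=EXPECTED_SUFFIXES):
    """Return the suffixes for which the file prefix + suffix does not exist."""
    return [s for s in suffixes if not os.path.exists(prefix + s)]


if __name__ == "__main__":
    # "/tmp/9f371_" is the prefix seen in the traceback; adjust for your run.
    print(missing_mallet_files("/tmp/9f371_"))
```

Given the directory contents listed above, this would be expected to report only `corpus.mallet`, which is the file the MALLET `import-file` command fails to read through `--use-pipe-from`.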
Also, the files /tmp/9f371_doctopics.txt.infer and /tmp/9f371_corpus.txt seem to be modified every time I load the model. What could be causing this error? Is it a bug in gensim's Mallet wrapper?