
Hi, I was using gensim for topic modelling with Mallet. I unzipped Mallet to the C: drive, as shown below, and also set the MALLET_HOME environment variable. The code I was running is:

mallet_path = r'c:/mallet-2.0.8/bin/mallet'
ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus,
                                             num_topics=20, id2word=id2word)

This gives me an error like this:

CalledProcessError                        Traceback (most recent call last)
<ipython-input-58-6e0dbb876ee6> in <module>()
----> 1 ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=20, id2word=id2word)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\wrappers\ldamallet.py in __init__(self, mallet_path, corpus, num_topics, alpha, id2word, workers, prefix, optimize_interval, iterations, topic_threshold)
124         self.iterations = iterations
125         if corpus is not None:
--> 126             self.train(corpus)
127 
128     def finferencer(self):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\wrappers\ldamallet.py in train(self, corpus)
265 
266         """
--> 267         self.convert_input(corpus, infer=False)
268         cmd = self.mallet_path + ' train-topics --input %s --num-topics %s  --alpha %s --optimize-interval %s '\
269             '--num-threads %s --output-state %s --output-doc-topics %s --output-topic-keys %s '\

~\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\wrappers\ldamallet.py in convert_input(self, corpus, infer, serialize_corpus)
254             cmd = cmd % (self.fcorpustxt(), self.fcorpusmallet())
255         logger.info("converting temporary corpus to MALLET format with %s", cmd)
--> 256         check_output(args=cmd, shell=True)
257 
258     def train(self, corpus):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\utils.py in check_output(stdout, *popenargs, **kwargs)
1804             error = subprocess.CalledProcessError(retcode, cmd)
1805             error.output = output
-> 1806             raise error
1807         return output
1808     except KeyboardInterrupt:

CalledProcessError: Command 'c:\mallet-2.0.8\bin\mallet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input C:\Users\apath009\AppData\Local\Temp\d186ea_corpus.txt --output C:\Users\apath009\AppData\Local\Temp\d186ea_corpus.mallet' returned non-zero exit status 1.

Please help!

Anurag

2 Answers


I had this error too, but now it is working. I'm not sure exactly what I did to make it start working, but I will detail everything I did.

First, I followed the three-step answer (pasted below) from this question: Error when implementing gensim.LdaMallet

  1. Make sure that MALLET_HOME is set

  2. Escape the backslashes when setting mallet_path in Python

    mallet_path = 'c:\\mallet-2.0.8\\bin\\mallet'
    LDA_model = gensim.models.LdaMallet(mallet_path, ...
    
  3. Also, it might be useful to modify line 142 in Python\Lib\site-packages\gensim\models\ldamallet.py: change --token-regex '\S+' to --token-regex \"\S+\"
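Steps 1 and 2 can be checked from Python before calling the wrapper. This is only a sketch under some assumptions: check_mallet_setup is a hypothetical helper, not part of gensim, and the .bat fallback assumes the Windows launcher sits next to the shell script in the bin folder.

```python
import os

# Step 2: an escaped path and a raw string produce the same text.
ESCAPED = "c:\\mallet-2.0.8\\bin\\mallet"
RAW = r"c:\mallet-2.0.8\bin\mallet"
assert ESCAPED == RAW

# Without escaping, "\b" is read as a backspace character,
# so the "\bin" part of the path would be silently corrupted.
assert "\bin" != r"\bin"


def check_mallet_setup(mallet_path, environ=None):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    environ = os.environ if environ is None else environ
    problems = []

    # Step 1: MALLET_HOME must be set and point to a real directory.
    mallet_home = environ.get("MALLET_HOME")
    if not mallet_home:
        problems.append("MALLET_HOME is not set")
    elif not os.path.isdir(mallet_home):
        problems.append("MALLET_HOME is not a directory: %s" % mallet_home)

    # The launcher itself must exist (assumed: 'mallet' or 'mallet.bat').
    candidates = [mallet_path, mallet_path + ".bat"]
    if not any(os.path.isfile(c) for c in candidates):
        problems.append("no MALLET launcher found at: %s" % mallet_path)

    return problems
```

Calling check_mallet_setup(RAW) in the same session that runs gensim shows whether that Python process actually sees MALLET_HOME; a variable set after the notebook or IDE was started is usually not visible until it is restarted.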

But I was still getting the error. My computer dual-boots, so I next booted into Lubuntu, installed Java, Python 3, and gensim, and copied the mallet folder over to the Lubuntu partition. From the Lubuntu terminal I ran a test Python file with the same code as on Windows, but with the new mallet path /home/Desktop/mallet-2.0.8/bin/mallet. It worked. Then I booted back into Windows, and suddenly it worked there as well.

Tetsuya Yamamoto
cobaltB12

Check that Java is installed correctly, that the JAVA_HOME system variable is set, and that %JAVA_HOME%\bin has been added to the system PATH variable.

After doing that, restart your computer for the changes to take effect.

This did the trick for me.
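To confirm the Java setup from the same Python environment that runs gensim, something like the following may help. It is a rough sketch: java_status and java_version_output are hypothetical helper names, and the code only reports what the current process can see.

```python
import os
import shutil
import subprocess


def java_status(environ=None):
    """Hypothetical helper: report JAVA_HOME and where 'java' is found on PATH."""
    environ = os.environ if environ is None else environ
    return {
        "java_home": environ.get("JAVA_HOME"),
        # shutil.which returns None when no 'java' executable is on the given PATH.
        "java_on_path": shutil.which("java", path=environ.get("PATH", "")),
    }


def java_version_output(java_exe):
    """Run 'java -version'; Java prints the version banner to stderr."""
    result = subprocess.run([java_exe, "-version"],
                            capture_output=True, text=True)
    return result.stderr.strip() or result.stdout.strip()
```

If java_status()["java_on_path"] is None, the mallet launcher most likely cannot start Java either, which would be consistent with the non-zero exit status in the question.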

aifrim
beapea