
I am using NLTK's nltk.tag.stanford, which needs to call the java executable.

I set JAVAHOME to C:\Program Files\Java\jdk1.6.0_25, where my JDK is installed, but when I run the program I get the error

"NLTK was unable to find the java executable! Use the config_java() or set the JAVAHOME variable"

Then I spent three hours debugging it and tried

config_java("C:/Program Files/Java/jdk1.6.0_25/")

config_java("C:/Program Files/Java/jdk1.6.0_25/bin/")

and the same paths without the trailing "/".

However, NLTK still cannot find it.

Does anyone have an idea what's going wrong? Thanks a lot!

Noufal Ibrahim
Thomas Chu

14 Answers


If setting the JAVA_HOME environment variable doesn't help you, try this:

config_java() did not work for me. I added the following lines to my code and it worked:

import os
java_path = "C:/Program Files/Java/jdk1.7.0_11/bin/java.exe"
os.environ['JAVAHOME'] = java_path

I am running Windows 7 64-bit
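
A slightly more defensive version of the same idea is sketched below. The helper name `set_javahome` is hypothetical, not part of NLTK. It accepts either form discussed in this thread (the java.exe path used in this answer, or the JDK directory suggested in the comments) and only checks that the path exists before setting the variable, so a mistyped path fails with a clear message. Whether NLTK then finds java inside a directory depends on the NLTK version.

```python
import os


def set_javahome(path):
    """Hypothetical helper: validate a path, then store it in JAVAHOME.

    The change affects only this Python process and the child processes it
    starts (such as the java call), not the system-wide setting.
    """
    # Normalize "/" vs "\" for the current platform
    path = os.path.normpath(path)
    if not os.path.exists(path):
        raise FileNotFoundError("JAVAHOME target does not exist: %r" % path)
    os.environ["JAVAHOME"] = path
    return path


# Usage (adjust the JDK version to match your installation):
# set_javahome("C:/Program Files/Java/jdk1.7.0_11/bin/java.exe")
```

Call it before creating the tagger, since the lookup happens when NLTK needs the binary.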

Alan
  • You can alternatively change the JAVA_HOME environment variable and restart your IDE – Peeter Kokk Nov 04 '15 at 18:51
  • I don't remember the case anymore, but I believe I had the correct path. Setting the path to java.exe specifically breaks other scenarios, so this is not a good solution. – Alan Nov 05 '15 at 21:44
  • I forgot to add that if you set the environment variable, you have to remove the "java.exe" part; then it is a better solution (yours is a dirty fix). – Peeter Kokk Nov 06 '15 at 16:46
  • Exactly, I realize that it is a dirty fix, but the problem is it doesn't work otherwise. Have you tested your fix? – Alan Nov 07 '15 at 17:36
  • OK, they must have fixed something in NLTK. Your suggestion is the sane setup of any normal machine with Java. People looking for this question are probably having issues even with the "proper" setup, so I will only update the answer to mention this. – Alan Dec 10 '15 at 08:18
  • It doesn't work for me!!! – NASRIN Nov 11 '21 at 08:12

I spent about seven hours working through this problem and finally found a solution. You can write the path to your Java executable directly into lines 69 and 72 of the internals.py file (build 2.0.4), as follows:

##########################################################################
# Java Via Command-Line
##########################################################################

_java_bin = 'C:\Program Files\Java\jdk1.7.0_25\\bin\java.exe'
_java_options = []
# [xx] add classpath option to config_java?
def config_java(bin='C:\Program Files\Java\jdk1.7.0_25\\bin\java.exe', options=None, verbose=True):

This resolved the problem for me (I'm working in a 32-bit Windows environment).
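
Windows paths written with backslashes in ordinary Python string literals are fragile: `\b` is the backspace escape, which is why a backslash before `bin` has to be doubled in a non-raw string. The snippet below only compares string literals (it does not touch NLTK) to show that a raw string or forward slashes sidestep the issue.

```python
import ntpath

# Every backslash doubled: a plain literal that is safe
escaped = "C:\\Program Files\\Java\\jdk1.7.0_25\\bin\\java.exe"
# Raw string: backslashes are kept exactly as typed
raw = r"C:\Program Files\Java\jdk1.7.0_25\bin\java.exe"
# One "\b" left undoubled: it becomes a backspace character, not "\" + "b"
broken = "C:\\Program Files\\Java\\jdk1.7.0_25\bin\\java.exe"

print(escaped == raw)     # True
print(escaped == broken)  # False
# Forward slashes avoid escapes entirely; ntpath.normpath turns them into backslashes
print(ntpath.normpath("C:/Program Files/Java/jdk1.7.0_25/bin/java.exe") == raw)  # True
```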

duhaime

protoss1210's tip worked for me, with a few minor changes. The full answer is:

import nltk
nltk.internals.config_java("C:/Program Files/Java/jdk1.6.0_30/bin/java.exe")

After I restarted IDLE, the following code worked.

import nltk
path_to_model = "C:/Program Files/stanford-postagger-2012-05-22/models/english-bidirectional-distsim.tagger"
path_to_jar = "C:/Program Files/stanford-postagger-2012-05-22/stanford-postagger.jar"
tagger = nltk.tag.stanford.POSTagger(path_to_model, path_to_jar)
tokens = nltk.tokenize.word_tokenize("I hope this works!")
print(tagger.tag(tokens))

Output is: [('I', 'PRP'), ('hope', 'VBP'), ('this', 'DT'), ('works', 'VBZ'), ('!', '.')].

I never could get it to recognize my JAVAHOME environment variable.
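
Environment variables are read when a process starts, so a variable set in the Windows system settings is not visible to an IDLE or IDE session that was already running; that would be consistent with the restart mentioned above. A quick way to check what the running interpreter actually sees is sketched below. `java_env_report` is a hypothetical helper, not an NLTK function; the PATH entry is only there for comparison.

```python
import os
import shutil


def java_env_report():
    """Hypothetical diagnostic: what this Python process can see of Java."""
    return {
        "JAVAHOME": os.environ.get("JAVAHOME"),
        "JAVA_HOME": os.environ.get("JAVA_HOME"),
        # First "java" found on PATH, or None if there is none
        "java on PATH": shutil.which("java"),
    }


for name, value in java_env_report().items():
    print("%-13s %s" % (name, value))
```

If JAVAHOME shows as None here even though you set it in the system settings, restart the IDE or shell that launched Python.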

Alexander Measure

I tried all of the solutions above and also the ones on Google Groups, but none worked. After a few more rounds of trial and error and modifications to the answers above, the following code worked for me:

>>> import os
>>> os.environ['JAVAHOME'] = "C:/Program Files/Java/jdk1.8.0_31/bin"  # insert the appropriate JDK version

Then I tried the NERTagger code:

>>> from nltk.tag.stanford import NERTagger
>>> st = NERTagger('stanford-ner-2014-06-16/classifiers/english.all.3class.distsim.crf.ser.gz', 'stanford-ner-2014-06-16/stanford-ner.jar')
>>> st.tag('John has refused the offer from Facebook. He will work for Google'.split())

This is the output I received:

[(u'John', u'PERSON'), (u'has', u'O'), (u'refused', u'O'), (u'the', u'O'), (u'offer', u'O'), (u'from', u'O'), (u'Facebook', u'ORGANIZATION'), (u'.', u'O')]

Tested on Windows 7 64-bit
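
Why a bin directory, a java.exe path, or (in some versions) the JDK root can each work comes down to where the lookup searches. The function below is a simplified model written for illustration; it is not NLTK's actual code, and the forms a given NLTK release accepts may differ. It tries the value as the executable itself, then as a directory containing it, then as a directory whose bin subdirectory contains it.

```python
import os


def locate_java(value, names=("java.exe", "java")):
    """Simplified, hypothetical model of resolving JAVAHOME to a java binary."""
    # 1. The value is the executable itself
    if os.path.isfile(value):
        return value
    # 2. The value is a directory holding java; 3. the value is a JDK root with bin/java
    for folder in (value, os.path.join(value, "bin")):
        for name in names:
            candidate = os.path.join(folder, name)
            if os.path.isfile(candidate):
                return candidate
    return None  # nothing found at any of the three places
```

Under this model, a release that skipped the third check would reject the JDK root but accept the bin directory, which would be consistent with this answer.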

Andrew T.
K R Anushree

I looked here and the docs seem to suggest that the argument ought to look like

config_java("C:/Program Files/Java/jdk1.6.0_25/bin/java")
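
On Windows the launcher file is java.exe, so a file check such as `os.path.isfile` on a path ending in bin/java finds nothing there, whereas on Linux or macOS the binary has no extension. A small hypothetical helper (not an NLTK function) that builds the matching path for the current platform:

```python
import os


def java_binary_path(jdk_home, os_name=os.name):
    """Hypothetical helper: path to the java launcher inside a JDK directory.

    os_name defaults to the running platform ("nt" on Windows, "posix"
    elsewhere); it is a parameter only so both cases can be shown.
    """
    exe = "java.exe" if os_name == "nt" else "java"
    return os.path.join(jdk_home, "bin", exe)


print(java_binary_path("C:/Program Files/Java/jdk1.6.0_25"))
```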
Ernest Friedman-Hill

Depending on your environment, you might want to try reinstalling the NLTK binary. I installed from the binary and later upgraded via easy_install, which incorrectly installed the OS X version of NLTK; that caused exceptions when NLTK couldn't find my Java binary.

Tyler

Another possible cause of this error message when using the stanford package in NLTK is using StanfordTagger instead of POSTagger or NERTagger. According to Google Groups, the design deliberately steers users away from the general StanfordTagger class and toward one of the two specific taggers.

demongolem

Another distinct cause is using an IDE such as Eclipse. Even if you have set your JAVA_HOME environment variable, and even if you explicitly call config_java and get the [Found ... /bin/java.exe] message back, you may still have to set the runtime environment for your IDE. The reason is that when you invoke the tagger, config_java is called again as part of the process, so your earlier setting of the path to the Java binary can be overwritten.
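
The overwrite described above can be reproduced with a toy model of module-level state. `_java_bin`, `toy_config_java`, and `toy_tag` below are hypothetical stand-ins written for illustration, not NLTK's code; they only mimic the pattern described here, where a later configuration call with no explicit binary re-resolves the path from the environment and replaces the earlier value.

```python
import os

_java_bin = None  # module-level setting, like a library's global configuration


def toy_config_java(bin_path=None):
    """Set the global java path; with no argument, fall back to JAVAHOME."""
    global _java_bin
    _java_bin = bin_path if bin_path is not None else os.environ.get("JAVAHOME")
    return _java_bin


def toy_tag():
    """Stand-in for a tagger that reconfigures Java before running."""
    toy_config_java()  # no argument: the explicit setting is replaced
    return _java_bin


os.environ.pop("JAVAHOME", None)
toy_config_java("C:/Program Files/Java/jdk1.6.0_30/bin/java.exe")
print(_java_bin)  # the explicit path
print(toy_tag())  # None: the second call re-read the (unset) environment
```

Under this model, either removing the second call or setting JAVAHOME so that the re-resolution finds the same binary keeps the path intact.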

demongolem

I realize this is an old question, but here is the solution that worked for me (running on Windows 7 64-bit). Hopefully it will save someone some time.

I implemented the solution given here:

 "I have been able to get it working by commenting out two lines in the batch_tag function in \nltk\tag\stanford.py. The lines are lines 59 and 85:

 config_java(options=self.java_options, verbose=False)
 and
 config_java(options=default_options, verbose=False)
 respectively."

After commenting out the lines I set the path to the Java executable in the same manner mentioned in other answers:

 nltk.internals.config_java("path/to/javadk/bin/java.exe")

A kludgey but workable solution. Everything worked fine after that.

Renklauf

Hopefully this saves someone else some time when trying to fix this problem. I'm pretty new to programming, Python, and NLTK, and didn't realize when I was trying to implement @dduhaime's solution that there are two internals.py files: one in the nltk folder (C:\nltk-2.0.4 on my computer) and one in my Python27 folder (C:\Python27\Lib\site-packages\nltk-2.0.4-py2.7.egg\nltk on my computer). You have to add the Java path on lines 69 and 72 of the latter internals.py file, or NLTK will still not be able to find it.
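
When a package exists in more than one place on disk, Python's import system decides which copy is used, and only edits to that copy take effect. The standard library can report the file that would be loaded. The helper name `module_origin` is hypothetical; `importlib.util.find_spec` is the standard-library call it wraps.

```python
import importlib.util


def module_origin(name):
    """Hypothetical helper: file Python would load for module `name`, or None.

    For a dotted name such as "nltk.internals", find_spec imports the parent
    package first and raises ModuleNotFoundError if that parent is missing.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        return None
    return spec.origin if spec is not None else None


# The internals.py to edit is the one reported here, not another copy on disk:
# print(module_origin("nltk.internals"))
print(module_origin("json"))
```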

My environment: Windows 7 64 bit, NLTK build 2.0.4

mariera

I too have been running into problems with this. It has been such a headache!

I got this to work on my machine (Win7_x64)

Replace 'jdk1.6.0_30' with your version of the jdk. Run this command:

config_java("C:/Program Files/Java/jdk1.6.0_30/bin/java.exe")
[Found C:/Program Files/Java/jdk1.6.0_30/bin/java.exe: C:/Program Files/Java/jdk1.6.0_30/bin/java.exe]

I do not know why it has been this difficult to get working. Hope this helps!

protoss1210

I implemented a workaround for this because NLTK is misunderstanding the meaning of the JAVA_HOME variable:

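import os
java_home = os.environ.get("JAVA_HOME")
# Derive JAVAHOME only if JAVA_HOME is set and does not already end in "bin"
# (checked via basename, so it works with both "/" and "\" separators)
if java_home and os.path.basename(os.path.normpath(java_home)).lower() != "bin":
    os.environ["JAVAHOME"] = os.path.normpath(os.path.join(java_home, "bin"))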

This basically takes the correct value you have in JAVA_HOME, and creates the NLTK-friendly version and stores it in JAVAHOME. NLTK will check both so this will find the binary. You need to do this before the tagger is created, obviously.

Kylotan

I came across the same issue, and what worked for me is really simple. When you set up the JAVAHOME variable, set the path to the JDK folder on your machine as shown below:

C:\Program Files\Java\jdk\ - This did work

C:\Program Files\Java\jdk - This did not work

Les_Salantes

This answer is for Ubuntu 14.04.

Comment out two lines in the batch_tag function in nltk/tag/stanford.py. These are lines 59 and 85:

config_java(options=self.java_options, verbose=False)
and
config_java(options=default_options, verbose=False)
respectively.

After commenting out the lines, I set the path to the Java executable in the same manner mentioned in other answers:

nltk.internals.config_java("path/to/javadk/bin/java")

Everything worked fine after that.