I am trying to install NLTK with IronPython in VS2012. When I try to import nltk.book, which loads the data accompanying the NLTK book, I get the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\NLP\IronPython 2.7\lib\site-packages\nltk\book.py", line 21, in <module>
    text1 = Text(gutenberg.words('melville-moby_dick.txt'))
  File "D:\NLP\IronPython 2.7\lib\site-packages\nltk\corpus\util.py", line 68, in __getattr__
    self.__load()
  File "D:\NLP\IronPython 2.7\lib\site-packages\nltk\corpus\util.py", line 55, in _LazyCorpusLoader__load
    try: root = nltk.data.find('corpora/%s' % zip_name)
LookupError: 
**********************************************************************
  Resource 'corpora/gutenberg' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
    - 'C:\\Users\\John/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************

I am wondering how to change IronPython's search path.

smwikipedia

1 Answer

You need to download the corpora/gutenberg resource that comes with NLTK. The download process is explained here: http://nltk.org/data.

Basically you need to do:

import nltk
nltk.download()
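
Called with no arguments, nltk.download() opens the interactive downloader. As a minimal sketch, assuming network access is available, the corpus named in the error can also be fetched on its own by passing its identifier. The helper name fetch_gutenberg is made up for illustration; nltk.download and its download_dir parameter are part of NLTK.

```python
def fetch_gutenberg(download_dir=None):
    """Download only the 'gutenberg' corpus named in the LookupError.

    download_dir=None lets NLTK pick its default data directory.
    Returns whatever nltk.download returns (True on success).
    """
    # Imported here so a missing NLTK install fails at call time
    # with a plain ImportError.
    import nltk
    return nltk.download('gutenberg', download_dir=download_dir)
```

If a custom download_dir is used, that directory still has to be on NLTK's search path before nltk.book can find the corpus.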

If you already have the NLTK resources installed somewhere, set the NLTK_DATA environment variable to that location.
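
A rough sketch of two ways to point NLTK at an existing data folder from inside Python. The path D:\NLP\nltk_data and the helper names use_data_dir and apply_to_nltk are hypothetical; the folder is assumed to contain corpora\gutenberg. NLTK reads NLTK_DATA when it is first imported, and nltk.data.path is the list of directories shown under "Searched in" in the error.

```python
import os

# Hypothetical location; replace with the folder that contains "corpora".
DATA_DIR = r"D:\NLP\nltk_data"


def use_data_dir(directory, search_path):
    """Put `directory` at the front of a search-path list, once."""
    if directory not in search_path:
        search_path.insert(0, directory)
    return search_path


# Option 1: set the variable before NLTK is imported for the first time.
os.environ["NLTK_DATA"] = DATA_DIR


def apply_to_nltk(directory):
    """Option 2: edit NLTK's search list directly after import."""
    import nltk  # imported lazily, so Option 1 has already taken effect
    return use_data_dir(directory, nltk.data.path)
```

After calling apply_to_nltk(DATA_DIR), importing nltk.book should look in that folder first. Setting NLTK_DATA in the Windows system settings has the same effect as Option 1 without any code.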

Viktor Vojnovski
  • The NLTK_DATA environment variable seems to do the trick. One more issue, though: the official NLTK install package failed to detect IronPython, which is installed with VS2012. It did detect CPython or ActivePython. I had to copy the "nltk" and "nltk-2.0.4-py2.5.egg-info" folders from the "Lib\site-packages" folder to IronPython's. Not sure whether this will cause any issues. – smwikipedia Dec 31 '13 at 01:52
  • There should be no problem. The nltk folder is the egg directory and the second one is the egg metadata folder. There is some more info on the egg formats here: http://pythonhosted.org/setuptools/formats.html – Viktor Vojnovski Dec 31 '13 at 09:56