
The code is as simple as this:

import nltk
nltk.data.path.append(r"E:\nltk_data")
nltk.pos_tag(["hello"])

And the error is:

  File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\nltk\tag\__init__.py", line 110, in pos_tag
    tagger = PerceptronTagger()
  File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\nltk\tag\perceptron.py", line 141, in __init__
    self.load(AP_MODEL_LOC)
  File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\nltk\tag\perceptron.py", line 209, in load
    self.model.weights, self.tagdict, self.classes = load(loc)
  File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\nltk\data.py", line 800, in load
    # Load the resource.
  File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\nltk\data.py", line 921, in _open
    # urllib might not use mode='rb', so handle this one ourselves:
  File "C:\Program Files (x86)\IronPython 2.7\lib\site-packages\nltk\data.py", line 603, in find
    if zipfile is None:
  File "C:\Program Files (x86)\IronPython 2.7\Lib\nturl2path.py", line 26, in url2pathname
    raise IOError, error
IOError: Bad URL: /C|/E|/nltk_data/taggers/averaged_perceptron_tagger/averaged_perceptron_tagger.pickle

How come the URL becomes `/C|/E|/nltk_data/tagg...`, and why does it need to call `url2pathname` in the first place? I am already on Windows, and the path I supply is a Windows-style path.

ozgur

2 Answers


I had to dig into the code and finally found the problem. NLTK determines the operating system with `if sys.platform.startswith('win'):` (an extremely professional way to determine it, by the way).

However, under IronPython, `sys.platform` is `'cli'`, not `'win32'`, so NLTK takes its non-Windows code path.

I suspect this is causing lots of problems for IronPython users. So, the next time any Python package behaves like its Unix counterpart, just check its modules for this code.

Edit: My fix is to replace the check with `sys.platform.startswith('win') or sys.platform.startswith('cli')`.
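A minimal sketch of that check as a standalone helper, assuming you would rather not patch NLTK's source directly; the name `is_windows_platform` is hypothetical and not part of NLTK. CPython on Windows reports `sys.platform` as `'win32'`, while IronPython reports `'cli'`, so the second test is what sends IronPython down the Windows branch. Like the fix above, it treats every `'cli'` interpreter as Windows, which would presumably be wrong for IronPython running on Mono under Linux or macOS.

```python
import sys


def is_windows_platform(platform=None):
    """Return True if the interpreter should be treated as running on Windows.

    Hypothetical helper (not NLTK's API) mirroring the fix described above:
    CPython on Windows reports 'win32', IronPython reports 'cli', so a plain
    startswith('win') check misses IronPython.
    """
    if platform is None:
        platform = sys.platform
    # Assumption: every 'cli' (IronPython) host is Windows, as in the fix above.
    return platform.startswith('win') or platform.startswith('cli')


if __name__ == '__main__':
    print(sys.platform, is_windows_platform())
```

Passing the platform string explicitly (e.g. `is_windows_platform('cli')`) makes the check easy to exercise without running under IronPython.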

ozgur
  • I'm not sure whether `NLTK` supports IronPython; I know of cases where it breaks even with PyPy. The installation page states that `NLTK` supports Python (notably CPython) on Windows (32bit) and Mac/Unix: `NLTK requires Python versions 2.7 or 3.2+` http://www.nltk.org/install.html =) – alvas Apr 27 '16 at 15:59
  • Note that IronPython, despite having Python in its name, isn't the de facto Python (i.e. `CPython`) that people normally refer to. – alvas Apr 27 '16 at 16:00
  • @alvas As of this moment, we have fully migrated our Python application to IronPython and everything is tested. My conclusion is that NLTK is completely compatible with IronPython. (Of course, there are other modules we haven't used or tested, but the application uses most of the modules in NLTK.) – ozgur Apr 27 '16 at 16:36
  • It would be good to have an `IronPython` regression test in the main repo. If you would like to, feel free to raise an issue and open a pull request if you have rewritten tests on `IronPython` for NLTK. Thanks in advance! =) – alvas Apr 27 '16 at 22:22
  • One would expect the authors of a text analysis library would find a better way of checking the platform string than `startswith()` :) – sebrockm Jul 18 '18 at 22:18

The `\n` in your path is being treated as an escape sequence.

Replace `\` with `\\`:

import nltk
nltk.data.path.append(r"E:\\nltk_data")
nltk.pos_tag(["hello"])

For more information about how raw string literals work, see this question: What exactly do "u" and "r" string flags do in Python, and what are raw string literals?
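For reference, a quick check of how the path literals mentioned in this thread are actually parsed. This is standard Python string-literal behavior and is the same in 2.7 and 3.x; the variable names are just for illustration.

```python
# Compare the path literals discussed in this thread.
raw = r"E:\nltk_data"           # raw string: backslash kept, no escape processing
escaped = "E:\\nltk_data"       # normal string: \\ becomes a single backslash
plain = "E:\nltk_data"          # normal string: \n becomes a newline character
raw_doubled = r"E:\\nltk_data"  # raw string: both backslashes are kept

print(raw == escaped)           # True
print("\n" in plain)            # True
print(raw_doubled.count("\\"))  # 2
```

Since `r"E:\nltk_data"` and `"E:\\nltk_data"` evaluate to the same string, only the non-raw `"E:\nltk_data"` actually contains a newline.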

Pierre Barre
  • I have tried [r"E:\\nltk_data"], ["E:\\nltk_data"], [r"E:\nltk_data"], [ur"E:\nltk_data"], [u"E:\\nltk_data"] and [u"E:\nltk_data"]. All of them give me the same error. That's why I think it is not about the path. – ozgur Apr 27 '16 at 08:41