Instantiating and using StanfordTagger within NLTK

Question

I apologize for the newbie nature of this question - I have been trying to figure out Python packaging and namespaces, but the finer points seem to elude me. To wit, I would like to use the Python wrapper for the Stanford part-of-speech tagger. I had no trouble finding the documentation here, which provides a usage sample:

st = StanfordTagger('bidirectional-distsim-wsj-0-18.tagger')
st.tag('What is the airspeed of an unladen swallow ?'.split())
    [('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'), ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]

This looks great, but I can't seem to get the right namespaces to show up in my local Python + NLTK installation (I have the latest NLTK version, and have tried the below in Python 2.6.x as well as 2.7.x):

>>> import nltk
>>> from nltk import *
>>> from nltk.tag import stanford 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name stanford

I also tried this import statement, with the same result:

>>> from nltk.tag.stanford import StanfordTagger
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named stanford

Searching around here on SO, I found this question, where the poster seems to be experiencing the exact same problem, but is able to get past the namespace step with:

The problem is that my nltk lib doesnt contain the stanford module. So I copied the same into the appropriate folder and compiled the same.

Sounds like it is indeed the same issue, except I can't for the life of me find any documentation on how to add modules to NLTK. Everything I read on the NLTK web site implies that the Stanford module should already be packaged into the base install. So, a question in two parts:

(Specific) Any suggestions for getting past this particular issue and starting to use StanfordTagger from Python? I know I can easily call the jar directly and then interpret the output in Python - that's all the Python wrapper does anyway - but I would like to get this to work out of principle, if nothing else.
(General) What is a good pythonic approach to investigating missing packaging issues or dependencies such as above?

score -1 · Answer 1 · answered Dec 18 '11 at 23:29

Suggestions: a. Look in the nltk directory installed on your PC. I checked mine and stanford.py is not there (i.e. it is missing from the nltk/tag/ directory). You can quickly find where to look by running this:

import os
import distutils.sysconfig
# Print the directory where nltk/tag/ should live inside site-packages
print(os.path.join(distutils.sysconfig.get_python_lib(), 'nltk', 'tag'))

b. If it's not there, then copy the stanford.py file from the source you mentioned to the nltk/tag directory on your PC (which you get in step a).
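On the general question of investigating a missing submodule, here is a minimal sketch that lists what an installed package actually contains on disk, rather than what its documentation says it contains. The helper names list_submodules and has_submodule are made up for illustration; only importlib.import_module and pkgutil.iter_modules are standard library calls. It is demonstrated on the standard library's json package so it runs without NLTK installed.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Return the sorted names of the submodules found on disk for a package."""
    package = importlib.import_module(package_name)
    # Only packages carry __path__; a plain module has no submodules to list.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []
    return sorted(name for _, name, _ in pkgutil.iter_modules(search_path))


def has_submodule(package_name, submodule_name):
    """Check whether the installed package ships the named submodule."""
    return submodule_name in list_submodules(package_name)


if __name__ == "__main__":
    # Demonstrated on a standard library package, so no third-party install is needed.
    print(list_submodules("json"))
    print(has_submodule("json", "decoder"))
```

With NLTK installed, the same check would be has_submodule("nltk.tag", "stanford"). A False result there means the installed copy simply does not ship the module, which points at a version or packaging mismatch rather than a mistake in the import statement.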

I hope it works out.

Max Li

Thank you for the suggestion of distutils.sysconfig. It told me what I already knew (and you confirmed) - that the Stanford POS tagger no longer appears to be part of the NLTK distribution, and the documentation appears to be out of date. I ultimately decided to use the Stanford tagger in batch mode instead of interfacing with it from Python, but it's good to know I wasn't missing anything obvious. – Inverseofverse Dec 20 '11 at 07:43
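For anyone taking the same batch-mode route, here is a rough sketch of turning the tagger's text output back into the (token, tag) pairs that the wrapper would have returned. It assumes each output line consists of whitespace-separated token_TAG items; the separator is an assumption (it may be a different character in your tagger configuration), and parse_tagged_line is a hypothetical helper name, not part of NLTK or the Stanford distribution.

```python
def parse_tagged_line(line, separator="_"):
    """Split one line of 'token<sep>TAG' items into (token, tag) tuples."""
    pairs = []
    for item in line.split():
        # rpartition splits on the LAST separator, so a token that itself
        # contains the separator (e.g. "snake_case") keeps its full text.
        token, found, tag = item.rpartition(separator)
        if not found:
            # No separator at all: keep the raw item and mark the tag as missing.
            pairs.append((item, None))
        else:
            pairs.append((token, tag))
    return pairs


if __name__ == "__main__":
    # Sample line in the ASSUMED output format, not real tagger output.
    print(parse_tagged_line("What_WP is_VBZ the_DT airspeed_NN ?_."))
```

Splitting on the last separator rather than the first is the one design choice worth noting, since tags never contain the separator but tokens sometimes do.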
