
Hey guys, I am trying to use the NLTK library for Python and have run into an error in the POS tagging function that I don't understand. I'm running the code below in the Windows command prompt. It runs fine up to the line where the text is POS-tagged:
PosTokens = [pos_tag(e) for e in tokens]

from nltk import *

def main():

    text = "Hello my name is Bob. I am 12 years old."
    sentences = tokenize.sent_tokenize(text)
    print(sentences)
    tokens = [tokenize.word_tokenize(s) for s in sentences]
    print(tokens)
    PosTokens = [pos_tag(e) for e in tokens]

    return

if __name__ == "__main__":
    main()

I get the following output:

['Hello my name is Bob.', 'I am 12 years old.']
[['Hello', 'my', 'name', 'is', 'Bob', '.'], ['I', 'am', '12', 'years', 'old', '.']]
Traceback (most recent call last):
  File "test.py", line 15, in <module>
    main()
  File "test.py", line 10, in main
    PosTokens = [pos_tag(e) for e in tokens]
  File "test.py", line 10, in <listcomp>
    PosTokens = [pos_tag(e) for e in tokens]
  File "C:\Python34\lib\site-packages\nltk\tag\__init__.py", line 110, in pos_tag
    tagger = PerceptronTagger()
  File "C:\Python34\lib\site-packages\nltk\tag\perceptron.py", line 141, in __init__
    self.load(AP_MODEL_LOC)
  File "C:\Python34\lib\site-packages\nltk\tag\perceptron.py", line 209, in load
    self.model.weights, self.tagdict, self.classes = load(loc)
  File "C:\Python34\lib\site-packages\nltk\data.py", line 801, in load
    opened_resource = _open(resource_url)
  File "C:\Python34\lib\site-packages\nltk\data.py", line 924, in _open
    return urlopen(resource_url)
  File "C:\Python34\lib\urllib\request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 455, in open
    response = self._open(req, data)
  File "C:\Python34\lib\urllib\request.py", line 478, in _open
    'unknown_open', req)
  File "C:\Python34\lib\urllib\request.py", line 433, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 1303, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: c>

As you can see, tokenizing works, but the POS tagging call fails. Does anyone know how I can fix this error? I am running Python 3.4.0. Thanks for any answers.
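
For reference, the last frames of the traceback show `nltk.data` passing a resource location (presumably the tagger's model file) to `urllib.request.urlopen`, and a bare Windows path starting with `C:` parses as a URL whose scheme is `c`. A minimal stdlib-only sketch that reproduces the same message, assuming a made-up path (`windows_path` is hypothetical; the file does not need to exist because the scheme lookup fails first):

```python
from urllib.error import URLError
from urllib.parse import urlsplit
from urllib.request import urlopen

# Hypothetical Windows path standing in for the tagger model's location.
# The file does not need to exist: urlopen fails on the "scheme" first.
windows_path = r"C:\nltk_data\taggers\example.pickle"

# The drive letter is parsed as a URL scheme (lowercased).
print(urlsplit(windows_path).scheme)  # c

try:
    urlopen(windows_path)
except URLError as err:
    # No handler is registered for a "c:" scheme, so urllib's
    # UnknownHandler raises the same error as in the traceback.
    print(err)  # <urlopen error unknown url type: c>
```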

Jongware
BoliBoom
  • It runs correctly here (Python 2.7; I had run `nltk.download()` before and downloaded all corpora). Maybe `pos_tag` needs corpora that you don't have, and it tries to fetch them on the fly but fails for some reason. Did you run `nltk.download()`? – Paulo Almeida Apr 05 '16 at 17:17
  • OK, I will try to download the corpora. I already have some corpora installed and some not; should I install all of them? – BoliBoom Apr 05 '16 at 17:51
  • You could, but one thing I noticed now is that `c` is probably the 'C' in `C:\`. That may indicate a different problem. You can try downloading anyway; there's no harm (other than the waiting time). – Paulo Almeida Apr 05 '16 at 17:58
  • See here: http://stackoverflow.com/questions/35827859/python-nltk-pos-tag-throws-urlerror – Paulo Almeida Apr 05 '16 at 17:59
  • Thank you! That exact link solved my problem! You are a god. – BoliBoom Apr 06 '16 at 06:47
  • See also: http://stackoverflow.com/questions/35836907/nltk-v3-2-unable-to-nltk-pos-tag/35964709#35964709 – alvas Apr 06 '16 at 08:48
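
The comments narrow the failure down to the drive letter: a bare `C:` path reaches `urlopen`, which only understands URLs. As a general illustration (not NLTK's own fix; the linked questions cover the NLTK side), here is a stdlib sketch of reading a local file through `urlopen` by first turning the path into a `file:` URI with `pathlib.Path.as_uri()`. The temporary file and its contents are hypothetical stand-ins for the tagger's model file:

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical stand-in for a downloaded model file.
    model_path = Path(tmp_dir) / "example.pickle"
    model_path.write_bytes(b"model bytes")

    # as_uri() requires an absolute path and returns a file: URI, so the
    # drive letter (on Windows) lands in the URI path, not in the scheme.
    model_uri = model_path.as_uri()

    # urlopen dispatches file: URIs to its built-in FileHandler.
    with urlopen(model_uri) as response:
        data = response.read()

print(model_uri.split(":", 1)[0])  # file
print(data)  # b'model bytes'
```

On Windows, `as_uri()` yields a URI of the form `file:///C:/path/to/file`, so `urlopen` sees the `file` scheme instead of `c`.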
