This question follows on from my comments on "NLTK v3.2: Unable to nltk.pos_tag()" and is based on the code supplied as an answer to "Extract city names from text using python".
I ran the code from alvas's answer to "NLTK v3.2: Unable to nltk.pos_tag()" and the hack worked fine, but when I try to run my NLTK routine it still gives
raise URLError('unknown url type: %s' % type)
… I also ran Sarim Hussain's suggestion
nltk.download('averaged_perceptron_tagger')
successfully, but no luck. – GeorgeC 2 days ago
Try upgrading your NLTK: pip install -U nltk – alvas yesterday
Just tried that. Still the same error. Running pip I get:
C:\Python27\Scripts>pip install -U nltk
Requirement already up-to-date: nltk in c:\python27\lib\site-packages
When running the Python script in IDLE or PyScripter I get:
Traceback (most recent call last):
  File "E:\SBTF\ntlk_test.py", line 19, in <module>
    tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\__init__.py", line 110, in pos_tag
    tagger = PerceptronTagger()
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\perceptron.py", line 141, in __init__
    self.load(AP_MODEL_LOC)
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\perceptron.py", line 209, in load
    self.model.weights, self.tagdict, self.classes = load(loc)
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\data.py", line 801, in load
    opened_resource = _open(resource_url)
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\data.py", line 924, in _open
    return urlopen(resource_url)
  File "C:\Python27\ArcGIS10.4\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\ArcGIS10.4\lib\urllib2.py", line 431, in open
    response = self._open(req, data)
  File "C:\Python27\ArcGIS10.4\lib\urllib2.py", line 454, in _open
    'unknown_open', req)
  File "C:\Python27\ArcGIS10.4\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\ArcGIS10.4\lib\urllib2.py", line 1265, in unknown_open
    raise URLError('unknown url type: %s' % type)
URLError: <urlopen error unknown url type: c>
– GeorgeC 16 hours ago [the above differs from what I reported earlier, before restarting the computer]
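The `c` in `unknown url type: c` is most likely the drive letter of the model path. A sketch, assuming (as the error suggests) that the path returned by `find()` reaches `urllib2.urlopen` as a bare Windows path: URL parsers treat everything before the first colon as the scheme, so `C:\...` looks like a URL with scheme `c`, while the `'file:'` prefix from the hack makes the scheme `file`. It uses the standard library's `urlsplit` only to illustrate that rule; NLTK's own resource-URL handling in `nltk.data` may differ in detail, and the example path is made up.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2.7, as shipped with ArcGIS 10.4


def url_scheme(resource):
    """Return the (lower-cased) scheme a URL parser infers for `resource`."""
    return urlsplit(resource).scheme


# Hypothetical model location on a Windows drive
path = r"C:\nltk_data\taggers\averaged_perceptron_tagger\averaged_perceptron_tagger.pickle"

print(url_scheme(path))            # prints: c    (drive letter read as a scheme)
print(url_scheme("file:" + path))  # prints: file (what the hack's prefix achieves)
```

This is why the hack has to be applied wherever the model is actually loaded: any code path that still passes the unprefixed path ends up with scheme `c`.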
Which OS are you using?
Windows 10.
What is your Python version?
2.7
How did you install Python?
Installed via ArcGIS 10.4 and also via the OSGeo4W installer (with QGIS).
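With several Python installations on one machine, it may be worth confirming which interpreter and which copy of NLTK each environment actually uses: `pip install -U nltk` reported on `c:\python27\lib\site-packages`, whereas the traceback loads NLTK from `C:\Python27\ArcGIS10.4\lib\site-packages`. A minimal sketch to run in IDLE, PyScripter and the ArcGIS/QGIS consoles in turn; `describe_module` is a hypothetical helper name, not part of NLTK.

```python
from __future__ import print_function

import importlib
import sys


def describe_module(name):
    """Hypothetical helper: report the running interpreter and where `name`
    is imported from. Returns None if the module cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return {
        "interpreter": sys.executable,                     # which python.exe is running
        "file": getattr(module, "__file__", None),         # which copy was imported
        "version": getattr(module, "__version__", None),   # e.g. NLTK's version string
    }


print(describe_module("nltk"))
```

If the `interpreter` or `file` entries differ between environments, the pip upgrade probably applied to a different installation from the one raising the error.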
How did you install NLTK? Did you install through pip or conda? Where are you running Python? From the command prompt, a terminal or some IDE?
IDLE and PyScripter, and also directly from QGIS and ArcGIS.
Are you running it through a server or in the cloud? Are you running it on your laptop/computer?
On a laptop: i7 with 16 GB RAM and about 500+ GB free.
Or in some school's lab where there might be a firewall?
No, my own network without a firewall.
Where are you running the Python script? Did you have any other file named nltk.py in your directory? – alvas 16 hours ago
After upgrading to NLTK 3.2, did you use the AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE)) hack?
– alvas 16 hours ago
Yes. See the code below for what I get.
Sorry for the multiple questions, but your short comment isn't enough to help us debug the problem. Please answer each of the questions in the previous two comments and we'll try to find a solution afterwards. Actually, it would also be easier if you asked a new question and stated the answers to all of those questions there; it looks like a different problem. – alvas 16 hours ago
No worries, thanks for your time on this.
In the ArcGIS Python module I get:
>>> from nltk.tag import PerceptronTagger
>>> from nltk.data import find
>>> PICKLE = "averaged_perceptron_tagger.pickle"
>>> AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE))
>>> tagger = PerceptronTagger(load=False)
>>> tagger.load(AP_MODEL_LOC)
>>> pos_tag = tagger.tag
>>> pos_tag('The quick brown fox jumps over the lazy dog'.split())
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
>>> def extract_entity_names(t):
...     entity_names = []
...
...     if hasattr(t, 'label') and t.label:
...         if t.label() == 'NE':
...             entity_names.append(' '.join([child[0] for child in t]))
...         else:
...             for child in t:
...                 entity_names.extend(extract_entity_names(child))
...
...     return entity_names
...
>>> with open('sample.txt', 'r') as f:
...     for line in f:
...         sentences = nltk.sent_tokenize(line)
...         tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
...         tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
...         chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
...
...         entities = []
...         for tree in chunked_sentences:
...             entities.extend(extract_entity_names(tree))
...
...         print(entities)
...
Runtime error
Traceback (most recent call last):
  File "<string>", line 1, in <module>
IOError: [Errno 2] No such file or directory: 'sample.txt'
>>> import os
>>> os.getcwd()
'C:\\Program Files (x86)\\ArcGIS\\Desktop10.4\\bin'
>>> os.chdir(r'E:\SBTF')
>>> os.getcwd()
'E:\\SBTF'
>>> with open('sample.txt', 'r') as f:
...     for line in f:
...         sentences = nltk.sent_tokenize(line)
...         tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
...         tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
...         chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
...
...         entities = []
...         for tree in chunked_sentences:
...             entities.extend(extract_entity_names(tree))
...
...         print(entities)
...
Runtime error
Traceback (most recent call last):
  File "<string>", line 3, in <module>
NameError: name 'nltk' is not defined
>>> import nltk
>>> with open('sample.txt', 'r') as f:
...     for line in f:
...         sentences = nltk.sent_tokenize(line)
...         tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
...         tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
...         chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
...
...         entities = []
...         for tree in chunked_sentences:
...             entities.extend(extract_entity_names(tree))
...
...         print(entities)
...
Runtime error
Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\__init__.py", line 110, in pos_tag
    tagger = PerceptronTagger()
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\perceptron.py", line 141, in __init__
    self.load(AP_MODEL_LOC)
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\perceptron.py", line 209, in load
    self.model.weights, self.tagdict, self.classes = load(loc)
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\data.py", line 801, in load
    opened_resource = _open(resource_url)
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\data.py", line 924, in _open
    return urlopen(resource_url)
  File "C:\Python27\ArcGIS10.4\Lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\ArcGIS10.4\Lib\urllib2.py", line 431, in open
    response = self._open(req, data)
  File "C:\Python27\ArcGIS10.4\Lib\urllib2.py", line 454, in _open
    'unknown_open', req)
  File "C:\Python27\ArcGIS10.4\Lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\ArcGIS10.4\Lib\urllib2.py", line 1265, in unknown_open
    raise URLError('unknown url type: %s' % type)
URLError: <urlopen error unknown url type: c>
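The last traceback suggests why the hack has no effect in this loop: it calls `nltk.pos_tag()`, which (line 110 of `nltk\tag\__init__.py`) constructs a new `PerceptronTagger()`, and that constructor loads the model through its own `AP_MODEL_LOC` rather than the patched value defined earlier in the session. The session already holds a correctly loaded tagger (`tagger`, with `pos_tag = tagger.tag`), so a likely fix is to tag with that object instead. A sketch of the tagging step with the tagger passed in explicitly; `tag_lines` is a hypothetical helper name, not an NLTK function.

```python
def tag_lines(lines, tag):
    """Yield, for each line of text, its sentences as lists of (token, tag) pairs.

    `tag` is the tagging callable, e.g. the `tag` method of a PerceptronTagger
    loaded from the 'file:'-prefixed model path. Passing it in avoids
    nltk.pos_tag(), which creates and loads a new PerceptronTagger itself.
    """
    import nltk  # imported here so the sketch can be loaded without NLTK installed

    for line in lines:
        sentences = nltk.sent_tokenize(line)
        tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
        # Reuse the already-loaded tagger instead of calling nltk.pos_tag()
        yield [tag(tokens) for tokens in tokenized_sentences]
```

In the session this would be used as `for tagged_sentences in tag_lines(f, tagger.tag):`, passing each `tagged_sentences` to `nltk.ne_chunk_sents(tagged_sentences, binary=True)` as before. Equivalently, replacing `nltk.pos_tag(sentence)` with the session's local `pos_tag(sentence)` should avoid the second model load.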