
This follows on from my comments on NLTK v3.2: Unable to nltk.pos_tag() and uses the code supplied as an answer to Extract city names from text using python.


I ran the code from Alvas' answer in NLTK v3.2: Unable to nltk.pos_tag() and the hack worked fine but when I try to run my nltk routine it still gives

raise URLError('unknown url type: %s' % type)

… I also ran Sarim Hussain's suggestion

nltk.download('averaged_perceptron_tagger')

successfully but no luck. – GeorgeC 2 days ago

try upgrading your nltk, pip install -U nltk – alvas yesterday

Just tried that. Still the same error. On the pip command I get

C:\Python27\Scripts>pip install -U nltk
Requirement already up-to-date: nltk in c:\python27\lib\site-packages

On running the Python script in IDLE or PyScripter I get

Traceback (most recent call last):
  File "E:\SBTF\ntlk_test.py", line 19, in <module>
    tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\__init__.py", line 110, in pos_tag
    tagger = PerceptronTagger()
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\perceptron.py", line 141, in __init__
    self.load(AP_MODEL_LOC)
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\perceptron.py", line 209, in load
    self.model.weights, self.tagdict, self.classes = load(loc)
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\data.py", line 801, in load
    opened_resource = _open(resource_url)
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\data.py", line 924, in _open
    return urlopen(resource_url)
  File "C:\Python27\ArcGIS10.4\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\ArcGIS10.4\lib\urllib2.py", line 431, in open
    response = self._open(req, data)
  File "C:\Python27\ArcGIS10.4\lib\urllib2.py", line 454, in _open
    'unknown_open', req)
  File "C:\Python27\ArcGIS10.4\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\ArcGIS10.4\lib\urllib2.py", line 1265, in unknown_open
    raise URLError('unknown url type: %s' % type)
URLError: <urlopen error unknown url type: c>

– GeorgeC 16 hours ago [above is different to what I reported earlier before restarting the computer]
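
The last line of the traceback, `unknown url type: c`, is consistent with the model location reaching `urlopen` as a bare Windows path, so that the drive letter is read as a URL scheme. A minimal sketch of that parsing behaviour using only the standard library; the path below is a made-up example, not the actual location on this machine, and `url_scheme` is a hypothetical helper:

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2.7

# Hypothetical path, for illustration only.
bare_path = r"C:\nltk_data\taggers\averaged_perceptron_tagger\averaged_perceptron_tagger.pickle"
prefixed_path = "file:" + bare_path


def url_scheme(location):
    """Return the scheme a URL parser sees in `location`."""
    return urlparse(location).scheme


print(url_scheme(bare_path))      # the drive letter, lower-cased
print(url_scheme(prefixed_path))  # "file"
```

This only illustrates why a `file:` prefix changes how the path is interpreted; it does not show which code path in NLTK supplies the unprefixed location.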

Which OS are you using?

Windows 10.

What is your Python version?

2.7

How did you install python?

installed via ArcGIS 10.4 and also via OSGEO4W installer (with QGIS)

or conda? Where are you running Python? From the command prompt, terminal or in some IDE?

Idle and Pyscripter, also straight from QGIS and ArcGIS.

Are you running it through a server or a cloud? Are you running it on your laptop/computer?

Laptop i7 with 16GB RAM and about 500GB+ free.

Or in some school's lab where there might be a firewall?

Nope, my own network without firewall.

Where are you running the python script? Do you have any other file named nltk.py in your directory? – alvas 16 hours ago

   After upgrading to NLTK 3.2 did you use the AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE)) hack?

– alvas 16 hours ago

Yes. See code below for what I get.

   Sorry for the multiple questions, your short comment isn't enough to help us debug the problem. Please answer each of the questions in the previous 2 comments and we'll try to find a solution afterwards. Actually, it'll also be easier if you ask another question and state all the answers to those questions in the comments, it looks like it's another problem. – alvas 16 hours ago

How did you install NLTK? Did you install through pip

No worries thanks for your time on this.

In the ArcGIS python module I get

>>> from nltk.tag import PerceptronTagger
>>> from nltk.data import find
>>> PICKLE = "averaged_perceptron_tagger.pickle"
>>> AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE))
>>> tagger = PerceptronTagger(load=False)
>>> tagger.load(AP_MODEL_LOC)
>>> pos_tag = tagger.tag
>>> pos_tag('The quick brown fox jumps over the lazy dog'.split())
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
>>> def extract_entity_names(t):
...     entity_names = []
... 
...     if hasattr(t, 'label') and t.label:
...         if t.label() == 'NE':
...             entity_names.append(' '.join([child[0] for child in t]))
...         else:
...             for child in t:
...                 entity_names.extend(extract_entity_names(child))
... 
...     return entity_names
...     
>>> with open('sample.txt', 'r') as f:
...     for line in f:
...         sentences = nltk.sent_tokenize(line)
...         tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
...         tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
...         chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
... 
...         entities = []
...         for tree in chunked_sentences:
...             entities.extend(extract_entity_names(tree))
... 
...         print(entities)
...         
Runtime error 
Traceback (most recent call last):
  File "<string>", line 1, in <module>
IOError: [Errno 2] No such file or directory: 'sample.txt'
>>> import os
>>> os.getcwd()
'C:\\Program Files (x86)\\ArcGIS\\Desktop10.4\\bin'
>>> os.chdir(r'E:\SBTF')
>>> os.getcwd()
'E:\\SBTF'
>>> with open('sample.txt', 'r') as f:
...     for line in f:
...         sentences = nltk.sent_tokenize(line)
...         tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
...         tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
...         chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
... 
...         entities = []
...         for tree in chunked_sentences:
...             entities.extend(extract_entity_names(tree))
... 
...         print(entities)
...         
Runtime error 
Traceback (most recent call last):
  File "<string>", line 3, in <module>
NameError: name 'nltk' is not defined
>>> import nltk
>>> with open('sample.txt', 'r') as f:
...     for line in f:
...         sentences = nltk.sent_tokenize(line)
...         tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
...         tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
...         chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
... 
...         entities = []
...         for tree in chunked_sentences:
...             entities.extend(extract_entity_names(tree))
... 
...         print(entities)
...         
Runtime error 
Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\__init__.py", line 110, in pos_tag
    tagger = PerceptronTagger()
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\perceptron.py", line 141, in __init__
    self.load(AP_MODEL_LOC)
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\tag\perceptron.py", line 209, in load
    self.model.weights, self.tagdict, self.classes = load(loc)
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\data.py", line 801, in load
    opened_resource = _open(resource_url)
  File "C:\Python27\ArcGIS10.4\lib\site-packages\nltk\data.py", line 924, in _open
    return urlopen(resource_url)
  File "C:\Python27\ArcGIS10.4\Lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\ArcGIS10.4\Lib\urllib2.py", line 431, in open
    response = self._open(req, data)
  File "C:\Python27\ArcGIS10.4\Lib\urllib2.py", line 454, in _open
    'unknown_open', req)
  File "C:\Python27\ArcGIS10.4\Lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\ArcGIS10.4\Lib\urllib2.py", line 1265, in unknown_open
    raise URLError('unknown url type: %s' % type)
URLError: <urlopen error unknown url type: c>
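
In the traceback above, the failing call is `nltk.pos_tag`, which builds its own `PerceptronTagger()` (`nltk\tag\__init__.py`, line 110) rather than using the `tagger` that was loaded by hand earlier in the session. A sketch of routing the loop through an explicitly supplied tagging function instead; `tag_sentences` and `dummy_tag` are hypothetical names for illustration, and the dummy tagger exists only so the sketch runs without NLTK data:

```python
def tag_sentences(tokenized_sentences, tag):
    """Tag each tokenized sentence with the supplied `tag` callable."""
    return [tag(tokens) for tokens in tokenized_sentences]


def dummy_tag(tokens):
    """Stand-in tagger (assumption): labels every token 'NN'."""
    return [(token, "NN") for token in tokens]


tokenized = [["The", "quick", "brown", "fox"], ["It", "jumps"]]
tagged = tag_sentences(tokenized, dummy_tag)
print(tagged)
```

In the session above, `tag_sentences(tokenized_sentences, tagger.tag)` would presumably take the place of the `nltk.pos_tag` list comprehension. Whether that avoids the error on this setup is untested.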

String.py is in the following locations: [image]

Community
GeorgeC
  • Ah ha, now with the details, it seems like you have 2 different pythons installed on your computer. Can you go to powershell and do: `python -c "import sys;print sys.executable; import nltk; print nltk.__version__"` and tell us the output? Then add these lines to the top of the script in ArcGIS `import sys; print sys.executable; import nltk; print nltk.__version__` and run it. What is the output for this? – alvas Apr 01 '16 at 08:22
  • Note the difference in the directories: `C:\Python27\ArcGIS10.4\lib\site-packages` vs `C:\python27\lib\site-packages`. – alvas Apr 01 '16 at 08:27
  • Also, try changing your own script to another name, e.g. `my_assignment.py` instead of `nltk.py`, see http://stackoverflow.com/questions/36326135/importerror-no-module-named-tag – alvas Apr 01 '16 at 08:28
  • The powershell command gives `C:\Python27\python.exe` and `3.2`. The script, named `my_test.py`, with the code you wanted gives `C:\Python27\ArcGIS10.4\python.exe` and `3.2`. I still get the same error as before for the rest of the code. – GeorgeC Apr 01 '16 at 10:53
  • Just to be sure, you didn't have the error on powershell, right? – alvas Apr 01 '16 at 11:53
  • Do you have a file named `string.py` in your directory? – alvas Apr 01 '16 at 19:01
  • Actually there are a few copies of string.py in multiple directories. See updated question. Yes, there is no error in powershell. The error previously was because my Avast scanner was on at the time. I turned it off and then the install worked fine. It needs to be turned off to run gdal/ogr commands as well. – GeorgeC Apr 03 '16 at 03:13
  • Good that you got it to work! So it's a firewall problem with Avast. – alvas Apr 03 '16 at 06:18
  • Answer your own questions with some screen shots on turning on avast =) – alvas Apr 03 '16 at 06:18
  • I tried all these steps, until I found this question: https://stackoverflow.com/questions/42370497/nltk-unknown-url-error?noredirect=1&lq=1. Wish I had seen that first... – Ric Gaudet Nov 16 '17 at 19:39
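
The checks suggested in the comments (which interpreter is running, where a module is imported from, and whether a local file shadows a library module) can be sketched as one script. The helper names are hypothetical, and `json` stands in for `nltk` so the sketch runs on any installation:

```python
import importlib
import os
import sys


def module_origin(module_name):
    """Import a module and return the file it was loaded from."""
    module = importlib.import_module(module_name)
    return getattr(module, "__file__", None)


def shadowing_files(directory, names=("nltk.py", "string.py")):
    """List names from `names` that exist in `directory` (case-insensitive)."""
    present = set(entry.lower() for entry in os.listdir(directory))
    return sorted(name for name in names if name in present)


print("interpreter: %s" % sys.executable)
print("json loaded from: %s" % module_origin("json"))
print("possible shadowing files: %s" % shadowing_files(os.getcwd()))
```

Running it once from powershell and once inside ArcGIS, with `"nltk"` passed to `module_origin`, would show whether both environments use the same interpreter and the same NLTK installation.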

0 Answers