0

All I want to do is scrape some earthquake data from a website. In fact, I just want Python to be able to extract data from URLs. For some reason, even the simplest code, which only opens a URL and calls '.readlines()', is met with a wall of errors. It doesn't seem to understand the 'urlopen' call, nor much of anything else.

I don't even know what to try, because I can't parse the errors it's giving me. I was hoping that someone would have an answer for me before I had to do something drastic like reinstalling Python.

import urllib.request

def urltest():
    # Download the CSV feed and print its first line.
    url = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv"
    f = urllib.request.urlopen(url)
    allLines = f.readlines()  # list of bytes objects, one per line
    f.close()
    line = allLines[0].decode()
    print(line)

urltest()

This is the code I've used to simply test it. The URL points to a .csv file, which Python should easily be able to fetch and read through.

If anyone wants, I can post the entire wall of errors that this code returns. There look to be at least six different ones, but this is the final line that it spits back:

urllib.error.URLError: <urlopen error unknown url type: https>

3 Answers

1

Looking through the urllib.request module, you can see that it loads a collection of handlers. This snippet is from urllib/request.py:

if hasattr(http.client, "HTTPSConnection"):
    default_classes.append(HTTPSHandler)
skip = set()
for klass in default_classes:
    for check in handlers:
        if isinstance(check, type):
            if issubclass(check, klass):
                skip.add(klass)
        elif isinstance(check, klass):
            skip.add(klass)
for klass in skip:
    default_classes.remove(klass)

for klass in default_classes:
    opener.add_handler(klass())

So the HTTPS handler class is only loaded if http.client has the attribute HTTPSConnection. Looking in http/client.py, we can see the following code, which sets this attribute:

try:
    import ssl
except ImportError:
    pass
else:
    class HTTPSConnection(HTTPConnection):
        "This class allows communication via SSL."

        default_port = HTTPS_PORT

So the HTTPSConnection class is only created if the ssl module can be imported successfully. If your system doesn't have the ssl module, http.client won't define the HTTPSConnection class, the attribute won't exist, and urllib won't load a handler for https.
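As a rough way to check this chain on your own interpreter, here is a minimal sketch. The helper name https_support_report is made up for illustration, and it relies on the opener's handlers list, which is an implementation detail of urllib rather than a documented guarantee.

```python
import http.client
import urllib.request


def https_support_report():
    """Report whether this interpreter can open https:// URLs with urllib."""
    # http.client defines HTTPSConnection only if "import ssl" succeeded.
    has_connection = hasattr(http.client, "HTTPSConnection")

    # urllib.request defines HTTPSHandler only when HTTPSConnection exists,
    # so look it up defensively instead of assuming the name is there.
    handler_cls = getattr(urllib.request, "HTTPSHandler", None)

    # build_opener() installs the default handlers on a new opener.
    opener = urllib.request.build_opener()
    has_handler = handler_cls is not None and any(
        isinstance(h, handler_cls) for h in opener.handlers
    )

    return {"HTTPSConnection": has_connection, "https_handler": has_handler}


print(https_support_report())
```

If both values come back False, the interpreter was most likely built or installed without a working ssl module.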

The code you provided worked on my system, so I added the following code before it to make my system unable to locate the ssl module.

#load then remove the ssl module from the system
import sys
import ssl
del ssl
sys.modules['ssl']=None

import urllib.request


def urltest():

    url = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv"
    f = urllib.request.urlopen(url)
    allLines = f.readlines()
    f.close()
    line = allLines[0].decode()
    print(line)

urltest()

Doing this, I get the same error you were getting:

C:\Users\cd00119621\AppData\Local\Programs\Python\Python37\python.exe C:/Users/cd00119621/PycharmProjects/ideas/stackoverflow.py
Traceback (most recent call last):
  File "C:/Users/cd00119621/PycharmProjects/ideas/stackoverflow.py", line 19, in <module>
    urltest()
  File "C:/Users/cd00119621/PycharmProjects/ideas/stackoverflow.py", line 13, in urltest
    f = urllib.request.urlopen(url)
  File "C:\Users\cd00119621\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\cd00119621\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\cd00119621\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\cd00119621\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 563, in error
    result = self._call_chain(*args)
  File "C:\Users\cd00119621\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\cd00119621\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Users\cd00119621\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "C:\Users\cd00119621\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 548, in _open
    'unknown_open', req)
  File "C:\Users\cd00119621\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\cd00119621\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 1387, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: https>
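The same effect can be checked without touching the network. The sketch below is only an illustration (the names CHILD and https_available_without_ssl are made up): it starts a fresh interpreter, blocks ssl using the same sys.modules trick, and asks http.client whether it defined HTTPSConnection.

```python
import subprocess
import sys

# Child script: block ssl, then ask http.client whether it defined
# HTTPSConnection.
CHILD = """
import sys
sys.modules['ssl'] = None   # makes any later "import ssl" raise ImportError
import http.client
print(hasattr(http.client, 'HTTPSConnection'))
"""


def https_available_without_ssl():
    """Run CHILD in a fresh interpreter and return what it printed as a bool."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


print(https_available_without_ssl())
```

Running it in a separate process keeps the current session's ssl module intact, which is why a subprocess is used here instead of editing sys.modules in place.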

So I suspect you have installed Python without ssl configured. You should be able to verify this easily by running import ssl from the Python command line. If you get an error like

>>> import ssl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'ssl'

then that is the cause of your issues. You would have to either reinstall Python with ssl configured or somehow build the ssl module from source.
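If more than one Python installation lives on the machine, it may help to see which interpreter is actually running and whether that one has ssl. The following is a small sketch, not a definitive diagnostic; the function name describe_ssl_support is made up for illustration.

```python
import sys


def describe_ssl_support():
    """Return the interpreter path and, if available, its OpenSSL version."""
    info = {"executable": sys.executable, "version": sys.version.split()[0]}
    try:
        import ssl
    except ImportError as exc:
        # No usable ssl module: urllib will not be able to open https URLs.
        info["ssl"] = None
        info["error"] = str(exc)
    else:
        info["ssl"] = ssl.OPENSSL_VERSION
    return info


for key, value in describe_ssl_support().items():
    print(f"{key}: {value}")
```

Running the same script from each environment (for example from IDLE and from an Anaconda prompt) should show which installation is the one missing ssl.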

Chris Doyle
  • You got it spot on. My Python is installed in some subdirectory of Anaconda, which is what I used when I was teaching myself Python. My CS teacher, however, uses IDLE (on a Mac, if you can believe it), and therefore I was forced to make almost a Frankenstein's monster version of Python so that his files would look in the correct directory. Now I just have to figure out how to reinstall Python without messing up this house of cards I have going. EDIT: The conspiracy against myself goes even deeper, as when I run it using a text editor which Anaconda recognizes, it actually works just fine. – Samuel Johnson Oct 14 '19 at 22:58
0

It looks like the problem is a network (DNS/proxy/firewall) issue. See https://github.com/pbugnion/gmaps/issues/245

Shivangi Singh
-1

You can use Pandas:

import pandas as pd
data = pd.read_csv('http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv')
print(data)
Francesco Mantovani
  • This suffers the same problem since pandas calls urllib and the user cannot use urllib – Chris Doyle Oct 14 '19 at 20:29
  • This doesn't answer the question. In any case, pandas will eventually call urllib, so would likely fail anyway. – juanpa.arrivillaga Oct 14 '19 at 22:34
  • Though this didn't solve my problem, your comment actually made me go look up pandas. Never seen it before. So thanks for that, at least, because now data analysis is actually possible on my machine! – Samuel Johnson Oct 15 '19 at 20:38