I'm getting a "WindowsError: [Error 5] Access is denied" message when reading a website with urllib2.

from urllib2 import urlopen, Request
from bs4 import BeautifulSoup

hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
req = Request('https://' + url, headers=hdr)
soup = BeautifulSoup( urlopen( req ).read() )

The full traceback is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 431, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 449, in _open
    '_open', req)
  File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1240, in https_open
    context=self._context)
  File "C:\Python27\lib\urllib2.py", line 1166, in do_open
    h = http_class(host, timeout=req.timeout, **http_conn_args)
  File "C:\Python27\lib\httplib.py", line 1258, in __init__
    context = ssl._create_default_https_context()
  File "C:\Python27\lib\ssl.py", line 440, in create_default_context
    context.load_default_certs(purpose)
  File "C:\Python27\lib\ssl.py", line 391, in load_default_certs
    self._load_windows_store_certs(storename, purpose)
  File "C:\Python27\lib\ssl.py", line 378, in _load_windows_store_certs
    for cert, encoding, trust in enum_certificates(storename):
WindowsError: [Error 5] Access is denied

I've tried running the script from a command prompt with admin privileges, as suggested here, but it does not fix the problem.

Any suggestions on how to resolve this error?

Boa
  • Are you running the source version of web2py, and are you able to make the same request outside of the web2py context (e.g., from a standard Python shell or Python script)? – Anthony Oct 21 '15 at 16:53
  • From the looks of it, Windows is denying access to the certificate store. – xrisk Oct 21 '15 at 16:55
  • Can you try installing `certifi`? – xrisk Oct 21 '15 at 17:00
  • @Anthony - 1. was running the web2py source version 2. you're right, the problem doesn't seem to be web2py specific, as I'm getting the same error message running it outside of web2py - I've edited the post accordingly – Boa Oct 21 '15 at 17:03
  • @RishavKundu - 1. yes, I thought so as well, but was under the impression that running the commands as admin should take care of it 2. do you mean the certifi python module, or something else? Where would I go from there? – Boa Oct 21 '15 at 17:06
  • @Boa can you try this in a python shell? `import _ssl` ? – xrisk Oct 21 '15 at 17:09
  • @RishavKundu - tried `import _ssl` from a shell - no problem there – Boa Oct 21 '15 at 17:11
  • @Boa `import _ssl; help(_ssl);` confirm that you are able to read the file listed there using python. – xrisk Oct 21 '15 at 17:16
  • @RishavKundu - I was able to read the _ssl module help file – Boa Oct 21 '15 at 17:18
  • make sure your file is not on 'read only' that should work –  Oct 26 '15 at 01:32
  • @Ben - to which file are you referring? – Boa Oct 26 '15 at 03:47
  • Please provide the exact python version, your OS version and your user's permission. Did you try running the script with elevated privileges? Is this a domain controlled computer? Do you have permissions to start/run/certmgr.msc. This is most likely a permission issue or there is something wrong with either your python version, your permissions or the certificate store. – tintin Oct 26 '15 at 15:20
  • @tintin - Windows 7, Python 2.7.10 - I have an admin account on the computer, and have tried running the script from a command prompt with admin privileges. I've also tried suspending the firewall and antivirus software on the system, but it didn't help. Yes, I am able to run certmgr.msc. – Boa Oct 26 '15 at 16:11
  • @Boa - can you confirm that this snippet triggers your issue? `from _ssl import enum_certificates; print enum_certificates("ROOT"); print enum_certificates("CA")`. If so, which one, or both? – tintin Oct 26 '15 at 17:51
  • @tintin - the first two lines execute without a problem, but the last command (`print enum_certificates("CA")`) triggers the `WindowsError: [Error 5] Access is denied` message. – Boa Oct 26 '15 at 18:46
  • So there is something wrong with the Windows `CA` (certification authority) store (certmgr: `Intermediate Certification Authorities`) that makes `WINAPI::CertOpenSystemStore` fail for `CA`. I'd suggest launching `certmgr`, navigating to `Intermediate Certification Authorities`, and going through that list of certificates, or checking with Microsoft support, as it seems something is messed up there. Also, can you check whether `print enum_certificates(u"CA")` raises an error as well? – tintin Oct 26 '15 at 19:24
  • btw. there is a workaround to not hit that access denied exception but it will disable certificate validation for intermediate certificates and cover the fact that something is wrong with your certstore. – tintin Oct 26 '15 at 19:25
  • @tintin - Thank you. I'll try to investigate whether there's something to resolve with the certificate store. – Boa Oct 26 '15 at 20:30
  • @tintin - I'm curious about the workaround as well (as long as it's a temporary one that only applies to the python application for the duration of the time that it's running, rather than changing something fundamental about the system's Windows installation). Thus far, my workaround has been to use selenium instead of urllib2 to grab https data. – Boa Oct 26 '15 at 20:32

2 Answers

It looks like this is a Windows certificate store inconsistency. httplib, which urllib2 calls internally, changed in Python 2.7.9 from performing no server certificate validation to enforcing it by default. You will therefore run into this problem in any Python script that is based on urllib or httplib and runs within your user profile.

That said, something seems to be very wrong with your Windows certificate store. httplib fails while enumerating certificates in the CA (certification authority) store, which shows up as Intermediate Certification Authorities in certmgr.msc, but succeeds for ROOT, the normal trusted root certificate store (see the comments on the question). I'd therefore suggest checking the certificates under Intermediate Certification Authorities in certmgr for recently added entries, and checking the Windows log for general errors.

What happens in your case is this: urllib2 calls httplib, which tries to set up a default SSL context with certificate validation enforced. As part of that, it enumerates your system's trusted certificate anchors by calling ssl.enum_certificates. This function is implemented in C as _ssl_enum_certificates_impl and calls the WinAPI functions CertOpenSystemStore and CertEnumCertificatesInStore. For the CA store, one of those two calls fails with an access denied error.

If you want to debug this further, you can invoke the WinAPI function CertOpenSystemStore manually with 'CA' as the LPCTSTR store-name argument and debug it from that side, try other Windows certificate store management tools, and/or contact Microsoft support for assistance.
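
Before going down to the WinAPI level, it may help to narrow the failure from Python. The following is only a sketch: `probe_windows_cert_stores` is a hypothetical helper name, and it assumes a Windows build of Python where `ssl.enum_certificates` exists (the `enum` parameter is there so the function can be exercised elsewhere with a stand-in).

```python
import ssl


def probe_windows_cert_stores(stores=("CA", "ROOT"), enum=None):
    """Try to enumerate each named store and report which ones fail.

    Hypothetical helper: returns a dict mapping store name to a status string.
    """
    # ssl.enum_certificates is only present on Windows builds of Python.
    if enum is None:
        enum = getattr(ssl, "enum_certificates", None)
    if enum is None:
        raise RuntimeError("ssl.enum_certificates is not available on this platform")

    results = {}
    for name in stores:
        try:
            results[name] = "ok (%d entries)" % len(enum(name))
        except OSError as exc:
            # WindowsError is an OSError, so "Access is denied" lands here.
            results[name] = "failed: %s" % exc
    return results
```

On the affected machine, printing the result of `probe_windows_cert_stores()` should show which of the two stores raises the access denied error, without aborting on the first failure.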

There are also indications that others have had similar problems with that API call; search Google for: access denied CertOpenSystemStore

If you just want to make it work without fixing the root cause, you can try the following workaround. It temporarily patches _windows_cert_stores so that it no longer includes the broken CA store, or alternatively disables the trust-anchor loading logic completely. (The patch affects all other ssl.SSLContext invocations in the current process.)

Note that this effectively disables server certificate verification.

import ssl
from urllib2 import urlopen, Request

ssl.SSLContext._windows_cert_stores = ("ROOT",)         # patch the default store list to only include "ROOT", as "CA" is broken for you
#ssl.SSLContext.load_default_certs = lambda s, x: None  # alternative: turn load_default_certs into a no-op
ctx = ssl.create_default_context()                      # create a new SSL context
ctx.check_hostname = False                              # do not verify hostnames
ctx.verify_mode = ssl.CERT_NONE                         # do not verify certificates

hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
req = Request('https://' + url, headers=hdr)            # url as defined in the question
x = urlopen(req, context=ctx).read()
ssl.SSLContext._windows_cert_stores = ("ROOT", "CA")    # undo the patch
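
If the patch is only meant to be temporary, one way to make sure it is always undone is to wrap it in a context manager. This is a sketch rather than a tested fix: `only_root_store` is a hypothetical name, and `_windows_cert_stores` is a private attribute of `ssl.SSLContext` that could change between Python versions.

```python
import contextlib
import ssl


@contextlib.contextmanager
def only_root_store():
    """Temporarily restrict the default Windows store list to "ROOT".

    Hypothetical helper; relies on the private _windows_cert_stores attribute.
    """
    # Remember whatever the current value is so it can be restored exactly.
    original = getattr(ssl.SSLContext, "_windows_cert_stores", ("CA", "ROOT"))
    ssl.SSLContext._windows_cert_stores = ("ROOT",)
    try:
        yield
    finally:
        # Restore the original value even if the body raised an exception.
        ssl.SSLContext._windows_cert_stores = original
```

Creating the context inside the block (`with only_root_store(): ctx = ssl.create_default_context()`) and leaving `check_hostname` and `verify_mode` at their defaults would keep verification against the ROOT store switched on. Whether that is enough depends on the server sending its intermediate certificates, which is an assumption here.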

I hope this information helps you resolve the issue. Good luck.

tintin

There are several potential problems with using the Windows certificate store. (I've found that when running code from a service account without a full user profile, it is nearly impossible.) The reasons are somewhat complex, but not worth discussing further because there is an easier solution. Turning off SSL validation, as already suggested, is one workaround, but probably not the best one if you care about the validity of the certificates presented.

Just avoid this altogether by using a self-contained cert store. For Python this is the certifi package, which is kept up to date. It is easily used through the Python requests package. Both should be readily available for most common Python distributions.

import requests
from bs4 import BeautifulSoup

url = "www.google.com"
hdr = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}

r = requests.get('https://' + url, headers=hdr, verify=True)
soup = BeautifulSoup(r.text, 'html.parser')  # name the parser explicitly to avoid the bs4 "no parser specified" warning

Note that requests.get() will raise an exception for invalid addresses, unreachable sites, and failed certificate verification, so be prepared to catch these. When a site was contacted successfully and the certificate was validated, but the page wasn't found (a 404 error, for example), you won't get an exception, so you should also check that r.status_code == 200 after making the request. (30x redirects are handled automatically, so you won't see those as status codes unless you tell requests not to follow them.) This checking is omitted from the example code for clarity.
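
For reference, the checks described above could look roughly like this. It is a minimal sketch: `fetch_html` is a hypothetical helper name, the 10-second timeout is an arbitrary choice, and the `get` parameter only exists so the function can be tried with a stand-in instead of a real network call.

```python
import requests


def fetch_html(url, headers=None, get=requests.get):
    """Return the page body, or None if the request failed or was not a 200."""
    try:
        r = get(url, headers=headers, verify=True, timeout=10)
    except requests.exceptions.SSLError as exc:
        # Raised when certificate verification fails.
        print("certificate verification failed: %s" % exc)
        return None
    except requests.exceptions.RequestException as exc:
        # Base class for invalid URLs, unreachable hosts, timeouts, etc.
        print("request failed: %s" % exc)
        return None

    # Reaching the server does not mean the page exists (e.g. 404).
    if r.status_code != 200:
        print("unexpected status code: %d" % r.status_code)
        return None
    return r.text
```

The SSLError clause has to come before the RequestException clause, because SSLError is a subclass of it and would otherwise never be reached.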

Note also that you don't explicitly reference the certifi module here. requests will use it if installed. If not installed, requests will use a more limited built-in set of root CAs.
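
If you would rather stay with the standard library, certifi can also be referenced explicitly. This is a sketch under a few assumptions: certifi is installed, the Python version accepts a `context` argument to `urlopen` (2.7.9 or later), and `read_url` is a hypothetical helper name.

```python
import ssl

import certifi

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7.9+

# When cafile is given, create_default_context loads that bundle and, as far
# as I can tell from the ssl module source, does not call load_default_certs,
# so the Windows certificate store should not be touched.
ctx = ssl.create_default_context(cafile=certifi.where())


def read_url(url):
    """Fetch url with certificates verified against the certifi bundle."""
    return urlopen(url, context=ctx).read()
```

Certificate and hostname verification stay enabled here; only the source of the trusted root certificates changes.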

wojtow
  • Thanks, wojtow. Indeed, I ultimately turned to the solution of using `certifi` (was asking about the method for bypassing validation mostly for reasons of intellectual curiosity). At some point, [I was interested in whether it's possible to use certifi with urllib2](https://stackoverflow.com/questions/33745808/using-certifi-module-with-urllib2), but after a bit of research, `requests` seemed to be a more practicable alternative to `urllib`. – Boa Dec 04 '15 at 23:03
  • Ha, I just posted basically the same answer there, not realizing it was the same asker. While the answer is the same, the two questions are unique enough starting points to not be duplicates. Just trying to make sure others don't have to go through the same several days of head banging that I originally did (and apparently you did too). – wojtow Dec 04 '15 at 23:14
  • @wojtow - I was fortunate enough to be able to use selenium (before making the switch to a certifi variant), so I skipped the headbanging, but that's not an option that's necessarily available in all situations, and comes with a number of downsides (i.e. selenium is slower than the alternatives; requires annoying browser launches) that may cause a lot of frustration as well, so given the choice, I'd def recommend the [requests/curl/whatever]/certifi variant. Handy solution, so I'll give it the upvote, but can't mark it as correct, as it isn't using urllib2, as the original question was asking. – Boa Dec 04 '15 at 23:31
  • @Boa, can't win. Recently got downvoted for sticking with the original poster's choice of function calls and specific question about it, rather than suggesting an admittedly much better choice. – wojtow Dec 04 '15 at 23:58
  • Well, I did give you the upvote, which, if it isn't a total win, at least negates that downvote! – Boa Dec 05 '15 at 00:12