I am trying to nail down error handling for the requests module in Python so that I am notified whenever a URL is unavailable (HTTPError, ConnectionError, Timeout, etc.).
The issue I am having is that I seem to get a status code of 200 even for FAKE URLs.
I have trawled through S.O. and various other web sources and tried many different approaches to the same goal, but have so far come up empty.
I have boiled the code down to the bare minimum to simplify things.
import requests

urls = ['http://fake-website.com',
        'http://another-fake-website.com',
        'http://yet-another-fake-website.com',
        'http://google.com']

for url in urls:
    r = requests.get(url, timeout=1)
    try:
        r.raise_for_status()
    except:
        pass
    if r.status_code != 200:
        print ("Website Error: ", url, r)
    else:
        print ("Website Good: ", url, r)
I expected the first three URLs in the list to be classed as 'Website Error:', as they are URLs that I have just made up.
The final URL in the list is obviously real, so it should be the only one listed as 'Website Good:'.
What actually happens is that the first URL is handled correctly, since it returns a 503. According to https://httpstatus.io/, however, the next two URLs do not produce a status code at all; the site only displays ERROR with the message "Cannot find URI. another-fake-website.com another-fake-website.com:80".
So I expected all but the last URL in the list to be shown as 'Website Error:'.
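A minimal sketch of the exception-based check described at the top, assuming the goal is one result per URL. The helper name `check_url` and its injectable `get` parameter are hypothetical, added only so the logic can be exercised without network access. Unlike the loop above, it puts the `requests.get` call inside the `try` and catches the specific `requests.exceptions` classes instead of using a bare `except`. It cannot flag a made-up hostname that still resolves and answers with 200; it only reports what the request itself returns or raises.

```python
import requests


def check_url(url, timeout=1, get=requests.get):
    """Return (ok, detail) for one URL.

    `get` defaults to requests.get; it is a parameter only so the
    function can be tested offline (a hypothetical convenience).
    """
    try:
        response = get(url, timeout=timeout)
        response.raise_for_status()  # raises HTTPError on 4xx/5xx
    except requests.exceptions.HTTPError as exc:
        return False, "HTTP error: %s" % exc
    except requests.exceptions.Timeout:
        # Listed before ConnectionError because ConnectTimeout
        # inherits from both classes.
        return False, "timed out"
    except requests.exceptions.ConnectionError:
        # DNS failure, refused connection, and similar.
        return False, "connection error"
    except requests.exceptions.RequestException as exc:
        # Any other requests failure (invalid URL, too many redirects, ...).
        return False, "request failed: %s" % exc
    return True, "HTTP %d" % response.status_code
```

Looping over `urls` and printing 'Website Error:' when the first element of the returned tuple is false would reproduce the reporting in the original script.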
OUTPUT
when running the script on a Raspberry Pi:
Python 2.7.9 (default, Sep 26 2018, 05:58:52)
[GCC 4.9.2] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
('Website Error: ', 'http://fake-website.com', <Response [503]>)
('Website Good: ', 'http://another-fake-website.com', <Response [200]>)
('Website Good: ', 'http://yet-another-fake-website.com', <Response [200]>)
('Website Good: ', 'http://google.com', <Response [200]>)
>>>
If I enter all four URLs into https://httpstatus.io/, I get this result:
It shows a 503, a 200 and two URLs that do not have a status code but just display Error.
UPDATE
I thought that I would check this in Windows using PowerShell and followed this example: https://stackoverflow.com/a/52762602/5251044
This is the output:
c:\Testing>powershell -executionpolicy bypass -File .\AnyName.ps1
0 - http://fake-website.com
200 - http://another-fake-website.com
200 - http://yet-another-fake-website.com
200 - http://google.com
As you can see, I am no further forward.
UPDATE 2
Having had further discussions with Fozoro HERE and tried various options with no fix in sight, I thought that I would try this code using urllib2 instead of requests.
Here is the changed code:
from urllib2 import urlopen
import socket

urls = ['http://another-fake-website.com',
        'http://fake-website.com',
        'http://yet-another-fake-website.com',
        'http://google.com',
        'dskjhkjdhskjh.com',
        'doioieowwros.com']

for url in urls:
    try:
        r = urlopen(url, timeout=5)
        r.getcode()
    except:
        pass
    if r.getcode() != 200:
        print ("Website Error: ", url, r.getcode())
    else:
        print ("Website Good: ", url, r.getcode())
Unfortunately the resulting output is still not correct but does differ slightly from the output of the previous code, see below:
Python 2.7.9 (default, Sep 26 2018, 05:58:52)
[GCC 4.9.2] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
('Website Good: ', 'http://another-fake-website.com', 200)
('Website Good: ', 'http://fake-website.com', 200)
('Website Good: ', 'http://yet-another-fake-website.com', 200)
('Website Good: ', 'http://google.com', 200)
('Website Good: ', 'dskjhkjdhskjh.com', 200)
('Website Good: ', 'doioieowwros.com', 200)
>>>
This time it shows a 200 response for every URL, which is very peculiar.
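One thing worth noting about the urllib2 loop above: when `urlopen` raises, the bare `except` swallows the error and `r` still refers to the response from the previous iteration, so the code printed is the previous URL's. A URL with no scheme, such as 'dskjhkjdhskjh.com', makes `urlopen` raise `ValueError` before any request is sent. The sketch below reports the failure from inside the handler instead. It is an illustration under assumptions, not a definitive fix: it uses Python 3's `urllib.request` and `urllib.error` in place of `urllib2`, and the names `check_url`, `report` and the `opener` parameter are hypothetical.

```python
import socket
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def check_url(url, timeout=5, opener=urlopen):
    """Return (ok, detail) for one URL.

    `opener` defaults to urlopen; it is a parameter only so the
    function can be tested offline (a hypothetical convenience).
    """
    try:
        response = opener(url, timeout=timeout)
    except HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        # HTTPError is a subclass of URLError, so it is caught first.
        return False, "HTTP %d" % exc.code
    except URLError as exc:
        # DNS failure, refused connection, and similar.
        return False, "URL error: %s" % exc.reason
    except socket.timeout:
        # A timeout raised directly rather than wrapped in URLError.
        return False, "timed out"
    except ValueError as exc:
        # For example a URL with no scheme, such as 'dskjhkjdhskjh.com'.
        return False, "bad URL: %s" % exc
    code = response.getcode()
    return code == 200, "HTTP %d" % code


def report(urls):
    """Print one line per URL in the same style as the script above."""
    for url in urls:
        ok, detail = check_url(url)
        label = "Website Good: " if ok else "Website Error: "
        print(label, url, detail)
```

Calling `report(urls)` with the list from the script prints one line per URL, and each line reflects only that URL's own result.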