
I am trying to nail down error handling for the requests module in Python so that I am notified whenever a URL is unavailable, i.e. HTTPError, ConnectionError, Timeout etc.

The issue I am having is that I seem to be getting status responses of 200 even for FAKE URLs.

I have trawled through S.O. and various other web sources and tried many different ways of achieving the same goal, but have so far come up empty.

I have boiled the code down to as basic as it gets to simplify things.

import requests

urls = ['http://fake-website.com', 
        'http://another-fake-website.com',
        'http://yet-another-fake-website.com',
        'http://google.com']

for url in urls:
    r = requests.get(url,timeout=1)
    try:
        r.raise_for_status()
    except:
        pass
    if r.status_code != 200:
        print ("Website Error: ", url, r)
    else:
        print ("Website Good: ", url, r)

I expected the first 3 URLs in the list to be classed as 'Website Error:' as they are URLs that I have just made up. The final URL in the list is quite obviously real, so it should be the only one listed as 'Website Good:'.

What is happening is that the first URL is handled correctly, as it gives a response code of 503. According to https://httpstatus.io/, however, the next two URLs do not produce a status_code at all; they only display ERROR with "Cannot find URI. another-fake-website.com another-fake-website.com:80".

So I expected all but the last URL in the list to be shown as 'Website Error:'

OUTPUT

When running the script on a Raspberry Pi:

Python 2.7.9 (default, Sep 26 2018, 05:58:52) 
[GCC 4.9.2] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
('Website Error: ', 'http://fake-website.com', <Response [503]>)
('Website Good: ', 'http://another-fake-website.com', <Response [200]>)
('Website Good: ', 'http://yet-another-fake-website.com', <Response [200]>)
('Website Good: ', 'http://google.com', <Response [200]>)
>>>

If I enter all 4 URLs into https://httpstatus.io/ I get this result (screenshot: HTTPSTATUS Screen Grab).

It shows a 503, a 200 and two URLs that do not have a status code but just display Error.

UPDATE

So I thought that I would check this in Windows using PowerShell and followed this example: https://stackoverflow.com/a/52762602/5251044

This is the output:

c:\Testing>powershell -executionpolicy bypass -File .\AnyName.ps1
0 - http://fake-website.com
200 - http://another-fake-website.com
200 - http://yet-another-fake-website.com
200 - http://google.com

As you can see, I am no further forward.

UPDATE 2

Having had further discussions with Fozoro in chat and tried various options with no fix in sight, I thought that I would try this code using urllib2 instead of requests.

Here is the changed code:

from urllib2 import urlopen
import socket

urls = ['http://another-fake-website.com',
        'http://fake-website.com',
        'http://yet-another-fake-website.com',
        'http://google.com',
        'dskjhkjdhskjh.com',
        'doioieowwros.com']

for url in urls:

    try:
        r  = urlopen(url, timeout = 5)
        r.getcode()
    except:
        pass
    if r.getcode() != 200:
        print ("Website Error: ", url, r.getcode())
    else:
        print ("Website Good: ", url, r.getcode())

Unfortunately the resulting output is still not correct, but it does differ slightly from the output of the previous code:

Python 2.7.9 (default, Sep 26 2018, 05:58:52) 
[GCC 4.9.2] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
('Website Good: ', 'http://another-fake-website.com', 200)
('Website Good: ', 'http://fake-website.com', 200)
('Website Good: ', 'http://yet-another-fake-website.com', 200)
('Website Good: ', 'http://google.com', 200)
('Website Good: ', 'dskjhkjdhskjh.com', 200)
('Website Good: ', 'doioieowwros.com', 200)
>>> 

This time it is showing all 200 responses, very peculiar.

1cm69

2 Answers


You should put r = requests.get(url,timeout=1) inside of the try: block. So your code needs to look like this:

import requests

urls = ['http://fake-website.com', 
        'http://another-fake-website.com',
        'http://yet-another-fake-website.com',
        'http://google.com']

for url in urls:
    try:
        r = requests.get(url,timeout=1)
        r.raise_for_status()
    except:
        pass
    if r.status_code != 200:
        print ("Website Error: ", url, r)
    else:
        print ("Website Good: ", url, r)

Output:

Website Error:  http://fake-website.com <Response [503]>
Website Error:  http://another-fake-website.com <Response [503]>
Website Error:  http://yet-another-fake-website.com <Response [503]>
Website Good:  http://google.com <Response [200]>
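One caveat with the bare except above: if requests.get itself raises (for example a ConnectionError or Timeout), r is never reassigned, so the check that follows sees the response from the previous iteration, or fails with a NameError on the first URL. A minimal sketch that avoids this by handling each URL in its own function is shown below. The name check_url and its get parameter are hypothetical, introduced only for illustration; get exists so the function can be exercised without a network.

```python
import requests


def check_url(url, timeout=1, get=requests.get):
    """Return (ok, detail) for one URL without reusing any earlier response.

    `check_url` and `get` are illustrative names, not part of requests.
    """
    try:
        response = get(url, timeout=timeout)
        response.raise_for_status()  # raises HTTPError for 4xx/5xx statuses
    except requests.exceptions.HTTPError as exc:
        # The server answered, but with an error status such as 503.
        return False, "HTTP {}".format(exc.response.status_code)
    except requests.exceptions.RequestException as exc:
        # No usable response: DNS failure, refused connection, timeout, ...
        return False, type(exc).__name__
    return True, "HTTP {}".format(response.status_code)
```

Calling check_url(url) for each entry in urls and printing 'Website Good:' or 'Website Error:' depending on the first element of the result gives the same kind of report as the loop above, with the exception name shown when no response was received.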

I hope this helps!

Nazim Kerimbekov
  • Thanks for replying but I have tried that & only get the same output as before. – 1cm69 Apr 07 '19 at 16:53
  • I've just added the output that I'm getting with this code, aren't you getting the same one? if so isn't it what you want? @1cm69 – Nazim Kerimbekov Apr 07 '19 at 17:47
  • Oddly, I am not getting that output. This is what has had me stumped all day. The output you have posted is exactly what I expect but I get... (see added OUTPUT section in my original post) – 1cm69 Apr 07 '19 at 18:41
  • the output that you've posted in your question looks the same to mine – Nazim Kerimbekov Apr 07 '19 at 18:43
  • No, yours show the first 3 URLs as 503 & last as 200. Mine shows first URL as 503 but all the rest as 200 – 1cm69 Apr 07 '19 at 18:47
  • Oh yeah, my bad. Well, this is really strange. What happens if you move the second item in your list to the front, like this: `[ 'http://another-fake-website.com','http://fake-website.com', 'http://yet-another-fake-website.com', 'http://google.com']`? – Nazim Kerimbekov Apr 07 '19 at 18:52
  • does it still give you the same output (503,200,200,200)? or is it giving you (200,503,200,200)? – Nazim Kerimbekov Apr 07 '19 at 18:53
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/191407/discussion-between-1cm69-and-fozoro). – 1cm69 Apr 07 '19 at 18:56

For me, the cause turned out to be a page served by my ISP saying that the URL is invalid: it is that page that returns the 200, not the fake site.

This can be verified by printing the content of the returned site with requests.get('http://fakesite').text
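A complementary check can be sketched under the assumption that the ISP does this by answering DNS lookups for names that do not exist, which may not be how every provider implements it. The idea is to look up a hostname that should not be registered and see whether it resolves anyway. The function name probe_missing_domain and its resolve parameter are hypothetical, and the random .com label is only assumed, not guaranteed, to be unregistered.

```python
import socket
import uuid


def probe_missing_domain(resolve=socket.gethostbyname):
    """Return the IP a made-up hostname resolves to, or None if lookup fails.

    `resolve` is injectable only so the function can be tried without a network.
    """
    # A random 32-character label is assumed (not guaranteed) to be unregistered.
    probe = uuid.uuid4().hex + ".com"
    try:
        return resolve(probe)
    except socket.gaierror:
        # Expected behaviour: the name does not exist, so the lookup fails.
        return None
```

If probe_missing_domain() returns an address rather than None, something between the script and the real DNS is answering for non-existent names, which would be consistent with fake URLs coming back as 200.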

Błażej Czapp