import requests

is working properly for all my requests, like so:

url = 'http://www.stackoverflow.com'
response = requests.get(url)

but the following URL does not return any results:

url = 'http://www.billboard.com'
response = requests.get(url)

It stalls and fails silently, returning nothing.

How do I force requests to raise an exception, so I can tell whether I'm being blacklisted or something else is going wrong?

8-Bit Borges

2 Answers


Requests won't raise an exception for a bad HTTP response on its own, but you can call raise_for_status to raise an HTTPError for 4xx/5xx responses, for example:

response = requests.get(url)

response.raise_for_status()

Another option is status_code, which holds the HTTP status code.

response = requests.get(url)

if response.status_code != 200:
    print('HTTP', response.status_code)
else: 
    print(response.text)

If a site returns HTTP 200 even for failed requests, but puts an error message in the response body (or returns an empty body), you'll have to check the response content yourself.

error_message = 'Nothing found'
response = requests.get(url)

if error_message in response.text or not response.text:
    print('Bad response')
else: 
    print(response.text)

If a site takes too long to respond, you can set a maximum timeout for the request. If the site doesn't respond within that time, a ReadTimeout exception is raised.

try:
    response = requests.get(url, timeout=5)
except requests.exceptions.ReadTimeout:
    print('Request timed out')
else:
    print(response.text)
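
Putting the pieces together, a minimal sketch (the URL is the one from the question; RequestException is the common base class, so a single handler covers HTTPError, the timeout errors and connection failures):

import requests

url = 'http://www.billboard.com'

try:
    response = requests.get(url, timeout=5)
    response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
except requests.exceptions.RequestException as e:
    # Covers HTTPError, ConnectTimeout, ReadTimeout, ConnectionError, ...
    print('Request failed:', e)
else:
    print(response.text)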
t.m.adam
  • Thank you, but none of the options above return anything; it waits and waits, forever. I'm sure I've been blocked or blacklisted, because when I changed internet connection (IP) it ran smoothly. So if you edit your answer with a workaround such as using a rotating agent, I'll accept that as the answer. – 8-Bit Borges Dec 12 '17 at 21:10
  • If the site is blocking your IP, changing the User-Agent won't help. Also, most Tor exits are blacklisted, so I wouldn't trust Tor. If you don't have a private proxy you can find some free proxies here: [free-proxy-list.net](https://free-proxy-list.net/) and on similar websites, but keep in mind that they are also very unreliable (see the proxy sketch after these comments). – t.m.adam Dec 12 '17 at 22:01
  • Curious product: it is named 'free', but costs US$ 8 a month. Thanks for the link anyway; I'll look into this solution. – 8-Bit Borges Dec 12 '17 at 23:14
  • Wow, it actually returned response content (also with `response.content`) after the elapsed timeout. What kind of magic was that? – 8-Bit Borges Dec 12 '17 at 23:41
  • Yes, it returned the page, both with `response.text` and `response.content`, after a while. Maybe the restriction was due to the speed the page was being hit, I don't know, but it is working now. – 8-Bit Borges Dec 12 '17 at 23:50
  • `if str(response.status_code)[0] != '2': response.raise_for_status()` – alainsanguinetti May 12 '20 at 11:20
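
As suggested in the comments above, routing the request through a proxy is one way to work around an IP block. A minimal sketch (the proxy address is a placeholder from the documentation range, not a working endpoint):

import requests

url = 'http://www.billboard.com'

# Placeholder proxy address; substitute a proxy you control or one from a proxy list.
proxies = {'http': 'http://203.0.113.10:8080',
           'https': 'http://203.0.113.10:8080'}

try:
    response = requests.get(url, proxies=proxies, timeout=5)
    response.raise_for_status()
except requests.exceptions.RequestException as e:
    print('Request through proxy failed:', e)
else:
    print(response.text)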

With requesocks routed through a local Tor SOCKS proxy:

import requesocks

# Initialize a new wrapped requests session
session = requesocks.session()

# Route both HTTP and HTTPS through Tor's local SOCKS proxy
session.proxies = {'http': 'socks5://localhost:9050',
                   'https': 'socks5://localhost:9050'}

# Fetch the page that was failing before
response = session.get('https://www.billboard.com')

print(response.text)

I was able to get:

    raise ConnectionError(e)
requesocks.exceptions.ConnectionError: HTTPSConnectionPool(host='www.billboard.com', port=None): Max retries exceeded with url: https://www.billboard.com/
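
For what it's worth, recent versions of requests (2.10+) can talk to a SOCKS proxy directly once the socks extra is installed, so the same idea can be tried without requesocks. A sketch, assuming a Tor client is listening on localhost:9050:

import requests  # pip install requests[socks]

# socks5h:// also resolves DNS through the proxy
proxies = {'http': 'socks5h://localhost:9050',
           'https': 'socks5h://localhost:9050'}

try:
    response = requests.get('https://www.billboard.com', proxies=proxies, timeout=10)
    response.raise_for_status()
except requests.exceptions.RequestException as e:
    print('Request over Tor failed:', e)
else:
    print(response.text)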
8-Bit Borges