Python requests Response 504

Question

I'm learning Python, and I'm trying to fetch a website using the requests library. I'm doing the following:

import requests
requests.get("http://www.charitystars.com")

However, I get <Response [504]>, which I assume is an error, because soup = BeautifulSoup(r.content) then comes back empty. I tried other websites and I get <Response [200]>, and the soup works. So I wonder why the request doesn't work on the first website, and what Response 504 actually means.

@jwodder Thank you. Still, I don't get it. What does it mean? Is it just temporarily down, or is there a way to work around it? – tony, Feb 02 '17 at 23:46
`5xx` mostly means that the server has some internal problem and you have to wait until the admins fix it. – furas, Feb 02 '17 at 23:53
@furas OK, so it is a problem on their end, not on mine. For example, I read somewhere that certain websites require authorization before you can scrape their data. (I'm a beginner, sorry) – tony, Feb 03 '17 at 00:03
Every page is different and may need a different solution. Some check the `User-Agent` header to display data correctly. You may need authorization if you use an API, i.e. special URLs that return pure data as JSON without all the HTML. – furas, Feb 03 '17 at 00:10
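To make the status classes from the comments concrete, here is a minimal sketch that looks up a code's standard name with Python's built-in `http.HTTPStatus`. The helper name `describe_status` is made up for illustration. Note that the standard name only says what the server reported, not why it reported it.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short, human-readable description of an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    if 500 <= code < 600:
        kind = "server-side error"
    elif 400 <= code < 500:
        kind = "client-side error"
    else:
        kind = "not an error"
    return f"{code} {status.phrase} ({kind})"


print(describe_status(504))
print(describe_status(200))
```

With requests, the same number is available as `r.status_code`, and `r.raise_for_status()` raises an exception for any 4xx or 5xx response.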

furas · Answer 1 · 2017-02-03T00:14:25.153

9

This page doesn't like scripts/bots, and it checks the User-Agent header.

A server may also need this information to serve the correct page: a different one for desktop, tablet, or smartphone.

import requests

# Send a browser-like User-Agent instead of the default "python-requests/..." one.
headers = {'User-Agent': 'Mozilla/5.0'}

r = requests.get("http://www.charitystars.com/", headers=headers)

print(r.status_code)

BTW: by default requests sends "User-Agent": "python-requests/2.12.1" (the number is the installed requests version).

You can use the http://httpbin.org service to see what your requests look like.

import requests

r = requests.get("http://httpbin.org/get")

print(r.text)
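The same comparison can be made without sending anything over the network. This is a minimal sketch, assuming a reasonably recent requests version. The helper name `user_agent_sent` is made up for illustration. It only prepares the request and reads back the header that would be sent.

```python
import requests


def user_agent_sent(session, url):
    """Return the User-Agent header that `session` would send for a GET to `url`."""
    # prepare_request() merges the session's default headers into the request
    # without actually contacting the server.
    prepared = session.prepare_request(requests.Request("GET", url))
    return prepared.headers["User-Agent"]


url = "http://www.charitystars.com/"

default_session = requests.Session()

browser_session = requests.Session()
browser_session.headers.update({"User-Agent": "Mozilla/5.0"})

default_ua = user_agent_sent(default_session, url)
browser_ua = user_agent_sent(browser_session, url)

print(default_ua)  # "python-requests/" followed by the installed version
print(browser_ua)
```

Setting the header on a `Session` rather than per call means every later `session.get(...)` carries it as well.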

edited Feb 03 '17 at 00:14

answered Feb 03 '17 at 00:01

furas



Could you please explain to me why it returns a 200 code if I specify the headers? Thank you! – tony Feb 03 '17 at 00:05
Some servers check this header to recognize your browser and its capabilities, and then they can use different methods to display the page. They also use it to recognize scripts/bots and refuse access. – furas Feb 03 '17 at 00:07
BTW: try `r = requests.get("http://httpbin.org/get")` and `print(r.text)` and you will see that `requests` by default uses `"User-Agent": "python-requests/2.12.1"` – furas Feb 03 '17 at 00:16

score 0 · Answer 2 · answered Jan 21 '20 at 14:35

I got error 504 because of a load balancer timeout. The solution was to run the affected function in the background. My cloud provider offers that; check whether yours does.

Also, your cloud provider may be denying access to that website. Check whether they have a whitelist in place.
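If the 504 is a transient timeout rather than a deliberate refusal, another option (not the background-task approach described above) is to retry automatically. This is a minimal sketch using the retry support that requests exposes through urllib3. The helper name `make_retrying_session` and the retry count, backoff, and timeout values are assumptions for illustration, not recommendations.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=3, backoff_factor=1.0):
    """Build a Session that retries GETs answered with 502, 503, or 504."""
    retry = Retry(
        total=total,                       # give up after this many retries
        backoff_factor=backoff_factor,     # wait longer between successive attempts
        status_forcelist=(502, 503, 504),  # retry on these gateway/server errors
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


session = make_retrying_session()

# Example call (needs network access; the timeout value is an assumption):
# r = session.get("http://www.charitystars.com/", timeout=10)
# print(r.status_code)
```

Retrying does not help when the server rejects the request for a reason that does not change between attempts, such as the User-Agent check described in the other answer.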

Hope it helps.
