
I am using the following code:

import requests
url = 'http://www.transfermarkt.com/'
r = requests.get(url)
r.raise_for_status()

And I get the following error:

HTTPError: 404 Client Error: Not Found for url: http://www.transfermarkt.com/

The same URL loads normally in a browser, though. Why is this happening?

Mpizos Dimitris

1 Answer


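The site is configured to pretend it doesn't exist (it returns 404 Not Found) to clients that don't send a browser-like User-Agent header. By default, requests identifies itself as python-requests/<version>, and the server evidently treats that as unacceptable: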

>>> import requests
>>> url = 'http://www.transfermarkt.com/'
>>> requests.get(url).raise_for_status()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 831, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found
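If you want to confirm what requests sends when you don't set a User-Agent yourself, you can build and prepare the request without sending it. This is a minimal sketch that needs no network access; it relies on requests.utils.default_user_agent() and Session.prepare_request(), and the exact version string it prints depends on your installed requests.

```python
import requests

# The User-Agent requests adds when you don't supply one,
# e.g. "python-requests/2.31.0" (the version part depends on your install).
default_ua = requests.utils.default_user_agent()

# Build the request through a Session and prepare it *without* sending it,
# so we can inspect the headers that would go over the wire.
session = requests.Session()
prepared = session.prepare_request(
    requests.Request("GET", "http://www.transfermarkt.com/")
)

print(default_ua)
print(prepared.headers["User-Agent"])
```

Both lines print the same python-requests/... string, which is the User-Agent the server is rejecting.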

So a plain requests.get() fails, as you found. Set a browser-like User-Agent instead:

>>> headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:39.0)'}
>>> requests.get(url, headers=headers).raise_for_status()
>>>

and the request succeeds (raise_for_status() no longer raises).
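If you're making several requests to the site, one option is to set the header once on a requests.Session so every request it sends carries it. This is a sketch rather than a verified recipe for this site: it reuses the browser string from the answer above, and it is an assumption that any realistic browser User-Agent would be accepted equally. The actual fetch is left commented out so the snippet runs without network access.

```python
import requests

# Browser-like User-Agent from the answer above (assumed to be accepted;
# the site may change what it allows at any time).
BROWSER_UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:39.0)"

session = requests.Session()
# Headers set on the session are merged into every request it sends,
# replacing the default python-requests/<version> User-Agent.
session.headers.update({"User-Agent": BROWSER_UA})

# Prepare without sending, to check the header that would be used.
prepared = session.prepare_request(
    requests.Request("GET", "http://www.transfermarkt.com/")
)
print(prepared.headers["User-Agent"])

# To actually fetch the page (requires network access):
# response = session.get("http://www.transfermarkt.com/")
# response.raise_for_status()
```

A session also reuses the underlying connection between requests, so it's generally preferable to calling requests.get() repeatedly with the same headers dict.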

The site admin apparently doesn't want automated clients fetching the site, so you may want to ask for permission, or ask whether there is a preferred way to get the content. The technical cause of the 404, though, was simply that the request did not carry a browser-like User-Agent.