I'm using the Requests library in Python. In the browser, my URL loads okay. In Python, it throws a 403.

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /admin/license.php on this server.</p>
<p>Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.</p>
</body></html>

This is my own site, and I don't have any robot protection on it that I know of. I made the PHP file that I'm loading and it's just a simple database query. In the root of the site, I have a WordPress site with default settings. However, I'm not sure if that's relevant.

My code:

import requests
url = "myprivateurl.com"
r = requests.get(url)
print(r.text)

Does anyone have any guesses why the request returns a 403 from Python but not from the browser?

Thanks so much.

User
  • Can you post the corresponding python code? – Rod Xavier Apr 21 '14 at 03:27
  • Did you access the url by a browser? Did it give the same `403`? – emesday Apr 21 '14 at 03:27
  • By Firefox, it does not give a 403, no errors. Added the code, even though I don't think it's much use. – User Apr 21 '14 at 03:30
  • Did you perform any authentication when you viewed it in the browser? – Rod Xavier Apr 21 '14 at 03:34
  • No authentication. Also, I tried more URLs on my web host, and they all give 403s with the same request code, even though almost all of them are indexed by Google's robots. I'm going to contact my web host, even though I don't think it has anything to do with them. – User Apr 21 '14 at 03:35

3 Answers

After contacting my web host, and having the ticket upgraded to level 2 support, they disabled mod_security and it works fine now. Not sure if this is a bad thing, but that fixed it.
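
The answer does not say which mod_security rule was firing. One common pattern, and it is only an assumption here, is a rule keyed on the User-Agent header, which would explain why a browser gets through while a script does not. The sketch below imitates that situation with a local stand-in server (`PickyHandler` and `status_for` are hypothetical names, and the 403-on-`Python-urllib` rule is invented for illustration, not taken from any real mod_security rule set):

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PickyHandler(BaseHTTPRequestHandler):
    """Stand-in server: rejects urllib's default User-Agent with a 403."""

    def do_GET(self):
        user_agent = self.headers.get("User-Agent", "")
        status = 403 if user_agent.startswith("Python-urllib") else 200
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# An opener with an empty ProxyHandler ignores any proxy environment variables.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def status_for(url, user_agent=None):
    """Return the HTTP status for url, optionally overriding the User-Agent."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    req = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(req, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        err.close()
        return err.code


# Start the stand-in server on a free local port and probe it twice.
server = HTTPServer(("127.0.0.1", 0), PickyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

default_status = status_for(url)                  # sent as Python-urllib/3.x
browser_status = status_for(url, "Mozilla/5.0")   # browser-like User-Agent

server.shutdown()
server.server_close()
```

Pointing `status_for` at your own URL, once with and once without a browser-like User-Agent, is a quick way to check whether the header is what the server objects to before asking the host to disable anything.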

User

Adding headers to the request worked for me:

import urllib.request

def fetch_html(url):
    # fetch_html is an illustrative name; the original snippet was a bare function body
    req = urllib.request.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7')
    response = urllib.request.urlopen(req)
    data = response.read()      # a `bytes` object
    html = data.decode('utf-8') # a `str`; this step can't be used if data is binary
    return html

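
To see why the header can matter, it may help to look at what urllib sends when no User-Agent is set. The following is a minimal sketch that only builds and inspects request objects, without touching the network; `build_request` is a hypothetical helper and the example URL is a placeholder:

```python
import urllib.request


def build_request(url, user_agent):
    """Build a Request carrying an explicit User-Agent (nothing is sent yet)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# urllib's default opener announces itself as "Python-urllib/<version>",
# which is easy for a server-side filter to tell apart from a browser.
default_opener = urllib.request.build_opener()
default_ua = dict(default_opener.addheaders)["User-agent"]

# urllib stores header names capitalized, hence "User-agent" in the lookup.
req = build_request("http://example.com/", "Mozilla/5.0")
sent_ua = req.get_header("User-agent")
```

With the Requests library from the question, the equivalent is to pass a `headers={"User-Agent": ...}` dictionary to `requests.get`.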
miguelmorin

myprivateurl.com is not a valid URL. Firefox goes through a number of user-friendly behaviors to guess at what you actually mean, and (depending somewhat on resolver results etc) eventually ends up at something like http://myprivateurl.com/. Requests does not do this; you have to pass in a real, valid URL.
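
As a rough illustration of the point about schemes, a small helper can add one when it is missing, using only the standard library. `ensure_scheme` is a hypothetical name, and defaulting to `http` is an assumption; your site may need `https`:

```python
from urllib.parse import urlsplit


def ensure_scheme(url, default_scheme="http"):
    """Return url unchanged if it already starts with http(s), else prefix a scheme."""
    if urlsplit(url).scheme in ("http", "https"):
        return url
    return f"{default_scheme}://{url}"


bare = ensure_scheme("myprivateurl.com")
full = ensure_scheme("https://myprivateurl.com/admin/license.php")
```

Requests itself raises `requests.exceptions.MissingSchema` when given a URL without a scheme, so a 403 response suggests the real URL in the question already had one.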

tripleee
  • Since he is getting a 403 error, it is safe to say the problem is not with the URL, otherwise he would get a DNS error. – Burhan Khalid Apr 21 '14 at 15:54