
I am trying to connect to websites with Python and get their HTTP status codes. As the answers to another question of mine suggest, sites such as google.com return 301 (Moved Permanently) or 302 (Found) because the server is redirecting the request. However, I would like to connect in such a way that I end up with the 200 (OK) from the page the redirect leads to. Here's my current code:

import httplib

conn = httplib.HTTPConnection("google.com", 80)
conn.request("GET", "/")
r = conn.getresponse()
print r.status, r.reason
conn.close()

What do I need to change or add to achieve this? I have heard that the pycurl library might help, but searching hasn't turned up anything useful so far. I am a novice in this field, so please excuse me if the question is trivial.

Karen Tsirunyan

1 Answer


I assume you want your code to follow the 301/302 redirects through to the final URL, which returns a 200?

If so, you could try urllib or, better still, requests, which you can install with pip.

Both urllib and, more reliably, requests follow 301 and 302 redirects on a GET by default, so you get the final page and its 200 rather than the intermediate redirect response.
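As a rough illustration, here is a minimal sketch of the urllib route using only the standard library. It assumes Python 3, where the opener lives in urllib.request (the question's code is Python 2), and the helper name final_status is my own, not part of any library.

```python
import urllib.error
import urllib.request


def final_status(url, timeout=10):
    """Return (status_code, final_url) after following any redirects."""
    try:
        # urlopen follows 301/302 redirects automatically, so the response
        # we get back is the one at the end of the redirect chain.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.geturl()
    except urllib.error.HTTPError as err:
        # A final response such as 404 or 500 is raised as HTTPError;
        # report its code instead of letting the exception propagate.
        return err.code, err.geturl()
```

Calling final_status("http://google.com") should then report the status of the page the redirect ends on rather than the 301/302 itself. With requests the equivalent is roughly requests.get(url).status_code, and the intermediate redirect responses are available in the response's history attribute.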

Info on the requests module can be found here: http://pypi.python.org/pypi/requests/

Hope this helps.

dan360