
I'm using urllib2.urlopen to open a URL and fetch the markup of a webpage. Some of these sites redirect me with 301/302 redirects. I would like to know the final URL I've been redirected to. How can I get it?

Mark Amery
Mridang Agarwalla

4 Answers

Call the .geturl() method of the file-like object that urlopen returns. Per the urllib2 docs:

geturl() — return the URL of the resource retrieved, commonly used to determine if a redirect was followed

Example:

```python
import urllib2  # Python 2 only

response = urllib2.urlopen('http://tinyurl.com/5b2su2')
response.geturl()  # 'http://stackoverflow.com/'
```
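urllib2 exists only in Python 2; in Python 3 its urlopen lives in urllib.request, and the response still offers geturl() plus an equivalent url attribute. The sketch below checks this offline, assuming Python 3: a throwaway local server (the RedirectingHandler class and the /old and /new paths are made up for illustration) answers /old with a 302 pointing at /new, and the response URL reports where the redirect landed.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical toy server: /old answers 302 -> /new, any other path answers 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def final_url(url):
    # ProxyHandler({}) ignores proxy environment variables; apart from that,
    # this opener follows redirects the same way urllib.request.urlopen does.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response:
        return response.url  # same value response.geturl() returns


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

result = final_url(base + "/old")
print(result)  # ends with /new, not /old

server.shutdown()
server.server_close()
```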
Mark Amery
mmmmmm

The object returned by urllib2.urlopen has a geturl() method, which returns the actual URL retrieved, i.e. the URL after the last redirect.

Michael

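For example, either of these returns the final URL:

```python
import urllib2  # Python 2 only

# Passing the URL string directly
urllib2.urlopen('ORIGINAL LINK').geturl()

# Passing a Request object instead
urllib2.urlopen(urllib2.Request('ORIGINAL LINK')).geturl()
```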
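To illustrate that the URL reported is the one after the last redirect, and that the Request form behaves the same way, here is a Python 3 sketch (urllib.request standing in for urllib2) against a hypothetical local chain where /a answers 301 to /b and /b answers 302 to /c. Comparing the final URL with the original Request's full_url is one simple way to tell whether any redirect happened at all.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical two-hop chain: /a -301-> /b -302-> /c (200 OK).
HOPS = {"/a": (301, "/b"), "/b": (302, "/c")}


class ChainHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in HOPS:
            code, target = HOPS[self.path]
            self.send_response(code)
            self.send_header("Location", target)
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"landed"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), ChainHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

# ProxyHandler({}) keeps environment proxies out of this local demo.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
req = urllib.request.Request(base + "/a")
with opener.open(req) as response:
    final = response.url  # URL after the *last* redirect in the chain
    redirected = final != req.full_url  # req itself still points at /a
    body = response.read()

print(final, redirected, body)

server.shutdown()
server.server_close()
```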

kevin

You can use httplib2 with follow_all_redirects = True and read the content-location header from the response. See my answer to 'httplib is not getting all the redirect codes' for an example.
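If you would rather stay within the standard library, a different technique from the httplib2 one mentioned above is to subclass urllib.request.HTTPRedirectHandler and override its redirect_request(req, fp, code, msg, headers, newurl) hook, which urllib calls for each redirect it follows. The sketch below assumes Python 3; RecordingRedirectHandler and the local /a, /b, /c chain are hypothetical names. It records each hop's status code and target URL while still following the redirects normally.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical chain served locally: /a -301-> /b -302-> /c (200 OK).
HOPS = {"/a": (301, "/b"), "/b": (302, "/c")}


class ChainHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        code, target = HOPS.get(self.path, (200, None))
        self.send_response(code)
        if target is not None:
            self.send_header("Location", target)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follows redirects as usual but remembers (status code, target URL) per hop."""

    def __init__(self):
        super().__init__()
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # newurl is already absolute here (urllib joins it with the request URL).
        self.hops.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


server = HTTPServer(("127.0.0.1", 0), ChainHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

recorder = RecordingRedirectHandler()
# Passing an instance replaces the default HTTPRedirectHandler;
# ProxyHandler({}) ignores any proxy environment variables.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}), recorder)
with opener.open(base + "/a") as response:
    final = response.url

for code, url in recorder.hops:
    print(code, url)
print("final:", final)

server.shutdown()
server.server_close()
```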

Community
Bengt