I'm trying to programmatically determine the final landing pages of some URLs, and I ran into http://event.four33.co.kr/20131030/redirect.html, which basically redirects back to itself:

<script type="text/javascript">
    var agent = navigator.userAgent;
    var redirectUrl = "";

    if (agent.indexOf("Windows NT") != -1)
    {
        redirectUrl = "https://play.google.com/store/apps/details?id=com.ftt.suhoji_gl_4kakao";
    }
    else if (agent.indexOf("iPhone") != -1)
    {
        redirectUrl = "https://itunes.apple.com/kr/app/id705181473?mt=8";
    }
    else if (agent.indexOf("iPad") != -1)
    {
        redirectUrl = "https://itunes.apple.com/kr/app//id705181473?mt=8";
    }
    else if (agent.indexOf("Android") != -1)
    {
        redirectUrl = "market://details?id=com.ftt.suhoji_gl_4kakao";
    }
    location.href = redirectUrl;
</script>

When my script (see the snippet below) hits it, driver.current_url never returns.

from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=0, size=(1024, 768))
display.start()
driver = webdriver.Firefox()
driver.get('http://event.four33.co.kr/20131030/redirect.html')
driver.current_url

I tried urllib2 and requests, but have not found a way to detect this loop or to prevent it. Any tips?

(Note that this page inspects the user agent of the client before redirecting. Neither Firefox nor Chrome is "captured" by any of its checks, and thus the page loops back to itself.)
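
To make the loop easier to see, here is a rough Python mirror of the page's branching. The function name `resolve_redirect` and the two User-Agent strings are made up for illustration; only the substring checks and the target URLs come from the script above. When no branch matches, `redirectUrl` stays empty, and as far as I can tell assigning an empty string to `location.href` just navigates to the current page again.

```python
def resolve_redirect(agent):
    """Mirror the branching of the page's inline script for a given User-Agent."""
    if "Windows NT" in agent:
        return "https://play.google.com/store/apps/details?id=com.ftt.suhoji_gl_4kakao"
    if "iPhone" in agent:
        return "https://itunes.apple.com/kr/app/id705181473?mt=8"
    if "iPad" in agent:
        return "https://itunes.apple.com/kr/app//id705181473?mt=8"
    if "Android" in agent:
        return "market://details?id=com.ftt.suhoji_gl_4kakao"
    return ""  # no branch matched: redirectUrl stays empty


# Hypothetical User-Agent strings, for illustration only.
linux_firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0"
windows_firefox = "Mozilla/5.0 (Windows NT 6.1; rv:35.0) Gecko/20100101 Firefox/35.0"

print(repr(resolve_redirect(linux_firefox)))
print(resolve_redirect(windows_firefox))
```

A desktop Linux browser falls through every check, which would explain why the page keeps reloading itself.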

georg
user918081

1 Answer

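requests can handle HTTP-level redirect loops:

import requests

looper = 'http://event.four33.co.kr/20131030/redirect.html'
try:
    requests.get(looper)
except requests.exceptions.TooManyRedirects:
    # Raised once requests gives up following redirects; handle the loop here.
    pass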

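If you want to detect loops rather than just break out of them, you can use code similar to this:

import requests

url = 'http://event.four33.co.kr/20131030/redirect.html'
history = []
# Follow redirects by hand; stop on a repeated URL or after 42 hops.
while url not in history and len(history) < 42:
    history.append(url)
    r = requests.get(url, allow_redirects=False)
    if 'location' in r.headers:
        # The Location header may be relative, so resolve it against the current URL.
        url = requests.compat.urljoin(url, r.headers['location'])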
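
One limitation of the loop above is that it ends the same way whether it hit a cycle or simply reached a page that doesn't redirect. A minimal sketch that tells the two apart follows. The name `follow_redirects`, the `get_location` callback and the `fake_site` table are all assumptions made for illustration, so that the logic can be tried without touching the network.

```python
from urllib.parse import urljoin


def follow_redirects(url, get_location, max_hops=42):
    """Follow redirects one hop at a time.

    get_location(url) must return the redirect target of url, or None if
    url does not redirect. Returns (last_url, status), where status is
    'final', 'loop' or 'too_many'.
    """
    seen = []
    while len(seen) < max_hops:
        if url in seen:
            return url, "loop"
        seen.append(url)
        target = get_location(url)
        if target is None:
            return url, "final"
        url = urljoin(url, target)  # the target may be relative
    return url, "too_many"


# Hypothetical redirect table standing in for real HTTP responses.
fake_site = {
    "http://example.com/a": "/b",
    "http://example.com/b": "http://example.com/a",
    "http://example.com/c": "/d",
}

print(follow_redirects("http://example.com/a", fake_site.get))
print(follow_redirects("http://example.com/c", fake_site.get))
```

With requests, the callback could be something like `lambda u: requests.get(u, allow_redirects=False).headers.get('location')`. This still only sees HTTP redirects, not ones performed by JavaScript.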
Community
georg
  • Thanks georg. I just tried that and it looks like requests.get doesn't realise that there are multiple redirects. I think this is because the redirects are javascript driven? As a result, there are no exceptions thrown. – user918081 Feb 05 '15 at 14:12
  • @user918081: yes, these are js redirects, there's nothing you can do on the server side. – georg Feb 05 '15 at 14:40
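
Since the redirects live in JavaScript, one rough workaround is to scan the downloaded HTML for URL-like string literals in scripts that touch `location`. This is only a heuristic sketch: it does not execute any JavaScript, the regexes and the name `js_redirect_candidates` are assumptions for illustration, and `sample` is a shortened copy of the script from the question.

```python
import re

# Heuristic patterns: inline <script> bodies and quoted scheme://... literals.
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script>", re.IGNORECASE | re.DOTALL)
URL_LITERAL_RE = re.compile(r"""["']([a-zA-Z][a-zA-Z0-9+.-]*://[^"']+)["']""")


def js_redirect_candidates(html):
    """Return URL-like string literals from inline scripts that mention location."""
    candidates = []
    for script in SCRIPT_RE.findall(html):
        if "location" not in script:
            continue  # this script never touches location, skip it
        candidates.extend(URL_LITERAL_RE.findall(script))
    return candidates


sample = '''
<script type="text/javascript">
    var redirectUrl = "";
    if (navigator.userAgent.indexOf("Android") != -1)
    {
        redirectUrl = "market://details?id=com.ftt.suhoji_gl_4kakao";
    }
    location.href = redirectUrl;
</script>
'''

print(js_redirect_candidates(sample))
```

This lists every possible target rather than the one a particular browser would end up on, so it is better suited to flagging pages for manual review than to resolving the final landing page.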