
I wrote this simple download manager, but it won't work on complex URLs where the page is redirected.

import urllib2

def filename_from_url(url):
    # Everything after the last '/' in the URL.
    # (Named so that it does not shadow the built-in str().)
    return url[url.rfind('/') + 1:]

url = raw_input()
filename = filename_from_url(url)
webfile = urllib2.urlopen(url)
data = webfile.read()
webfile.close()
fout = open(filename, "wb")  # binary mode, since the download is not text
fout.write(data)
fout.close()

It doesn't work for http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=9&ved=0CG0QFjAI&url=http%3A%2F%2Fwww.iasted.org%2Fconferences%2Fformatting%2FPresentations-Tips.ppt&ei=clfWTpjZEIblrAfC8qWXDg&usg=AFQjCNEIgqx6x4ULHFXzzYDzCITuUJOczA&sig2=0VtKXPvoDnIq-lIR4S9LEQ

while it does work for http://www.iasted.org/conferences/formatting/Presentations-Tips.ppt

even though both links point to the same file.

How can I solve the problem of redirection?

juliomalegria

2 Answers


I don't think redirection is the problem here, since urllib2 already follows redirects automatically. Google redirects to a page in case of an error.

Try this script :

url1 = 'http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=9&ved=0CG0QFjAI&url=http%3A%2F%2Fwww.iasted.org%2Fconferences%2Fformatting%2FPresentations-Tips.ppt&ei=clfWTpjZEIblrAfC8qWXDg&usg=AFQjCNEIgqx6x4ULHFXzzYDzCITuUJOczA&sig2=0VtKXPvoDnIq-lIR4S9LEQ'

url2 = 'http://www.iasted.org/conferences/formatting/Presentations-Tips.ppt'

from urlparse import urlsplit
from urllib2 import urlopen

for url in [url1, url2]:
    split = urlsplit(url)
    filename =  split.path[split.path.rfind('/')+1:]
    if not filename:
        filename = split.query[split.query.rfind('/')+1:]
    f = open(filename, 'wb')  # binary mode, since the download is not text
    f.write(urlopen(url).read())
    f.close()

    # Yields two files: 'url' and 'Presentations-Tips.ppt' (both are ppt files)

The above script works every time.
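As a possible alternative to naming the first file 'url', note that the Google link in the question carries the real address, percent-encoded, in its url query parameter. The sketch below is not part of the script above. It assumes that a parameter named url, when present, holds the download target. The helper names target_url and filename_for and the 'download' fallback are made up for illustration. The import fallback lets it run on both Python 2 and Python 3.

```python
import posixpath

try:
    from urllib.parse import urlsplit, parse_qs  # Python 3
except ImportError:
    from urlparse import urlsplit, parse_qs      # Python 2


def target_url(url):
    """Return the address held in a 'url' query parameter, else the URL itself.

    Assumption: a parameter named 'url' is the real download target,
    as it appears to be in the Google link from the question.
    """
    query = parse_qs(urlsplit(url).query)  # parse_qs also decodes %3A, %2F, ...
    return query.get('url', [url])[0]


def filename_for(url):
    """Derive a local file name from the last path segment of the target URL."""
    path = urlsplit(target_url(url)).path
    # 'download' is an arbitrary fallback for paths that end in '/'.
    return posixpath.basename(path) or 'download'
```

With these helpers, both URLs from the question map to the same file name, and target_url(url) can be passed to urlopen instead of the Google link.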

Yugal Jindle

In general, you handle redirection by using urllib2.HTTPRedirectHandler, like this:

import urllib2

# build_opener lives in urllib2, and the opener object is what gets called
opener = urllib2.build_opener(urllib2.HTTPRedirectHandler)
res = opener.open('http://example.com/some/url/')

However, it doesn't look like this will work for the Google URL you've given in your example, because rather than including a Location header in the response, the Google result looks like this:

<script>window.googleJavaScriptRedirect=1</script><script>var a=parent,b=parent.google,c=location;if(a!=window&&b){if(b.r){b.r=0;a.location.href="http://www.iasted.org/conferences/formatting/Presentations-Tips.ppt";c.replace("about:blank");}}else{c.replace("http://www.iasted.org/conferences/formatting/Presentations-Tips.ppt");};</script><noscript><META http-equiv="refresh" content="0;URL='http://www.iasted.org/conferences/formatting/Presentations-Tips.ppt'"></noscript>

...which is to say, it uses a JavaScript redirect (with a meta refresh fallback inside noscript), which substantially complicates your life. You could use Python's re module to extract the correct location from this block.
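A minimal sketch of that extraction, assuming the response body looks like the block above. The function name extract_redirect and the three patterns are my own guesses at what is worth matching, and they will likely need adjusting if Google changes the markup. It tries the meta refresh first, then the two JavaScript forms, and only accepts http or https targets so that "about:blank" is skipped.

```python
import re

# Patterns are tried in order; each captures an absolute http(s) URL.
_REDIRECT_PATTERNS = [
    # <META http-equiv="refresh" content="0;URL='http://...'">
    re.compile(r"""URL=['"]?(https?://[^'"\s>]+)""", re.IGNORECASE),
    # a.location.href="http://..."
    re.compile(r"""location\.href\s*=\s*["'](https?://[^"']+)"""),
    # c.replace("http://...")  (c.replace("about:blank") does not match)
    re.compile(r"""\.replace\(\s*["'](https?://[^"']+)"""),
]


def extract_redirect(html):
    """Return the first redirect target found in html, or None if there is none."""
    for pattern in _REDIRECT_PATTERNS:
        match = pattern.search(html)
        if match:
            return match.group(1)
    return None
```

One way to use it: read the body returned by opener.open(...), pass it to extract_redirect, and if the result is not None, open that URL instead.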

larsks