I am downloading files over HTTP and displaying the progress with urllib and the following code, which works fine:

import sys
from urllib import urlretrieve

def dlProgress(count, blockSize, totalSize):
    # define the hook before urlretrieve references it
    percent = int(count * blockSize * 100 / totalSize)
    sys.stdout.write("\r" + "progress" + "...%d%%" % percent)
    sys.stdout.flush()

urlretrieve('http://example.com/file.zip', '/tmp/localfile', reporthook=dlProgress)

Now I would also like to restart the download if it is going too slow (say less than 1MB in 15 seconds). How can I achieve this?

Holy Mackerel
  • You could raise an Exception in your reporthook. – Tobias Aug 23 '12 at 13:33
  • Yeah, raising an exception seems to be the popular way to stop downloading, from a quick look at Google. It's not mentioned in the documentation though, which makes me worry that it could have unexpected behavior. For example, maybe the data is fetched by a dedicated thread, and throwing an exception will make it an orphan and not actually stop the download. – Kevin Aug 23 '12 at 14:43
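
For reference, a minimal sketch of the approach Tobias describes, using the question's 1 MB per 15 seconds threshold (the AbortDownload name is made up here, not a real API). In CPython 2's urllib, the reporthook runs synchronously in the calling thread, so the exception propagates straight out of urlretrieve:

import sys
import time
from urllib import urlretrieve

class AbortDownload(Exception):
    """Raised from the reporthook to cancel the transfer."""
    pass

startTime = time.time()

def dlProgress(count, blockSize, totalSize):
    percent = int(count * blockSize * 100 / totalSize)
    sys.stdout.write("\rprogress...%d%%" % percent)
    sys.stdout.flush()
    # assumed threshold from the question: under 1 MB after 15 seconds
    if time.time() - startTime > 15 and count * blockSize < 1048576:
        raise AbortDownload()

try:
    urlretrieve('http://example.com/file.zip', '/tmp/localfile',
                reporthook=dlProgress)
except AbortDownload:
    pass  # the partial file is left on disk; retry or clean up here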

3 Answers

This should work. It calculates the actual download rate and aborts if it is too low.

import sys
from urllib import urlretrieve
import time

url = "http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz"  # 14,135,620 bytes
startTime = time.time()

class TooSlowException(Exception):
    pass

def convertBToMb(byte_count):
    """Converts a byte count to megabytes."""
    return float(byte_count) / 1048576


def dlProgress(count, blockSize, totalSize):
    global startTime

    alreadyLoaded = count*blockSize
    timePassed = time.time() - startTime
    transferRate = convertBToMb(alreadyLoaded) / timePassed # mbytes per second
    transferRate *= 60 # mbytes per minute

    percent = int(alreadyLoaded*100/totalSize)
    sys.stdout.write("\r" + "progress" + "...%d%%" % percent)
    sys.stdout.flush()

    if transferRate < 4 and timePassed > 2: # 4 MB/min == 1 MB per 15 s; the download ramps up, hence wait 2 seconds before judging
        print "\ndownload too slow! retrying..."
        time.sleep(1) # let's not hammer the server
        raise TooSlowException

def main():
    try:
        urlretrieve(url, '/tmp/localfile', reporthook=dlProgress)

    except TooSlowException:
        global startTime
        startTime = time.time()
        main()

if __name__ == "__main__":
    main()
Codetoffel
  • Note that this will only work in the case of a slowing connection. The more usual dropped connection will not work unless you add a timeout to the socket. Otherwise -- OK! +1 – the wolf Aug 24 '12 at 00:43
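
Following up on that caveat: a minimal sketch of adding a global socket timeout to this answer, so a dead connection raises instead of blocking forever (the 30-second value is an assumption; on Python 2.6+, socket.timeout is a subclass of IOError, so one extra except clause covers it):

import socket

socket.setdefaulttimeout(30)  # assumed value: a read that stalls for 30s raises socket.timeout

def main():
    try:
        urlretrieve(url, '/tmp/localfile', reporthook=dlProgress)
    except (TooSlowException, IOError):  # IOError also catches socket.timeout on Python 2.6+
        global startTime
        startTime = time.time()
        main()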

Something like this:

import signal
import time

class Timeout(Exception):
    pass

def try_one(func, t=3):
    # SIGALRM is Unix-only; this will not work on Windows
    def timeout_handler(signum, frame):
        raise Timeout()

    old_handler = signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(t)  # trigger the alarm in t seconds

    try:
        t1 = time.time()  # wall-clock time; time.clock() would measure CPU time on Unix
        func()
        t2 = time.time()
    except Timeout:
        print('{} timed out after {} seconds'.format(func.__name__, t))
        return None
    finally:
        signal.alarm(0)  # cancel any pending alarm before restoring the handler
        signal.signal(signal.SIGALRM, old_handler)

    return t2 - t1

Then call try_one with the func you want to time out and the timeout in seconds:

try_one(downloader,15)
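
Here downloader is whatever callable does the transfer; as a hypothetical example, it could simply wrap the urlretrieve call from the question:

from urllib import urlretrieve

def downloader():
    # hypothetical wrapper: any long-running callable works here
    urlretrieve('http://example.com/file.zip', '/tmp/localfile')

elapsed = try_one(downloader, 15)  # returns None if it timed out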

OR, you can do this:

import socket
socket.setdefaulttimeout(15)
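
Note that a default socket timeout only fires when the connection goes completely silent for 15 seconds, not when it is merely slow. A small sketch of catching the resulting error around the question's call (on Python 2.6+, socket.timeout is an IOError subclass):

import socket
from urllib import urlretrieve

socket.setdefaulttimeout(15)  # any blocking socket read that stalls for 15s raises

try:
    urlretrieve('http://example.com/file.zip', '/tmp/localfile')
except IOError as e:          # covers socket.timeout on Python 2.6+
    print "download stalled:", e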
the wolf
  • This is a good solution if you're downloading small files of known size. If you don't know the size ahead of time, you won't know how many seconds to pass to `try_one`. And if you're downloading a 100MB file, `try_one(downloader, 1500)` won't give up until 1500 seconds have elapsed. Preferably, it would quit as soon as it was confident that the download won't finish in time. – Kevin Aug 23 '12 at 13:36
  • Yes, agreed. Thanks for the solution but I would like to cancel based on minimum throughput threshold not on whether the download has completed within a certain timeout. – Holy Mackerel Aug 23 '12 at 13:56
  • @HolyMackerel: Just modify your report hook to have a Timeout at say 10 second intervals and check the rate. The problem is a hung download where 0 bytes are xfered and your report hook is never called. – the wolf Aug 23 '12 at 14:45
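
A sketch of that suggestion, aimed at the hung-download case where the hook is never called again: re-arm a short alarm from the reporthook itself (Unix-only, like the answer above; the 10-second interval is an assumption):

import signal
import sys
from urllib import urlretrieve

class Stalled(Exception):
    pass

def alarm_handler(signum, frame):
    raise Stalled()

signal.signal(signal.SIGALRM, alarm_handler)

def dlProgress(count, blockSize, totalSize):
    signal.alarm(10)  # re-arm: if no further block arrives within 10s, Stalled is raised
    sys.stdout.write("\rprogress...%d%%" % int(count * blockSize * 100 / totalSize))
    sys.stdout.flush()

try:
    signal.alarm(10)  # also covers a hang before the first block arrives
    urlretrieve('http://example.com/file.zip', '/tmp/localfile',
                reporthook=dlProgress)
except Stalled:
    print "no data for 10 seconds -- restart the download here"
finally:
    signal.alarm(0)   # cancel any pending alarm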

HolyMackerel! Use the tools!

import urllib2, sys, socket, time, os

def url_tester(url = "http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz"):
    file_name = url.split('/')[-1]
    u = urllib2.urlopen(url, None, 1)   # note the 1-second timeout passed to urllib2
    file_size = int(u.info().getheaders("Content-Length")[0])
    print ("\nDownloading: {} Bytes: {:,}".format(file_name, file_size))

    with open(file_name, 'wb') as f:
        file_size_dl = 0
        block_sz = 1024*4
        time_outs = 0
        status = ''  # pre-initialize: referenced in the end-of-download branch below
        while True:
            try:
                buffer = u.read(block_sz)
            except socket.timeout:
                if time_outs > 3:   # give up after the 4th read timeout
                    print "\n\n\nsorry -- try back later"
                    os.unlink(file_name)
                    raise
                else:              # start counting time outs...
                    print "\nHmmm... little issue... I'll wait a couple of seconds"
                    time.sleep(3)
                    time_outs+=1
                    continue

            if not buffer:   # end of the download             
                sys.stdout.write('\rDone!'+' '*len(status)+'\n\n')
                sys.stdout.flush()
                break

            file_size_dl += len(buffer)
            f.write(buffer)
            status = '{:20,} Bytes [{:.2%}] received'.format(file_size_dl, 
                                           file_size_dl * 1.0 / file_size)
            sys.stdout.write('\r'+status)
            sys.stdout.flush()

    return file_name 

This prints a status as expected. If I unplug my ethernet cable, I get:

Downloading: Python-2.7.3.tgz Bytes: 14,135,620
             827,392 Bytes [5.85%] received


sorry -- try back later

If I unplug the cable, then plug it back in in less than 12 seconds, I get:

Downloading: Python-2.7.3.tgz Bytes: 14,135,620
             716,800 Bytes [5.07%] received
Hmmm... little issue... I'll wait a couple of seconds

Hmmm... little issue... I'll wait a couple of seconds
Done! 

The file is successfully downloaded.

You can see that this approach gives you both timeouts and reconnects: urllib2 supplies the per-read timeout, and the retry logic in the loop supplies the reconnects. If you disconnect and stay disconnected for roughly 3 * 4 seconds == 12 seconds, it will time out for good and raise a fatal exception. This could be dealt with as well.
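
One way to deal with that fatal case is a bounded retry loop around the function above; a sketch, where the attempt count of 3 is an arbitrary choice (url_tester already unlinks the partial file before re-raising, so restarting from scratch is safe):

import socket

def download_with_retries(url, attempts=3):
    # hypothetical wrapper around url_tester() above
    for attempt in range(1, attempts + 1):
        try:
            return url_tester(url)
        except socket.timeout:
            print "attempt %d of %d timed out" % (attempt, attempts)
    raise IOError("giving up after %d attempts" % attempts)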

dawg