I have some unrefactored code. When a file is requested as a single chunk, I use the requests module to download it. When the download is split into byte ranges (irange in this case) across multiple threads, I fall back to urllib2, because I couldn't figure out how to request a range with requests. How can I achieve the same thing using the requests module?

def _grabAndWriteToDisk(url, saveTo, first=None, queue=None, mode='wb', irange=None):
    """ Download a file with requests when no byte range is given;
        otherwise use urllib2 for multi-threaded, multi-part (ranged) downloads.

        Args:
            url(str): url of file to download
            saveTo(str): path where to save file
            first(int): starting byte of the range
            queue(Queue.Queue): queue object to set status for file download
            mode(str): mode in which to open the output file
            irange(str): byte range to download, e.g. '0-1023'

    """
    fileName = url.split('/')[-1]
    filePath = os.path.join(saveTo, fileName)
    fileSize = int(_fdUtils.getUrlSizeInBytes(url))
    downloadedFileSize = 0 if not first else first
    block_sz = 8192

    def statusSet(que, dlfs, fileSize, fName, savTo):
        STOP_REQUEST.set()
        if que:
            que.task_done()
        _log.info("Download Completed %s%% for file %s, saved to %s",
            dlfs * 100. / fileSize, fName, savTo)

    if not irange:
        resp = requests.get(url, stream=True)
        # Open the file once; reopening it in 'wb' mode for every chunk
        # would truncate it and keep only the last block.
        with open(filePath, mode) as fd:
            for fileBuffer in resp.iter_content(block_sz):
                if not fileBuffer:
                    break

                downloadedFileSize += len(fileBuffer)
                fd.write(fileBuffer)
                status = r"%10d  [%3.2f%%]" % (downloadedFileSize, downloadedFileSize * 100. / fileSize)
                status = status + chr(8)*(len(status)+1)
                sys.stdout.write('%s\r' % status)
                time.sleep(.05)
                sys.stdout.flush()
                if downloadedFileSize == fileSize:
                    statusSet(queue, downloadedFileSize, fileSize, fileName, saveTo)

    else:
        req = urllib2.Request(url)
        req.headers['Range'] = 'bytes=%s' % irange

        urlFh = urllib2.urlopen(req)
        with open(filePath, mode) as fh:
            while not STOP_REQUEST.isSet():
                fileBuffer = urlFh.read(block_sz)
                if not fileBuffer:
                    break
                downloadedFileSize += len(fileBuffer)
                fh.write(fileBuffer)

                status = r"%10d  [%3.2f%%]" % (downloadedFileSize, downloadedFileSize * 100. / fileSize)
                status = status + chr(8)*(len(status)+1)
                sys.stdout.write('%s\r' % status)
                time.sleep(.05)
                sys.stdout.flush()
                if downloadedFileSize == fileSize:
                    statusSet(queue, downloadedFileSize, fileSize, fileName, saveTo)

How can I use the requests module to send the Range header, as I do with urllib2?

Ciasto piekarz
  • What part are you struggling with? It looks like all you need to do is add the `Range` header to the `requests.get()` call, which happens to accept a `headers` argument (make it a dictionary). – Martijn Pieters Jul 18 '14 at 16:45
  • @MartijnPieters Did you mean using `resp.headers['content-length']`, where `resp` is `resp = requests.get(url, stream=True)`? – Ciasto piekarz Jul 19 '14 at 08:30
  • No, I meant you can *add* headers to the outgoing request with `requests.get(url, headers={'Range': 'bytes=%s' % irange}, stream=True)`. – Martijn Pieters Jul 19 '14 at 10:01
  • Ah, so that means I can set it afterwards, like `resp.headers['Range'] = "bytes=%s" % irange`? – Ciasto piekarz Jul 19 '14 at 10:21
  • No, your original code sets the header on the *request*; altering headers on the returned response makes little sense. – Martijn Pieters Jul 19 '14 at 11:27
  • Cool, understood, thanks. However, how can I grab multiple chunks? I have posted an attempt as a question here: http://stackoverflow.com/q/24839527/1622400 – Ciasto piekarz Jul 19 '14 at 13:08
  • Does your code work with `urllib2` already? – Martijn Pieters Jul 19 '14 at 13:09
  • @MartijnPieters Someone on SO recommended that I avoid `urllib2` because it has a bug (see http://codereview.stackexchange.com/a/56859), but since I am still learning, I am curious to know the proper approach for doing this. – Ciasto piekarz Jul 19 '14 at 14:30
  • You misunderstood; it is not `urllib2` that has the bug; it is the `getaddrinfo` call that has the problem *on certain platforms*. Using `requests` will not protect you from that problem. – Martijn Pieters Jul 19 '14 at 15:11
  • So `urllib2` uses `getaddrinfo`, but from which package? – Ciasto piekarz Jul 19 '14 at 16:41
  • https://docs.python.org/2/library/socket.html#socket.getaddrinfo – Martijn Pieters Jul 19 '14 at 16:42
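
Putting the comments above together, a minimal sketch of a ranged, multi-threaded download with requests might look like the following. It assumes the server honours `Range` requests (replies with 206 Partial Content) and that the total size is already known, for example from `_fdUtils.getUrlSizeInBytes(url)`. The names `split_ranges`, `fetch_range`, and `download_in_parts` are hypothetical helpers for illustration; they are not part of the original code or of requests.

```python
import threading


def split_ranges(total_size, parts):
    """Return inclusive 'start-end' byte ranges that cover total_size bytes."""
    if total_size <= 0:
        raise ValueError('total_size must be positive')
    # Never create more parts than there are bytes.
    parts = max(1, min(parts, total_size))
    chunk = total_size // parts
    ranges = []
    for i in range(parts):
        start = i * chunk
        # The last part absorbs the remainder so no bytes are dropped.
        end = total_size - 1 if i == parts - 1 else start + chunk - 1
        ranges.append('%d-%d' % (start, end))
    return ranges


def fetch_range(url, file_path, irange, block_sz=8192):
    """Download one byte range and write it at its own offset in file_path."""
    # Third-party module, imported here so split_ranges works without it.
    import requests

    start = int(irange.split('-')[0])
    # The Range header goes on the outgoing request, not on the response.
    resp = requests.get(url, headers={'Range': 'bytes=%s' % irange}, stream=True)
    resp.raise_for_status()
    if resp.status_code != 206:
        # A 200 here means the server sent the whole file instead of a part.
        raise RuntimeError('server ignored the Range header')
    with open(file_path, 'r+b') as fh:
        fh.seek(start)
        for chunk in resp.iter_content(block_sz):
            fh.write(chunk)


def download_in_parts(url, file_path, total_size, parts=4):
    """Fetch the file in several ranges, one thread per range."""
    # Pre-allocate the file so every thread can seek to its own offset.
    with open(file_path, 'wb') as fh:
        fh.truncate(total_size)
    threads = [threading.Thread(target=fetch_range, args=(url, file_path, r))
               for r in split_ranges(total_size, parts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Each thread opens its own file handle and writes at its own offset, so the parts do not need to be stitched together afterwards. Exceptions raised inside a thread are not propagated to the caller in this sketch, so real code would need to collect and report them.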
