
I have a simple script that downloads a 300 MB file from a remote server in Python using the requests library. I would like to reduce the download time using the threading and Queue modules, and I am trying to figure out how to use them.

I thought about using 4 threads, splitting the file into four chunks, and having each thread download one chunk. My requests download code currently looks like this:

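import threading
import Queue
import requests

queue = Queue.Queue()

class downloadThread(threading.Thread):

    def __init__(self, queue):
        # Initialise the base Thread before storing the work queue
        threading.Thread.__init__(self)
        self.inQueue = queue

    def run(self):
        while True:
            url = self.inQueue.get()
            resp = requests.get(url, stream=True)

            # Stream the response body to disk in 1 KB chunks
            with open('/tmp/app.zip', 'wb') as f:
                for chunk in resp.iter_content(chunk_size=1024):
                    if chunk:
                        f.write(chunk)
                        f.flush()

            # Mark the URL as processed so that queue.join() can return
            self.inQueue.task_done()

if __name__ == '__main__':

    for x in range(3):
        t = downloadThread(queue)
        t.setDaemon(True)
        t.start()

    queue.put('http://urltofile')
    # Block until the download finishes; daemon threads die when main exits
    queue.join()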

When using iter_content to retrieve the content, I can provide the chunk_size, but is there a way to tell it to start at byte 1024 and read up to byte 2048? I am planning to have thread 1 download bytes 0 to 1024, thread 2 handle 1025 to 2048, and so on.
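For illustration, the split itself can be worked out before any request is made. The sketch below is only an assumption about how the arithmetic might look; `split_byte_ranges` and `range_header` are hypothetical helper names, not part of requests. HTTP byte ranges are inclusive at both ends, so 1 KB parts run 0 to 1023, 1024 to 2047, and so on, rather than 0 to 1024 and 1025 to 2048.

```python
def split_byte_ranges(total_size, parts):
    """Return `parts` inclusive (start, end) byte ranges covering total_size bytes."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)  # never create empty ranges
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` parts take one additional byte each
        length = base + (1 if i < extra else 0)
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start, end):
    """Build the HTTP Range header for an inclusive byte range."""
    return {"Range": "bytes=%d-%d" % (start, end)}
```

Each header dictionary could then be passed as `headers=` to a separate `requests.get` call, provided the server supports range requests.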

I will have to handle writing the file with separate logic. I am planning to have the threads put their chunks into a second queue (separate from inQueue) and then write them to the file from there. For now, I am trying to figure out how to split the file chunks between the threads.
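One possible alternative to a second queue, sketched here under the assumption that each part's starting offset is known and the parts do not overlap, is to preallocate the output file and let every worker write its part at its own offset. `preallocate` and `write_part` are hypothetical names used only for this illustration.

```python
def preallocate(path, total_size):
    """Create (or reset) the output file at its final size."""
    with open(path, "wb") as f:
        f.truncate(total_size)


def write_part(path, offset, data):
    """Write `data` into the existing file at `path`, starting at byte `offset`.

    Each call opens its own file handle, so workers writing
    non-overlapping regions do not share a file position.
    """
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
```

With this layout the parts can arrive in any order, since each one lands at a fixed position in the file.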

Thanks

slysid
  • You can't download chunks from a stream in different threads like that. You'd have to make range requests (separate HTTP requests, and only if the server supports HTTP-range). – Martijn Pieters Sep 30 '15 at 11:41
  • The *bottleneck* here is the network connection, not the processing of the download. – Martijn Pieters Sep 30 '15 at 11:43
  • according to http://stackoverflow.com/questions/24970989/want-to-read-from-a-particular-offset-of-a-file-from-internet-using-python3 you should add a 'range' header to the request – yurib Sep 30 '15 at 11:43
  • Note that I would not expect the download to be all that much faster unless you have plenty of bandwidth all the way to the server and the server throttles download speeds per connection. – Martijn Pieters Sep 30 '15 at 12:34
  • I've re-duplicated you to a better post; the `urllib2` code there can be fairly easily ported to `requests`. – Martijn Pieters Sep 30 '15 at 12:43
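The range-request approach suggested in the comments might look roughly like the sketch below. It is a minimal illustration, not a tested downloader: `fetch_range` and `download_in_parts` are hypothetical names, and `http_get` is assumed to be `requests.get` or any callable with a compatible signature. A 206 status means the server honoured the `Range` header; a plain 200 means it ignored it and sent the whole file.

```python
import threading


def fetch_range(http_get, url, start, end, results, index):
    """Fetch bytes start..end (inclusive) of url and store them in results[index]."""
    headers = {"Range": "bytes=%d-%d" % (start, end)}
    resp = http_get(url, headers=headers, timeout=30)
    # Only accept the body if the server returned 206 Partial Content
    if resp.status_code == 206:
        results[index] = resp.content


def download_in_parts(http_get, url, ranges):
    """Fetch each (start, end) range in its own thread and join the parts in order."""
    results = [None] * len(ranges)
    threads = [
        threading.Thread(target=fetch_range,
                         args=(http_get, url, start, end, results, i))
        for i, (start, end) in enumerate(ranges)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # A missing part means a request failed or the Range header was ignored
    if any(part is None for part in results):
        raise RuntimeError("at least one range request failed or was not honoured")
    return b"".join(results)
```

With requests installed, a call such as `download_in_parts(requests.get, url, [(0, 1023), (1024, 2047)])` would return the first 2 KB. This version holds every part in memory, so for a 300 MB file it would be more sensible to write each part to disk as it arrives. As the comments note, the speed-up depends on available bandwidth and on whether the server throttles each connection.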
