
So, I am trying to download a single file in parts, using Python's threading module for the parallelism and requests for the downloads.

The problem is that I can download the file as four separate parts, but I cannot join them back together.

I tried using PyPDF2 to join PDFs and ffmpeg to join video files, but neither helped. I cannot open any of the downloaded files, which suggests they are not being downloaded correctly.

PyPDF2 gives this error:

PdfReadError: EOF marker not found

What should I do so that I can join the files correctly?
Secondly, do I have to use a separate method to join each file type, or can I implement one method that works for different file formats?
Below is the download function I implemented.

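import requests
from clint.textui import progress as pr  # assumption: `pr` is clint's progress module

download_status = {}  # bytes received so far, keyed by thread id

def download(threadId, drange, url):
    # Range bounds are inclusive: "bytes=0-99" requests 100 bytes
    headers = {"Range": "bytes={0}-{1}".format(drange[0], drange[1])}
    print(headers)

    size = drange[1] - drange[0] + 1
    print("Starting Thread {0}".format(threadId))
    req = requests.get(url, headers=headers, stream=True)

    download_status[threadId] = 0
    # Each part is a raw byte slice of the file, not a playable .mp4 on its own
    with open('test{0}.mp4'.format(threadId), 'wb') as f:
        for r in pr.bar(req.iter_content(chunk_size=2048), expected_size=(size // 2048) + 1):
            if r:  # skip empty keep-alive chunks
                f.write(r)
                download_status[threadId] += len(r)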
formatkaka
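
A minimal sketch of the splitting and threading side, assuming the server reports `Content-Length` on a HEAD request and honours `Range` (answering 206 Partial Content), and that the file has at least as many bytes as there are parts. It swaps in the standard library's `urllib.request` for `requests` so it has no third-party dependency; `split_ranges`, `fetch_range`, `download_in_parts` and the `part{i}.bin` file names are hypothetical. The point it illustrates is the range issue raised in the comments: HTTP byte ranges are inclusive on both ends, so consecutive ranges must neither overlap nor leave gaps, and the last one must end at `total_size - 1`.

```python
import threading
import urllib.request


def split_ranges(total_size, parts):
    """Split bytes 0..total_size-1 into `parts` inclusive (start, end) ranges.

    HTTP Range headers are inclusive, so "bytes=0-99" is the first 100 bytes.
    The last range absorbs the remainder so no byte is skipped.
    """
    chunk = total_size // parts
    ranges = []
    for i in range(parts):
        start = i * chunk
        end = total_size - 1 if i == parts - 1 else start + chunk - 1
        ranges.append((start, end))
    return ranges


def fetch_range(url, start, end, part_path):
    """Download one inclusive byte range into its own part file."""
    req = urllib.request.Request(url, headers={"Range": "bytes={}-{}".format(start, end)})
    with urllib.request.urlopen(req) as resp, open(part_path, "wb") as f:
        # 206 means the Range header was honoured; 200 would mean the whole
        # file was sent, and every "part" would be a full copy.
        if resp.status != 206:
            raise RuntimeError("server ignored Range (status {})".format(resp.status))
        while True:
            block = resp.read(64 * 1024)
            if not block:
                break
            f.write(block)


def download_in_parts(url, parts=4, prefix="part"):
    """Fetch `url` with one thread per range; return part paths in range order."""
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        total_size = int(resp.headers["Content-Length"])  # assumed present

    paths = ["{}{}.bin".format(prefix, i) for i in range(parts)]
    threads = [
        threading.Thread(target=fetch_range, args=(url, start, end, path))
        for (start, end), path in zip(split_ranges(total_size, parts), paths)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # every part must be complete before the parts are joined
    return paths
```

For example, `split_ranges(100, 4)` gives `[(0, 24), (25, 49), (50, 74), (75, 99)]`: four 25-byte slices that cover bytes 0 to 99 exactly once. The returned paths are in range order, which is the order they have to be concatenated in, regardless of which thread finished first. Note that an exception raised inside a `threading.Thread` is printed rather than propagated to `download_in_parts`, so a failed range shows up only as a missing or empty part file.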
  • I think that if you only need to download a file split into 4 portions and save it on your system, and you receive the 4 parts as binary blocks, you can simply use open('output', 'ab') (append-binary) to write each received part at the end of the parts already received. – Artemis Jun 29 '16 at 10:22
  • I tried downloading it after changing `'wb'` to `'ab'`, but of the 4 parts, the first one is saved as an mp4 and the rest as binary. – formatkaka Jun 30 '16 at 06:03
  • I agree with @Artemis that you only need to 'chain' the blocks. Why don't you first try it with a tiny local example (no download involved)? Just split a small file manually and try to join it with your code. If the results match, then try to implement the same with a download. You might have a problem with the ranges (the last byte of a range might not be included when you assume it is, or the opposite). Just use something simple that you can check manually: a random chunk of bytes, a 1x1 image... – martinarroyo Jun 30 '16 at 07:18
  • When reading the file or sending it over the network (or even locally), don't use fancy libraries. There are just too many formats out there, and you can't write spaghetti code for every file type. Instead, open the file with the native `open('file', 'rb')` function, and keep using the `'ab'` mode on the receiving side. It should work fine. Try @martinarroyo's example before anything else; it may help you see where it actually fails. – Artemis Jun 30 '16 at 19:18
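
A minimal sketch of the approach described in the comments: the parts are consecutive byte slices of the original file, so joining them is plain byte concatenation in range order. No PDF- or video-specific tool is involved, and the same function works for every format. It opens the output once with `'wb'` and appends each part, which has the same effect as reopening with `'ab'` but cannot accidentally append to leftovers from an earlier run. `join_parts`, `split_file` and `round_trip_ok` are hypothetical names; the last two implement the local split-and-join check suggested in the comments, using random bytes instead of a download.

```python
import os
import shutil
import tempfile


def join_parts(part_paths, output_path):
    """Concatenate part files, in the given order, into output_path.

    Format-agnostic: the parts are raw byte slices, so no PDF or video
    library is needed to put them back together.
    """
    with open(output_path, "wb") as out:
        for path in part_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)


def split_file(src_path, parts, out_dir):
    """Split a local file into `parts` pieces (simulates the ranged download)."""
    with open(src_path, "rb") as f:
        data = f.read()
    chunk = len(data) // parts
    paths = []
    for i in range(parts):
        start = i * chunk
        end = len(data) if i == parts - 1 else start + chunk  # exclusive end
        path = os.path.join(out_dir, "part{}.bin".format(i))
        with open(path, "wb") as p:
            p.write(data[start:end])
        paths.append(path)
    return paths


def round_trip_ok(parts=4, size=10_001):
    """Split random bytes locally, join them again, compare with the original."""
    original = os.urandom(size)
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "original.bin")
        with open(src, "wb") as f:
            f.write(original)
        joined = os.path.join(tmp, "joined.bin")
        join_parts(split_file(src, parts, tmp), joined)
        with open(joined, "rb") as f:
            return f.read() == original
```

If `round_trip_ok()` returns `True` but the downloaded parts still do not join into a valid file, the problem is most likely in the ranges themselves (overlaps, gaps, or a server that ignores `Range`) rather than in the joining step.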
