I'm trying to download files using Python requests. It worked in Python 2.7, but it doesn't work now in Python 3. I'm really confused, and there has to be a simpler answer. Since the files can be quite large, I really want a progress bar, and I'm using the Python progressbar library to do the job.

                r = requests.get(file_url, data={'track': 'requests'})
                size = int(r.headers['Content-Length'].strip())
                self.bytes = 0
                widgets = [name, ": ", Bar(marker="|", left="[", right=" "),
                    Percentage(), " ",  FileTransferSpeed(), "] ",
                    self,
                    " of {0}MB".format(round(size / 1024 / 1024, 2))]
                pbar = ProgressBar(widgets=widgets, maxval=size)
                pbar.start()

                file = b""
                for chunk in r.iter_content():
                    if chunk:
                        file += chunk

                        self.bytes += 1
                        pbar.update(self.bytes)

I found that using iter_content was the best way to get a continuous update. I did try iter_lines, but it corrupted the files. The download is really slow and stops all of a sudden: it takes 15 minutes to download 10%, after which it stops. Opening a file in byte mode and writing to it doesn't work either, even though it doesn't throw an error at all. And when I try to print what the chunk contains using

print(chunk.decode("utf-8"))

it works, but only for a few characters. At some point it complains about

UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte

Even using "decode_unicode=True" in iter_content does nothing. I'm stumped and don't know what to do. It shouldn't be this hard to use Py3k.

thabubble

1 Answer

Managed to fix it. So here's the updated piece of code:

r = requests.get(file_url)
size = int(r.headers['Content-Length'].strip())
self.bytes = 0
widgets = [name, ": ", Bar(marker="|", left="[", right=" "),
    Percentage(), " ",  FileTransferSpeed(), "] ",
    self,
    " of {0}MB".format(str(round(size / 1024 / 1024, 2))[:4])]
pbar = ProgressBar(widgets=widgets, maxval=size).start()
file = []
for buf in r.iter_content(1024):
    if buf:
        file.append(buf)
        self.bytes += len(buf)
        pbar.update(self.bytes)
pbar.finish()

The download speed went from 7 kb/s to 400+ kb/s, and it is now fully working.
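A likely reason for the speedup: iter_content() defaults to chunk_size=1, so the original loop handled one byte per iteration, and `file += chunk` on a bytes object can copy the whole buffer each time. Passing a larger chunk size and joining a list at the end avoids both. Here is a minimal sketch of that pattern. The name collect_chunks and the on_progress callback are made up for illustration, and a plain list stands in for the response so the snippet runs offline.

```python
def collect_chunks(chunks, on_progress=None):
    """Join byte chunks into one bytes object, reporting bytes received so far."""
    parts = []
    received = 0
    for chunk in chunks:
        if not chunk:  # skip empty chunks
            continue
        parts.append(chunk)        # appending to a list is cheap
        received += len(chunk)     # count real bytes, not iterations
        if on_progress is not None:
            on_progress(received)
    return b"".join(parts)         # one copy at the very end


# Offline demo: a plain list stands in for r.iter_content(1024).
progress = []
body = collect_chunks([b"ab", b"", b"cde"], progress.append)
print(body, progress)
```

With the code above, the equivalent call would be something like collect_chunks(r.iter_content(1024), pbar.update).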

thabubble
  • Wouldn't this actually be a progress bar of it writing it to the file, seeing as the request is done when you do the requests.get() call? – antihero Oct 16 '12 at 19:02
  • The requests.get() function returns a Response object. It doesn't really do anything unless you do something with it (like r = requests.get(file_url); r.text). I iterated over the content while it was being downloaded instead. I actually had to save it to a file afterwards :) – thabubble Nov 11 '12 at 12:15
  • @thabubble Actually, you need to call `requests.get(file_url, prefetch=False)` for your statement to be true. Otherwise, @antihero is exactly right. – heycosmo Nov 24 '12 at 11:24
  • Well, then I don't know why it worked. But it did. As far as I can see, what I did was "r = self.s.get(file_url, data={'track': 'requests'})" and then iterate over it using "for char in r.iter_content():", and it definitely works. You can see it download. But this was a long time ago and I might have the wrong file. But it was edited well after I posted my answer. I remember it would only download the header, not the body. For that you would have to use r.text or r.iter_content(). – thabubble Nov 24 '12 at 19:14
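
For anyone reading this later: the prefetch=False keyword mentioned above belongs to the 2012-era requests releases. As far as I know, current releases spell the same idea stream=True, so that requests.get() returns after the headers and the body is fetched while you iterate. Below is a rough sketch of writing the body straight to a file in binary mode with a progress callback. save_response, FakeResponse and on_progress are names invented for this example, and FakeResponse only imitates the two Response members used here so the snippet runs without a network.

```python
import os
import tempfile


def save_response(response, path, chunk_size=1024, on_progress=None):
    """Write a response body to `path` in binary mode, chunk by chunk."""
    total = int(response.headers.get("Content-Length", 0))
    written = 0
    with open(path, "wb") as fh:  # "wb": the body is bytes, never decode it
        for chunk in response.iter_content(chunk_size):
            if chunk:
                fh.write(chunk)
                written += len(chunk)
                if on_progress is not None:
                    on_progress(written, total)
    return written


class FakeResponse:
    """Offline stand-in for requests.Response (hypothetical, demo only)."""

    def __init__(self, body):
        self.body = body
        self.headers = {"Content-Length": str(len(body))}

    def iter_content(self, chunk_size=1):
        for i in range(0, len(self.body), chunk_size):
            yield self.body[i:i + chunk_size]


# Binary payload starting with 0xff, which is not valid UTF-8 text.
payload = b"\xff\xd8" + bytes(3000)
target = os.path.join(tempfile.mkdtemp(), "download.bin")
seen = []
written = save_response(FakeResponse(payload), target, 1024,
                        lambda done, total: seen.append(done))
print(written, seen)
```

With a real download, the first argument would be r = requests.get(file_url, stream=True), and on_progress could forward its first argument to pbar.update().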