
I'm trying to download a large file from a server with Python 2:

import urllib2

req = urllib2.Request("https://myserver/mylargefile.gz")
rsp = urllib2.urlopen(req)
data = rsp.read()

The server sends the data with "Transfer-Encoding: chunked", and I only get part of the binary data, which cannot be unpacked by gunzip.

Do I have to iterate over multiple read()s? Or over multiple requests? If so, what should they look like?

Note: I'm trying to solve the problem with only the Python 2 standard library, without additional libraries such as urllib3 or requests. Is this even possible?

3 Answers


From the Python documentation on urllib2.urlopen:

One caveat: the read() method, if the size argument is omitted or negative, may not read until the end of the data stream; there is no good way to determine that the entire stream from a socket has been read in the general case.

So, read the data in a loop:

import urllib2

req = urllib2.Request("https://myserver/mylargefile.gz")
rsp = urllib2.urlopen(req)
data = rsp.read(8192)
while data:
    # ... do something with this chunk of data ...
    data = rsp.read(8192)
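
If the goal is simply to get the file onto disk, the same loop can write each chunk straight to a file; a minimal sketch of that idea (the local file name is just an example, not from the question):

import urllib2

req = urllib2.Request("https://myserver/mylargefile.gz")
rsp = urllib2.urlopen(req)

# Write each chunk to disk as it arrives so the whole file never sits in memory.
with open("mylargefile.gz", "wb") as out:  # local file name is just an example
    chunk = rsp.read(8192)
    while chunk:
        out.write(chunk)
        chunk = rsp.read(8192)
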
jaime
  • I'm under the impression that this only works for downloading files that are not sent with Transfer-Encoding: chunked. –  Jun 24 '14 at 19:01
  • Hmm, you're right. I saw a similar question with no answer: http://stackoverflow.com/questions/15115606/urllib2-python-transfer-encoding-chunked Sorry, I'm not sure how to get past it. The only answer used curl. – jaime Jun 24 '14 at 19:16
  • OK, I'll try curl, which is a little bit cumbersome with login cookies compared to Python, but better than nothing. Thanks! –  Jun 24 '14 at 19:27

If I'm not mistaken, the following worked for me - a while back:

data = ''
chunk = rsp.read()
while chunk:
    data += chunk
    chunk = rsp.read()

Each read() returns one chunk, so keep reading until nothing more comes back. I don't have documentation ready to support this... yet.
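
As a side note, repeated string concatenation can get slow for a very large download; a hedged variant of the same loop collects the chunks in a list and joins them once at the end (this still buffers the whole file in memory):

import urllib2

req = urllib2.Request("https://myserver/mylargefile.gz")
rsp = urllib2.urlopen(req)

# Collect the chunks in a list and join once at the end; this avoids
# re-copying the growing string on every iteration.
parts = []
chunk = rsp.read()
while chunk:
    parts.append(chunk)
    chunk = rsp.read()
data = ''.join(parts)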

sebastian
  • Unfortunately, this does not work for me: content = '' while True: chunk = rsp.read() if not chunk: break content += chunk f.write(content) –  Jun 24 '14 at 18:55
  • `does not work` unfortunately is a very unhelpful statement :) Is there still content missing? – sebastian Jun 25 '14 at 06:09
  • Sorry: "does not work" in the sense of "exactly like before", i.e. the data is still not complete and cannot be read by gunzip. I assume that urllib2 just does not support chunked transfer-encoding. –  Jun 25 '14 at 11:51

I have the same problem.

I found that "Transfer-Encoding: chunked" often appears with "Content-Encoding: gzip".

So maybe we can request the compressed content and unzip it ourselves.

This works for me:

import urllib2
from StringIO import StringIO
import gzip

url = "https://myserver/mylargefile.gz"  # e.g. the URL from the question
req = urllib2.Request(url)
req.add_header('Accept-encoding', 'gzip, deflate')
rsp = urllib2.urlopen(req)
if rsp.info().get('Content-Encoding') == 'gzip':
    buf = StringIO(rsp.read())
    f = gzip.GzipFile(fileobj=buf)
    data = f.read()
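
The same idea, rearranged as a hedged sketch that reads the body once, unwraps it only when the header says it was compressed, and then saves it to disk; this assumes the same req/rsp setup as above, and the local file name is again just an example:

body = rsp.read()
if rsp.info().get('Content-Encoding') == 'gzip':
    # The transport-level gzip wraps the actual file; unwrap it here.
    body = gzip.GzipFile(fileobj=StringIO(body)).read()

# body should now hold the file exactly as it sits on the server,
# e.g. the original mylargefile.gz from the question.
with open("mylargefile.gz", "wb") as out:
    out.write(body)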