Using urllib2 to fetch plain text, result isn't full

Question

I'm writing a Python script to parse Jenkins job results. I'm using urllib2 to fetch consoleText, but the file I receive is incomplete. The code to fetch the file is:

import urllib2

data = urllib2.urlopen('http://<server>/job/<jobname>/<buildid>/consoleText')
lines = data.readlines()

The number of lines I get is 2306, while the actual number of lines in the console log is 37521. I can check that by fetching the file via wget:

$ wget 'http://<server>/job/<jobname>/<buildid>/consoleText'
$ wc -l consoleText
37521

Why does urlopen not give me the full result?

UPDATE:

Using requests (as suggested by @svrist) instead of urllib2 doesn't have such a problem, so I'm switching to it. My new code is:

import requests

data = requests.get('http://<server>/job/<jobname>/<buildid>/consoleText')
lines = list(data.iter_lines())

But I still have no idea why urllib2.urlopen doesn't work properly.

Do you think there might be a maximum number of items you can request? — Lisa, Feb 11 '16 at 10:15

score 1 · Accepted Answer · edited May 23 '17 at 11:50

The Jenkins build log is returned using a chunked encoding response.

Transfer-Encoding: chunked

Based on a couple of other questions, it seems that urllib2 does not read the entire response and, as you've observed, returns only the first chunk.
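
To make the symptom concrete, here is a minimal sketch of what a chunked body looks like on the wire and how the pieces are reassembled. The function name decode_chunked and the sample bytes are made up for illustration; it is written for Python 3, ignores trailer headers, and is not how urllib2 or requests implement this internally.

```python
def decode_chunked(body):
    """Reassemble the payload of an HTTP/1.1 chunked message body (bytes)."""
    payload = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hexadecimal, ended by CRLF.
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field.decode("ascii").strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # a zero-size chunk marks the end of the body
        payload += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and the CRLF that follows it
    return bytes(payload)


# Hypothetical two-chunk response body: 0xb = 11 bytes, then 0xc = 12 bytes.
wire = b"b\r\nfirst line\n\r\nc\r\nsecond line\n\r\n0\r\n\r\n"
full_text = decode_chunked(wire)
```

A client that stops after the first chunk would see only "first line" here, which matches the kind of truncated line count described in the question.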

I also recommend using the requests package.
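
For a large console log, one option is to stream the response rather than load it all at once. This is only a sketch: the helper names iter_console_lines and fetch_console_lines, the 30-second timeout, and the UTF-8 decoding are my assumptions, not something Jenkins or requests requires.

```python
import requests


def iter_console_lines(response):
    """Yield the lines of a streamed requests response as text."""
    response.raise_for_status()  # fail loudly on HTTP errors (e.g. 403, 404)
    for raw_line in response.iter_lines():
        # iter_lines() yields bytes by default; UTF-8 is an assumption here.
        yield raw_line.decode("utf-8", errors="replace")


def fetch_console_lines(url):
    """Fetch url with streaming enabled and return all lines as a list."""
    # stream=True defers downloading the body until it is iterated.
    with requests.get(url, stream=True, timeout=30) as response:
        return list(iter_console_lines(response))
```

Calling fetch_console_lines with the consoleText URL from the question and comparing len() of the result against the wc -l output is a quick way to confirm that nothing was cut off.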

edited May 23 '17 at 11:50 by Community
answered Feb 11 '16 at 17:27 by Dave Bacher
