Python urllib download only some of a webpage?

Question

I have a program that needs to open many webpages and download information from them. The information, however, is in the middle of each page, and it takes a long time to get to it. Is there a way to have urllib retrieve only x lines? Or, failing that, to not load anything after the information I need?

I'm using Python 2.7.1 on Mac OS 10.8.2.

Set the [`Range` header](http://stackoverflow.com/a/1971294/1460062) — Richard, Jan 29 '13 at 00:17
I'm guessing that transmission is probably not the gating factor. It's more likely that the server is taking a long time to generate the response. Use Chrome's developer tools to look at what takes time loading the page. If receiving the body is a small amount of time, this will be a pretty minimal optimization. For example, this page took 136ms to get from request to finished receiving on my computer. Of that, only 20ms was transmission, the rest was waiting on network latency and page generation time. — Lucas Wiman, Jan 29 '13 at 00:22
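
For illustration, the `Range` header suggested in the comments could be set roughly as follows. This is a minimal sketch, not a tested recipe for any particular site: it is written for Python 3 (`urllib.request`; on the asker's Python 2.7 the same classes live in `urllib2`), the helper names `build_range_request` and `fetch_range` are made up for this example, and it assumes the server supports range requests at all. Note that ranges are counted in bytes, not lines.

```python
# Python 3 sketch; on Python 2.7 use urllib2.Request / urllib2.urlopen instead.
import urllib.request


def build_range_request(url, first_byte, last_byte):
    """Return a Request asking for bytes first_byte..last_byte (inclusive)."""
    req = urllib.request.Request(url)
    req.add_header("Range", "bytes=%d-%d" % (first_byte, last_byte))
    return req


def fetch_range(url, first_byte, last_byte):
    """Hypothetical helper: fetch a byte range and return (status, body)."""
    wanted = last_byte - first_byte + 1
    resp = urllib.request.urlopen(build_range_request(url, first_byte, last_byte))
    try:
        # 206 means the server honoured the range; 200 means it ignored the
        # header and is sending the whole page, so cap the read ourselves.
        return resp.getcode(), resp.read(wanted)
    finally:
        resp.close()
```

Checking the status code matters because a server is free to ignore the header and reply with the full page.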

score 2 · Accepted Answer · answered Jan 29 '13 at 00:14

The object returned by `urllib.urlopen()` is file-like, so you can call `.readline()` to read only part of the response:

import urllib

resp = urllib.urlopen(url)  # Python 2; url is the page address
for i in range(10):
    line = resp.readline()  # reads the next line of the response

would read only the first 10 lines, for example. Note that this won't guarantee a faster response: the server may still take just as long to generate the page before it sends anything.
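
If the number of lines is not known in advance, the same idea can be extended to stop at the first line that contains the data you want. The following is only a sketch: `read_until`, the `max_lines` limit and the `id='price'` marker are hypothetical names chosen for the example, and an `io.BytesIO` object stands in for the response so it runs without a network connection. In Python 3 the response yields bytes, so the marker is a bytes literal.

```python
import io


def read_until(resp, marker, max_lines=1000):
    """Read lines from a file-like object until one contains `marker`.

    Returns the matching line, or None if it is not found within max_lines.
    """
    for _ in range(max_lines):
        line = resp.readline()
        if not line:  # an empty result means the response has ended
            return None
        if marker in line:
            return line
    return None


# Stand-in for a real response object (assumption for the example).
fake_resp = io.BytesIO(
    b"<html>\n<p>intro</p>\n<p id='price'>42</p>\n<p>footer</p>\n"
)
found = read_until(fake_resp, b"id='price'")
remaining = fake_resp.read()  # whatever was not consumed by read_until
```

With a real response you would close it after `read_until` returns instead of reading the remainder.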

Martijn Pieters

But I could, for example, do `for i in range(10, 20)` to get lines farther down? – JShoe Jan 29 '13 at 00:18
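
On that follow-up: `range(10, 20)` only changes how many times the loop runs, so ten `readline()` calls would still return the first ten lines. To get lines 10 to 19, the earlier lines have to be read and discarded first. One way to do that, sketched here with `itertools.islice`, relies on the response being iterable line by line; `lines_between` is a made-up helper name and the `io.BytesIO` object is a stand-in for a real response.

```python
import io
from itertools import islice


def lines_between(resp, start, stop):
    """Return the lines with zero-based index start..stop-1.

    islice still reads lines 0..start-1 from the response; it just
    throws them away instead of returning them.
    """
    return list(islice(resp, start, stop))


# Stand-in for a real response object: 30 numbered lines.
text = "".join("line %d\n" % i for i in range(30))
fake_resp = io.BytesIO(text.encode("ascii"))
chunk = lines_between(fake_resp, 10, 20)
```

Skipping lines this way saves nothing on the lines before the ones you want, since they are still transferred and read.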
