Stream HTTP content but skip downloading some lines at all in Python

Question

Edit- This is partially solved. The exact implementation details are not figured out yet, but the answer is to use HTTP range headers, as in Ezequiel's comment.

In case my explanation is not clear enough, I am trying to replicate the procedure here: https://www.cpc.ncep.noaa.gov/products/wesley/fast_downloading_grib.html in Python.

edit: From a friend's kind advice, I've figured out part of the solution. I just need to grab a specific byte range with my GET request- that's all that NOAA's Perl scripts are doing.

I'm attempting to download only a few fields from a "GRIB" file- an array-like format that the National Weather Service uses. It is at a specific HTTPS URL, e.g. https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.20201209/00/gfs.t00z.pgrb2.0p25.f000. Specifically, I need to download only the lines that are relevant to me- e.g. lines 5, 10, and 30. I'd like to avoid downloading the content of the other lines at all, but I'm not sure about the low-level behavior of the requests library here (or a suitable alternative).
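For what it's worth, the "lines" here are really GRIB records, so the first step is turning record numbers into byte ranges. Below is a minimal sketch of that step, assuming (as the linked NOAA page describes) that a wgrib2-style inventory is available whose lines are colon-separated and start with the record number and its starting byte offset. The function name `record_ranges` and the sample inventory text are made up for illustration, and the sketch assumes plain integer record numbers.

```python
def record_ranges(inventory_text, wanted):
    """Map each wanted record number to its (start, end) byte offsets.

    end is inclusive, or None when the record is the last one listed
    (meaning "read to the end of the file").
    """
    entries = []
    for line in inventory_text.splitlines():
        if not line.strip():
            continue
        fields = line.split(":")
        # Assumed layout: record number, then starting byte offset.
        entries.append((int(fields[0]), int(fields[1])))

    ranges = {}
    for i, (number, start) in enumerate(entries):
        if number not in wanted:
            continue
        # A record ends one byte before the next record starts.
        end = entries[i + 1][1] - 1 if i + 1 < len(entries) else None
        ranges[number] = (start, end)
    return ranges


# Made-up inventory text, only to show the assumed layout.
sample = """1:0:d=2020120900:PRMSL:mean sea level:anl:
2:1000:d=2020120900:TMP:2 m above ground:anl:
3:2500:d=2020120900:UGRD:10 m above ground:anl:
"""
print(record_ranges(sample, {2, 3}))
# {2: (1000, 2499), 3: (2500, None)}
```

Each (start, end) pair can then be sent as one HTTP Range request, so only those records are transferred.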

Use an HTTP request [Range header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range) — JustinEzequiel, Dec 15 '20 at 09:43
@JustinEzequiel - Yep that's right. I meant to update this with the solution but I haven't implemented the python code for it yet, and there are some more details to fill out. — tomaszps, Dec 15 '20 at 21:30
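A rough sketch of what the Range header approach might look like, not a finished implementation. It uses the standard library's `urllib.request` rather than requests so that it runs without extra packages; the helper names `range_header` and `fetch_bytes` and the 30-second timeout are assumptions made for the example.

```python
import urllib.request


def range_header(start, end=None):
    """Return a Range header for bytes start..end inclusive.

    If end is None the range is open-ended (start to end of file).
    """
    last = "" if end is None else str(end)
    return {"Range": f"bytes={start}-{last}"}


def fetch_bytes(url, start, end=None, timeout=30):
    """Download only the requested byte range of url."""
    request = urllib.request.Request(url, headers=range_header(start, end))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # 206 Partial Content means the server honoured the range;
        # 200 means it ignored the header and is sending the whole file.
        if response.status != 206:
            raise RuntimeError(
                f"expected 206 Partial Content, got {response.status}"
            )
        return response.read()
```

With requests, the same header dict can be passed as `requests.get(url, headers=range_header(0, 999))`, checking that `status_code` is 206 before trusting the body.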

score 0 · Answer 1 · answered Dec 12 '20 at 04:26

This should be the code:

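import requests

url = 'https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.20201209/00/gfs.t00z.pgrb2.0p25.f000'
req = requests.get(url, stream=True)
lines = req.iter_lines()  # lazy iterator over the streamed response
next(lines)       # read and discard the first line
x2 = next(lines)  # keep the second line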

shekhar chander

I don't mean to be rude but this doesn't answer my question. I'm not asking how to iterate over the lines, I'm asking if there's a way to avoid downloading *some* of the lines as you iterate across the file. I know it's possible because of the site that I linked to. – tomaszps Dec 12 '20 at 19:18
@tomaszps AFAIK they just split each day into different files: 20201209 = 2020-12-09, i.e. 9 December 2020. You cannot skip content unless the web server supports HTTP range requests. – Marat Mkhitaryan Dec 15 '20 at 04:37
@MaratMkhitaryan It does. Haven't updated the question with the solution yet. – tomaszps Dec 15 '20 at 05:24
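One way to check whether a server advertises range support is to look for an `Accept-Ranges: bytes` response header. The sketch below does that with a HEAD request using the standard library; the function names are hypothetical, and the header is only advisory, so the definitive test is still whether a ranged GET comes back with status 206.

```python
import urllib.request


def advertises_byte_ranges(headers):
    """True if a mapping of response headers contains Accept-Ranges: bytes."""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "accept-ranges":
            return "bytes" in value.lower()
    return False


def server_supports_ranges(url, timeout=30):
    """Send a HEAD request (no body is downloaded) and inspect the headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return advertises_byte_ranges(response.headers)
```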
