
I watched Raymond Hettinger's Idiomatic Python talk and learned about the sentinel argument to iter(). I'd like to try applying it to a piece of code I'm working on that iterates over a paginated API (it's Twilio, but that's not relevant to my question).
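For reference, the two-argument form of iter(callable, sentinel) keeps calling the callable and yields each result until it returns the sentinel. A minimal sketch of the idiom (the read_blocks helper is just for illustration, not part of my code):

from functools import partial

def read_blocks(path, size=64):
    with open(path, 'rb') as f:
        # iter(callable, sentinel): keep calling f.read(size) and stop
        # as soon as it returns b'' (the sentinel)
        yield from iter(partial(f.read, size), b'')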

I have an API that returns a list of data and a next page URL. When the pagination is exhausted, the next page URL comes back as an empty string. I wrote the fetching function as a generator; it looks roughly like this:

import requests  # needed for requests.get below

def fetch(url):
    # `resource` is the Twilio resource name, defined elsewhere in my code
    while url:
        data = requests.get(url).json()
        url = data['next_page_uri']  # empty string once pagination is exhausted
        for row in data[resource]:
            yield row

This code works fine, but I'd like to try to remove the `while` loop and replace it with a call to `iter()` using the `next_page_uri` value as the sentinel argument. Alternatively, could this be written with `yield from`?

Sethish
    You can’t use `iter` on *part* of a value. You could write an adapter generator, but it’d be just as complicated as what you have now. – Davis Herring Dec 23 '18 at 16:24
  • It's worth noting that Python 3.6 throws a warning for the example version of the function. In my actual code the inside of the `while` loop is in a `try`/`except` to catch `StopIteration`. – Sethish Dec 23 '18 at 16:30
  • You want it to *yield* rows till the `'next_page_uri'` is an empty string? Does it need to make a new request for each *next_page*? – wwii Dec 23 '18 at 16:36
  • Yes, as written currently, it makes a new request each time it exhausts `data[resource]` and then continues to yield additional rows from the next `requests.get`. – Sethish Dec 23 '18 at 16:41
  • @Sethish: What here could raise `StopIteration`? – Davis Herring Dec 23 '18 at 16:54
  • https://www.python.org/dev/peps/pep-0479/#consequences-for-existing-code I'm not 100% clear on the mechanics of `StopIteration`, but the above code will generate a warning in 3.6 unless you wrap it and catch `StopIteration`. The PEP states that this is the preferred way to write a generator with a while loop, unless I'm misunderstanding it (see the sketch after these comments). – Sethish Dec 23 '18 at 17:01
  • Why do you need to implement this using `iter` with a sentinel? – wwii Dec 23 '18 at 18:04
  • the `for` loop will catch the `StopIteration` exception for you, that's what terminates the loop… no need to catch it yourself – Sam Mason Dec 23 '18 at 22:20
  • @sammason I mis-remembered when the deprecation warning was being thrown. It only came up in my tests when I exhaust the generator with `list()` not when I use the generator in `dict_writer.writerows()`. – Sethish Dec 24 '18 at 02:48
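
To illustrate the PEP 479 point from these comments (a minimal sketch, not from the original discussion): a StopIteration that escapes a generator body becomes a RuntimeError on Python 3.7+ (a DeprecationWarning on 3.5/3.6), whereas a plain for loop never lets StopIteration escape in the first place.

def bad_gen(it):
    # next() raises StopIteration when `it` is exhausted; under PEP 479 that
    # escaping exception becomes a RuntimeError on Python 3.7+
    while True:
        yield next(it)

def good_gen(it):
    # the for loop consumes StopIteration itself, so nothing escapes
    for item in it:
        yield item

list(good_gen(iter([1, 2, 3])))  # [1, 2, 3]
list(bad_gen(iter([1, 2, 3])))   # RuntimeError on Python 3.7+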

1 Answer

I think this might be what you mean… but as stated in the comments, it doesn't help much:

import requests  # needed for requests.get below

def fetch_paged(url):
    # yield one decoded JSON page per request until there is no next page
    while url:
        res = requests.get(url)
        res.raise_for_status()
        data = res.json()
        yield data
        url = data['next_page_uri']

def fetch(url):
    # `resource` is assumed to be defined elsewhere, as in the question
    for data in fetch_paged(url):
        yield from data[resource]

(I've taken the opportunity to put in a call to raise_for_status(), which will raise for unsuccessful responses, i.e. res.status_code >= 400)

I'm not sure it's any "better", but it may be worth it if you're going to be reusing the fetch_paged functionality a lot.

Note: lots of other APIs put this `next_page_uri` into the response headers in standard ways, which the `requests` library knows how to deal with and exposes via the `res.links` attribute.
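For example, Link-header pagination could be consumed roughly like this (a sketch assuming a standard Link: <...>; rel="next" header, which, per the comment below, Twilio doesn't send):

import requests

def fetch_linked(url):
    # hypothetical helper for APIs that paginate via the Link header
    while url:
        res = requests.get(url)
        res.raise_for_status()
        yield res.json()
        # res.links is requests' parsed view of the Link header;
        # the 'next' entry is absent on the last page
        url = res.links.get('next', {}).get('url')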

Sam Mason
  • Thanks! You're right, this type of pagination doesn't quite fit the idiom that I linked in the question. I tried `res.links`, but alas, the Twilio API doesn't send their values that way: `ipdb> data.links` returns `{}`. – Sethish Dec 24 '18 at 02:15