
I watched Raymond Hettinger's Idiomatic Python talk and learned about the sentinel argument to iter(). I'd like to try applying it to a piece of code I'm working on that iterates over a paginated API (it's Twilio, but that's not relevant to my question).
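For reference, the two-argument form of iter(callable, sentinel) keeps calling the callable and yields each result until it returns the sentinel. A minimal sketch of the idiom (the read_blocks helper is just for illustration, not part of my code):

from functools import partial

def read_blocks(path, size=64):
    with open(path, 'rb') as f:
        # iter(callable, sentinel): keep calling f.read(size) and stop
        # as soon as it returns b'' (the sentinel)
        yield from iter(partial(f.read, size), b'')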

I have an API that returns a list of data and a next page URL. When the pagination is exhausted, the next page URL comes back as an empty string. I wrote the fetching function as a generator; it looks roughly like this:

import requests  # needed for requests.get below

def fetch(url):
    # `resource` is the Twilio resource name, defined elsewhere in my code
    while url:
        data = requests.get(url).json()
        url = data['next_page_uri']  # empty string once pagination is exhausted
        for row in data[resource]:
            yield row

This code works fine, but I'd like to try to remove the `while` loop and replace it with a call to `iter()` using the `next_page_uri` value as the sentinel argument. Alternatively, could this be written with `yield from`?

Sethish
    You can’t use `iter` on *part* of a value. You could write an adapter generator, but it’d be just as complicated as what you have now. – Davis Herring Dec 23 '18 at 16:24
  • It's worth noting that Python 3.6 throws a warning for the example version of the function. In my actual code the inside of the `while` loop is in a `try`/`except` to catch `StopIteration`. – Sethish Dec 23 '18 at 16:30
  • You want it to *yield* rows till the `'next_page_uri'` is an empty string? Does it need to make a new request for each *next_page*? – wwii Dec 23 '18 at 16:36
  • Yes, as written currently, it makes a new request each time it exhausts `data[resource]` and then continues to yield additional rows from the next `requests.get`. – Sethish Dec 23 '18 at 16:41
  • @Sethish: What here could raise `StopIteration`? – Davis Herring Dec 23 '18 at 16:54
  • https://www.python.org/dev/peps/pep-0479/#consequences-for-existing-code I'm not 100% clear on the mechanics of `StopIteration`, but the above code will generate a warning in 3.6 unless you wrap it and catch `StopIteration`. The PEP states that this is the preferred way to write a generator with a while loop, unless I'm misunderstanding it (see the sketch after these comments). – Sethish Dec 23 '18 at 17:01
  • Why do you need to implement this using `iter` with a sentinel? – wwii Dec 23 '18 at 18:04
  • the `for` loop will catch the `StopIteration` exception for you, that's what terminates the loop… no need to catch it yourself – Sam Mason Dec 23 '18 at 22:20
  • @sammason I mis-remembered when the deprecation warning was being thrown. It only came up in my tests when I exhaust the generator with `list()` not when I use the generator in `dict_writer.writerows()`. – Sethish Dec 24 '18 at 02:48
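
To illustrate the PEP 479 point from these comments (a minimal sketch, not from the original discussion): a StopIteration that escapes a generator body becomes a RuntimeError on Python 3.7+ (a DeprecationWarning on 3.5/3.6), whereas a plain for loop never lets StopIteration escape in the first place.

def bad_gen(it):
    # next() raises StopIteration when `it` is exhausted; under PEP 479 that
    # escaping exception becomes a RuntimeError on Python 3.7+
    while True:
        yield next(it)

def good_gen(it):
    # the for loop consumes StopIteration itself, so nothing escapes
    for item in it:
        yield item

list(good_gen(iter([1, 2, 3])))  # [1, 2, 3]
list(bad_gen(iter([1, 2, 3])))   # RuntimeError on Python 3.7+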

1 Answer

I think this might be what you mean… but as stated in the comments, it doesn't help much:

import requests  # needed for requests.get below

def fetch_paged(url):
    # yield one decoded JSON page per request until there is no next page
    while url:
        res = requests.get(url)
        res.raise_for_status()
        data = res.json()
        yield data
        url = data['next_page_uri']

def fetch(url):
    # `resource` is assumed to be defined elsewhere, as in the question
    for data in fetch_paged(url):
        yield from data[resource]

(I've taken the opportunity to put in a call to raise_for_status(), which will raise for unsuccessful responses, i.e. res.status_code >= 400)

I'm not sure it's any "better", but it may be worth it if you're going to be reusing the fetch_paged functionality a lot.

Note: lots of other APIs put this `next_page_uri` into the response headers in standard ways, which the `requests` library knows how to deal with and exposes via the `res.links` attribute.
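For example, Link-header pagination could be consumed roughly like this (a sketch assuming a standard Link: <...>; rel="next" header, which, per the comment below, Twilio doesn't send):

import requests

def fetch_linked(url):
    # hypothetical helper for APIs that paginate via the Link header
    while url:
        res = requests.get(url)
        res.raise_for_status()
        yield res.json()
        # res.links is requests' parsed view of the Link header;
        # the 'next' entry is absent on the last page
        url = res.links.get('next', {}).get('url')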

Sam Mason
  • Thanks! You're right, this type of pagination doesn't quite fit the idiom that I linked in the question. I tried `res.links`, but alas, the Twilio API doesn't send their values that way: `ipdb> data.links` returns `{}`. – Sethish Dec 24 '18 at 02:15