
I would like to get all the questions and comments with a particular tag on the Stack Overflow site.
Using the API, I managed to make a simple call, but I would like to know how to page through the results to get all the data, even across different years.

I'm trying to do that with questions tagged python.
For example, this link returns all questions from 1st July, 2019 to 5th July, 2019 with tag python:

https://api.stackexchange.com/2.2/questions?fromdate=1561939200&todate=1562284800&order=desc&sort=activity&tagged=python&site=stackoverflow
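The fromdate and todate values in that link are Unix timestamps (seconds since 1970-01-01, UTC). As a minimal sketch, assuming Python and only the standard library, the same URL could be built like this; the helper names to_epoch and questions_url are made up for illustration:

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.2"


def to_epoch(year, month, day):
    """Return the Unix timestamp for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def questions_url(fromdate, todate, tag="python", site="stackoverflow"):
    """Build a /questions URL with the same parameters as the link above."""
    params = {
        "fromdate": fromdate,
        "todate": todate,
        "order": "desc",
        "sort": "activity",
        "tagged": tag,
        "site": site,
    }
    return f"{API_ROOT}/questions?{urlencode(params)}"


# 1st July 2019 to 5th July 2019, as in the link above.
url = questions_url(to_epoch(2019, 7, 1), to_epoch(2019, 7, 5))
```

Printing url should reproduce the link above, which makes it easy to change the date range without computing timestamps by hand.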

But if I wanted all the data from 2015 to 2019, could I add a parameter like this one?

?page=10

Where would I put it?

Brock Adams
HABLOH

1 Answer


There are about 845 thousand python questions from 2015 to 2019 (so far).
At the maximum page size of 100, that's 8,454 pages of API requests, which is dangerously close to your maximum daily quota.
Additionally, trying to fetch that many pages at once is likely to trigger throttling or bugs.
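For a rough sense of the numbers, here is the arithmetic as a small sketch. Both constants are assumptions: the exact question count is not given above (only "845 thousand"), and 10,000 is the daily quota as I understand it for requests made with an API key, so check the current throttle documentation before relying on it:

```python
QUESTION_COUNT = 845_400  # assumed; chosen to be consistent with 8,454 pages
PAGE_SIZE = 100           # largest pagesize used in this answer
DAILY_QUOTA = 10_000      # assumed daily quota for keyed requests

# Integer ceiling division: a partial last page still costs one request.
pages_needed = (QUESTION_COUNT + PAGE_SIZE - 1) // PAGE_SIZE
quota_left = DAILY_QUOTA - pages_needed

print(f"{pages_needed} requests needed, {quota_left} left for everything else")
```

Any retry, failed request, or follow-up call for comments comes out of that remainder.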

So, it would be better if you downloaded the Data Dump, or ran paged queries against the Stack Exchange Data Explorer (SEDE) for the bulk of your data. Then just use the API to get the changes since the last dump or SEDE update.

Both subjects are beyond the scope of this question (and have been addressed in other posts).

To answer your direct question, you would page through the results like so:

  1. Fetch: 2.2/questions?page=1&pagesize=100&fromdate=1420070400&order=desc&sort=creation&tagged=python&site=stackoverflow
  2. Then: 2.2/questions?page=2&pagesize=100&fromdate=1420070400&order=desc&sort=creation&tagged=python&site=stackoverflow
  3. etc.
  4. Alternatively, keep incrementing page until the has_more field in the response is false.
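The steps above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a hardened client: fetch_page and iter_questions are hypothetical helper names, no API key or error handling is included, and the fetcher is passed in as a parameter so the loop can be tried against canned data without touching the network.

```python
import gzip
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.stackexchange.com/2.2"


def fetch_page(page, fromdate=1420070400, tag="python"):
    """Fetch one page of questions and return the decoded JSON wrapper."""
    params = {
        "page": page, "pagesize": 100, "fromdate": fromdate,
        "order": "desc", "sort": "creation",
        "tagged": tag, "site": "stackoverflow",
    }
    with urlopen(f"{API_ROOT}/questions?{urlencode(params)}") as resp:
        raw = resp.read()
        # The API compresses its responses; urllib does not decompress them.
        if resp.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    return json.loads(raw)


def iter_questions(fetch=fetch_page, max_pages=None, sleep=time.sleep):
    """Yield questions page by page until has_more is false."""
    page = 1
    while max_pages is None or page <= max_pages:
        data = fetch(page)
        yield from data.get("items", [])
        if not data.get("has_more"):
            break
        # If the response carries a backoff value, wait that many seconds.
        if data.get("backoff"):
            sleep(data["backoff"])
        page += 1
```

For a first try, something like `for q in iter_questions(max_pages=3): print(q["title"])` keeps the run short; dropping max_pages walks every page and will consume quota accordingly.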
Brock Adams