The Stack Exchange API returns only 30 items per request. To get 4500 records, I used a for loop to call the Stack Exchange API, as shown below.

import json
import requests
complete_data=[]
for i in range (150):
    response = requests.get("https://api.stackexchange.com/2.2/questions?order=desc&sort=activity&site=stackoverflow")
    newData=json.loads(response.text)
    for item in newData['items']:
        complete_data.append(item)

But when I analyzed the questions returned by the API, I found that the same data set had been received 150 times: every request in the loop returned the same data. I need about 5000 records for my analysis. Can anyone show me what changes I should make to my code?

double-beep
SjAnupa
  • This question belongs on Meta Stack Exchange. – Tim Biegeleisen Apr 16 '20 at 02:28
  • Can I have a link for that? – SjAnupa Apr 16 '20 at 02:45
  • Add a `&pagesize=100` parameter (max, will return 100 items). Default is 30, as you realized and minimum is 1. In addition, you should also send a `&page=` parameter which should be equal to `i+1`. You are fetching the same page 150 times, currently! (Note: for 4.5k questions, you need `for i in range (45)`). – double-beep Apr 16 '20 at 06:12

1 Answer

You're fetching only 30 items per request, and always the same page (the first one). To solve the problem, set pagesize (max 100, min 1) and page (i + 1):

import requests
import time

complete_data = []
for i in range(45):  # 45 pages x 100 items = 4500 questions
    response = requests.get(
        "https://api.stackexchange.com/2.2/questions",
        params={"order": "desc", "sort": "activity", "site": "stackoverflow",
                "pagesize": 100, "page": i + 1},
    )
    new_data = response.json()  # decode the JSON body; no separate json import needed
    complete_data.extend(new_data['items'])
    print("Processed page " + str(i + 1) + ", returned " + str(response))
    time.sleep(2)  # pause between requests so as not to be rate-limited

Notes:

  • A 2-second pause between requests is added to avoid rate-limiting.
  • You may want to obtain an API key to increase your daily request quota from 300 to 10,000.
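
The notes above could be folded into a small reusable helper. What follows is only a sketch under a few assumptions: the names `get_page`, `collect_questions`, `fetch` and `pause` are made up for illustration, the `key` argument is only useful if you have registered for an API key, and the code relies on the `has_more` and `backoff` fields that the API includes in its response wrapper. It stops early when the API says there are no more pages, and it waits for the server-suggested `backoff` when one is given, falling back to the 2-second pause otherwise.

```python
import math
import time

import requests

API_URL = "https://api.stackexchange.com/2.2/questions"


def get_page(page, pagesize=100, key=None):
    """Request one page of questions and return the decoded JSON wrapper."""
    params = {
        "order": "desc",
        "sort": "activity",
        "site": "stackoverflow",
        "pagesize": pagesize,  # 1 to 100; the API default is 30
        "page": page,          # pages are numbered from 1
    }
    if key is not None:
        params["key"] = key  # optional API key (assumed to be your own)
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def collect_questions(target, pagesize=100, fetch=get_page, pause=time.sleep):
    """Collect up to `target` questions, requesting one page at a time."""
    items = []
    last_page = math.ceil(target / pagesize)
    for page in range(1, last_page + 1):
        data = fetch(page, pagesize)
        items.extend(data.get("items", []))
        if not data.get("has_more", False):
            break  # the API reports that no further pages exist
        if page < last_page:
            # Wait for the server's "backoff" hint (seconds) if present, else 2 s.
            pause(data.get("backoff", 2))
    return items[:target]
```

Calling `collect_questions(4500)` would then request 45 pages of 100 items each. The `fetch` and `pause` parameters exist only so the paging logic can be tried out with a stand-in function instead of the live API.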
double-beep