
I'm currently working on a machine learning project that analyzes questions on Stack Overflow. I imported the requests library and used it to retrieve the questions as follows:

import requests
data = requests.get("https://stackoverflow.com/questions")

I expected to get the data as JSON, but I got HTML instead. How can I retrieve Stack Overflow questions as JSON?

double-beep

1 Answer

A quick solution would be:

import requests

# Query the Stack Exchange API, which returns JSON instead of an HTML page
response = requests.get("https://api.stackexchange.com/2.2/questions?order=desc&sort=activity&site=stackoverflow")
print(response.text)

Here's the documentation for the API: https://api.stackexchange.com/docs. There you can find out how to get a key and much more.

Your code only fetched the HTML of the https://stackoverflow.com/questions page. To get the desired JSON, you need to query the API instead.
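
As a minimal sketch of what to do with that response: the API wraps its results in a JSON object whose `items` array holds the questions, and string fields such as `title` come back HTML-encoded by default. The helper names `fetch_questions` and `question_titles` are hypothetical, chosen only for illustration.

```python
import html

API_URL = "https://api.stackexchange.com/2.2/questions"


def fetch_questions(site="stackoverflow"):
    """Fetch one page of recently active questions as a parsed dict."""
    # Imported here so question_titles() stays usable without requests installed.
    import requests

    params = {"order": "desc", "sort": "activity", "site": site}
    response = requests.get(API_URL, params=params, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()       # decode the JSON body into a Python dict


def question_titles(payload):
    """Return the unescaped question titles from an API response wrapper."""
    return [html.unescape(q["title"]) for q in payload.get("items", [])]
```

Calling `question_titles(fetch_questions())` would then give a plain list of title strings, which is usually a more convenient starting point for an ML pipeline than `response.text`.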

andreis11
  • Is there a limit on how much data an unauthenticated user can get? –  Apr 06 '20 at 22:03
  • [Here](https://api.stackexchange.com/docs/throttle)'s more info. Unauthenticated apps (those without an access_token) share a quota with other unauthenticated apps. – andreis11 Apr 06 '20 at 22:09
  • And is the maximum number of results per request 30? –  Apr 06 '20 at 22:13
  • Yes, 30 requests/sec per IP. `After that, applications are sorted into two distinct throttles. Those with, and those without, valid access_tokens (obtained via authenticating a user).` – andreis11 Apr 06 '20 at 22:15
  • You have an API quota of 300 (requests that can be done per day per IP) and 10000 (!) if you have an API key. – double-beep Apr 07 '20 at 07:42
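
The limits discussed in these comments can be handled in code. The following is only a sketch, assuming the paging and throttle behaviour described in the API docs: `pagesize` defaults to 30 and is capped at 100, an optional `key` parameter raises the daily quota, and each response reports `has_more`, `quota_remaining`, and sometimes a `backoff` in seconds. The names `build_params`, `iter_questions`, and the injected `fetch_page` callable are hypothetical.

```python
import time

API_URL = "https://api.stackexchange.com/2.2/questions"


def build_params(page, pagesize=100, key=None, site="stackoverflow"):
    """Build the query parameters for one page of questions."""
    params = {
        "order": "desc",
        "sort": "activity",
        "site": site,
        "page": page,
        "pagesize": min(pagesize, 100),  # the API caps pagesize at 100
    }
    if key is not None:
        params["key"] = key  # assumption: a key registered for your own app
    return params


def iter_questions(fetch_page, max_pages=3, key=None, sleep=time.sleep):
    """Yield questions page by page until the API reports no more results.

    fetch_page is any callable that takes a params dict and returns the
    parsed JSON wrapper, for example:
        lambda p: requests.get(API_URL, params=p, timeout=10).json()
    """
    for page in range(1, max_pages + 1):
        payload = fetch_page(build_params(page, key=key))
        yield from payload.get("items", [])
        # Stop when there are no further pages or the daily quota is used up.
        if not payload.get("has_more") or payload.get("quota_remaining", 1) <= 0:
            break
        if "backoff" in payload:
            sleep(payload["backoff"])  # the API asked us to wait this many seconds
```

Passing the fetcher in as a callable keeps the paging logic independent of the HTTP library, so it can be exercised with canned responses before spending any real quota.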