After searching through hundreds of answers, I'm here again, asking a new question that might help someone in the future.
I'm scraping this website: https://inview.doe.in.gov/state/1088000000/school-list.
The school list is in a flexbox, and I believe I could fetch the data using Selenium. But I want to get this job done using only BeautifulSoup.
By inspecting and tracking the network connections, I found 2 API calls, and I'm not sure which one gives me the school list. I do have their IPv4 addresses as well.
api = 'https://inview.doe.in.gov/api/entities?lang=en&merges=[{"route": "entities", "name": "district", "local_field": "district_id", "foreign_field": "id", "fields": "id,name"}]&filter=state_id==1088000000'
ipv4 = '104.18.21.238:443'
api2 = 'https://inview.doe.in.gov/api/entities?filter=type==district,type==network,type==school,type==state&fields=name,type,id,district_id'
ipv4 = '104.18.21.238:443'
Trying to access the content directly gives None, as it is dynamically loaded (at least that's what I believe).
import json
import requests
from bs4 import BeautifulSoup

def url_parser(url):
    html_doc = requests.get(url, headers={"Accept":"*/*"}).text
    soup = BeautifulSoup(html_doc,'html.parser')
    return html_doc, soup

def data_fetch(url):
    html_doc, soup = url_parser(url)
    api_link = 'https://inview.doe.in.gov/api/entities?lang=en&merges=[{"route": "entities", "name": "district", "local_field": "district_id", "foreign_field": "id", "fields": "id,name"}]&filter=state_id==1088000000'
    html_doc2, soup2 = url_parser(api_link)
    #school_id = soup2.find_all('div', {'class':'result-table table--results mt-3'})
    print(soup2)

def main():
    url = "https://inview.doe.in.gov/state/1088000000/school-list"
    data_fetch(url)

main()
Trying to open the API link directly gives me the same error message that I get from the code:
{"message":"The resource identified by the request is only capable of generating response entities which have content characteristics not acceptable according to the accept headers sent in the request. Supported entities are: application/json, application/vnd.tembo.api+json, application/vnd.tembo.api+json;version=1","status":406}
Is there any way I can fix that?
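Going by the 406 message, one thing that might be worth trying is sending an Accept header that names one of the listed types instead of `*/*`. Below is a minimal sketch of that idea, not a verified fix: I'm assuming the endpoint returns JSON when asked for `application/json`, and I have not confirmed that it accepts the default client headers. The names `build_request`, `fetch_json`, `API_BASE` and `SCHOOL_PARAMS` are made up for illustration. It uses the standard library's `urllib` rather than `requests` so it has no extra dependencies; the same headers dict could be passed to `requests.get`.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request, urlopen

# Base endpoint taken from the API calls seen in the Network tab.
API_BASE = "https://inview.doe.in.gov/api/entities"

# Query parameters copied from the second API call above.
SCHOOL_PARAMS = {
    "filter": "type==district,type==network,type==school,type==state",
    "fields": "name,type,id,district_id",
}


def build_request(params):
    """Build a GET request that explicitly asks for JSON.

    urlencode() percent-encodes the values, so characters such as
    '=' and ',' inside a value cannot break the query string.
    """
    url = f"{API_BASE}?{urlencode(params)}"
    # Assumption: "application/json" is accepted, since the 406
    # message lists it among the supported entities.
    return Request(url, headers={"Accept": "application/json"})


def fetch_json(params, timeout=30):
    """Send the request and decode the body as JSON (hypothetical helper)."""
    with urlopen(build_request(params), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a real network request, so it is left commented out):
# data = fetch_json(SCHOOL_PARAMS)
# print(type(data))
```

If this works, the response would be JSON rather than HTML, so it would be parsed with `json` and BeautifulSoup would not be needed for this step.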