
I am trying to update my web-scraping scripts because the site (https://covid19.gov.vn/) has been updated, but I can't for the life of me figure out how to parse these elements. When I inspect the elements in the browser they seem to be there as usual, but I cannot parse them with BeautifulSoup. I also tried again using Playwright, but I still couldn't scrape the page correctly. When I view the page source, it's almost as if the elements aren't there at all. Can anyone with more knowledge of HTML and web scraping explain how this works? I'm pretty much stuck here.
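"Inspect element" shows the live DOM after JavaScript has run, while "view source" (and `requests`) shows only the HTML the server sent, so the two can disagree. A minimal stdlib sketch for checking which case you are in: feed the raw HTML to a parser and see whether an element with the id you want (here `vi`, from the question) exists. The helper `has_element_with_id` and both sample documents are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class _IdFinder(HTMLParser):
    """Records whether any start tag carries the wanted id attribute."""

    def __init__(self, wanted_id: str) -> None:
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; self-closing tags end up here too
        if ("id", self.wanted_id) in attrs:
            self.found = True


def has_element_with_id(raw_html: str, wanted_id: str) -> bool:
    """Hypothetical helper: True if raw_html contains a tag with id=wanted_id."""
    finder = _IdFinder(wanted_id)
    finder.feed(raw_html)
    finder.close()
    return finder.found


# Hypothetical example: the served page only has an empty container,
# and the tab appears only after a script fills it in.
served = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
rendered = '<html><body><div id="app"><div id="vi" class="content-tab show">...</div></div></body></html>'

print(has_element_with_id(served, "vi"))    # False
print(has_element_with_id(rendered, "vi"))  # True
```

In practice you would pass `requests.get(url).text` as `raw_html`. If the id is missing there, the content is being built client-side, and it is usually easier to find the request that supplies the data than to parse the rendered page.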

This is basically my last attempt before I gave up and started looking at the page source:

from bs4 import BeautifulSoup as bs
from playwright.sync_api import sync_playwright


with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://covid19.gov.vn/")
    # Take the DOM as Playwright sees it and hand it to BeautifulSoup
    page_content = page.content()
    soup = bs(page_content, features="lxml")
    # Look for the <div> with id="vi" and classes "content-tab show"
    test = soup.find_all('div', class_="content-tab show", id="vi")
    print(test)
    browser.close()

My idea was to scrape that tab and iterate through all the content inside it, but it doesn't work. I'd really appreciate any help with this! Thanks!
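A side note on the selector in the question, separate from whether the element is present at all: when `class_` is given a string containing a space, BeautifulSoup compares it against the whole `class` attribute value, so the classes must appear in exactly that order. A CSS selector via `select()` requires each class regardless of order. A minimal sketch on hypothetical markup (the same id and classes as in the question, in a different order):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same id and classes as in the question, different class order
html = '<div id="vi" class="show content-tab"><p>cases</p></div>'
soup = BeautifulSoup(html, "html.parser")

# Exact-string class match: "content-tab show" != "show content-tab", so nothing is found
exact = soup.find_all("div", class_="content-tab show", id="vi")

# CSS selector: matches any div with id "vi" that has both classes, in any order
css = soup.select("div#vi.content-tab.show")

print(len(exact))  # 0
print(len(css))    # 1
```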

schlong

1 Answer


Try the code below. The page appears to load these figures with JavaScript from a separate JSON file, so instead of parsing the HTML you can fetch that file directly with an HTTP GET call.

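import requests

# Fetch the JSON file the page takes its figures from
r = requests.get('https://static.pipezero.com/covid/data.json', timeout=10)
if r.status_code == 200:
    data = r.json()
    # Print the totals stored under the 'internal' key
    print(data['total']['internal'])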

Output:

{'death': 17545, 'treating': 27876, 'cases': 707436, 'recovered': 475343}
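`r.json()` simply returns nested Python dicts, so individual figures are plain key lookups. To work with them offline (for example in a test), a minimal sketch parses a saved body with the standard `json` module. Only the `total` → `internal` keys visible in the output above are assumed. The rest of the file's structure isn't shown here, and `body` is a hand-written sample, not a live response.

```python
import json

# Hand-written sample shaped like the output above (not a live response)
body = (
    '{"total": {"internal": {"death": 17545, "treating": 27876, '
    '"cases": 707436, "recovered": 475343}}}'
)

data = json.loads(body)          # same kind of dict that r.json() returns
internal = data["total"]["internal"]

for key in ("cases", "recovered", "treating", "death"):
    # Thousands separators make the large counts easier to read
    print(f"{key}: {internal[key]:,}")
```

With the live endpoint, replace `json.loads(body)` with `r.json()` from the answer's code.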
balderman