Cannot webscrape elements with Playwright and BeautifulSoup

Question

I am trying to update my web-scraping scripts because the site (https://covid19.gov.vn/) has been updated, but I can't for the life of me figure out how to parse these elements. When I inspect the elements in the browser they seem to be there as usual, but I cannot parse them with BeautifulSoup. I then tried again with Playwright, but I still couldn't scrape them correctly. When I view the page source, it's almost as if the elements are not there at all. Can anyone with more knowledge about HTML and web scraping explain how this works? I'm pretty much stuck here.

This is my last attempt, before I gave up and started looking at the page source:

from bs4 import BeautifulSoup as bs
from playwright.sync_api import sync_playwright


with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://covid19.gov.vn/")
    # HTML of the main frame as the browser currently has it
    page_content = page.content()
    soup = bs(page_content, features="lxml")
    # Look for the tab container seen in the browser's element inspector
    test = soup.find_all("div", class_="content-tab show", id="vi")
    print(test)
    browser.close()

My idea was to scrape that div and iterate through everything inside it, but it doesn't work. Any help would be much appreciated. Thanks!
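Elements that show up in the browser's inspector but not in View Source are usually created by JavaScript after the initial HTML arrives, or they belong to an iframe, whose document `page.content()` does not include. A minimal sketch that checks both cases, assuming the target is still a `div` with `id="vi"` as in the code above (the selector has not been verified against the live site):

```python
from bs4 import BeautifulSoup

URL = "https://covid19.gov.vn/"
# Assumption: selector taken from the question's find_all() call,
# not checked against the current version of the site.
SELECTOR = "div#vi"


def tab_texts(html):
    """Return the non-empty text lines inside every element matching SELECTOR."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        line
        for tab in soup.select(SELECTOR)
        for line in tab.get_text("\n", strip=True).splitlines()
    ]


def scrape(url=URL):
    """Render url in headless Chromium; return the tab text from the first frame that has it."""
    # Imported here so tab_texts() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            # Wait for network activity to settle so script-injected content is likely present.
            page.goto(url, wait_until="networkidle")
            # page.content() covers only the main frame; each iframe has its own document,
            # so look through every frame attached to the page.
            for frame in page.frames:
                texts = tab_texts(frame.content())
                if texts:
                    return texts
            return []
        finally:
            browser.close()


if __name__ == "__main__":
    print(scrape())
```

If `scrape()` still returns an empty list, the data is probably fetched separately by the page, which is what the answer below takes advantage of.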

score 0 · Accepted Answer · answered Sep 21 '21 at 16:27


Try the code below. It makes a plain HTTP GET request to a JSON endpoint that returns the data you are looking for, so no HTML parsing is needed.

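import requests

# JSON endpoint with the COVID-19 figures; a timeout avoids hanging forever
r = requests.get('https://static.pipezero.com/covid/data.json', timeout=10)
if r.status_code == 200:
    data = r.json()
    # The totals are nested under data['total']['internal']
    print(data['total']['internal'])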

Output:

{'death': 17545, 'treating': 27876, 'cases': 707436, 'recovered': 475343}
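Endpoints like this one can usually be found in the browser's DevTools Network tab, filtered to Fetch/XHR. The sketch below automates that search with Playwright's `response` event, recording every response whose `Content-Type` looks like JSON. The media-type check is an assumption and will miss endpoints served with a different content type; whether `static.pipezero.com` appears in the list depends on how the current site loads its data.

```python
def is_json_response(content_type):
    """Heuristic: treat 'application/json' and any '+json' media type as JSON."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")


def find_json_endpoints(url):
    """Load url in headless Chromium and return the URLs of JSON responses it received."""
    # Imported here so is_json_response() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    found = []

    def on_response(response):
        # Playwright lower-cases header names in response.headers
        if is_json_response(response.headers.get("content-type", "")):
            found.append(response.url)

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("response", on_response)
        page.goto(url, wait_until="networkidle")
        browser.close()
    return found


if __name__ == "__main__":
    for endpoint in find_json_endpoints("https://covid19.gov.vn/"):
        print(endpoint)
```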


balderman


Thanks! I did not know something like that exists. Much appreciated! o_0 – schlong Sep 22 '21 at 13:58
