My goal is to scrape the HTML of a dynamic web page using Playwright for Python. The page has an ordered list with multiple list items, and each item contains several spans along with a button / link. Clicking the button runs further code, after which I want to scrape the resulting HTML with BeautifulSoup.
Here is an example of the structure.
<script>
  function demoA () { alert("Button clicked"); }
</script>
<h2>Simple list with buttons</h2>
<div class="simlist">
  <ol class="list_ord">
    <li class="header-section_ord"><span class="item">Category </span><span class="item">Count </span></li>
    <li class="item_ord"><span>Beginner</span><span>6</span><button class="item_ord" onclick="demoA()">Information</button></li>
    <li class="item_ord"><span>Advanced</span><span>2</span><button class="item_ord" onclick="demoA()">Information</button></li>
  </ol>
</div>
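To make the structure concrete, here is a minimal BeautifulSoup sketch that reads the category and count out of each data row of the sample list. It assumes the `bs4` package is installed and uses the built-in `html.parser`. `SAMPLE_HTML` and `parse_items` are hypothetical names used only for illustration. The header row is skipped because its `<li>` has the class `header-section_ord` rather than `item_ord`.

```python
from bs4 import BeautifulSoup

# Hypothetical copy of the sample list from above, as a string
SAMPLE_HTML = """
<div class="simlist">
  <ol class="list_ord">
    <li class="header-section_ord"><span class="item">Category </span><span class="item">Count </span></li>
    <li class="item_ord"><span>Beginner</span><span>6</span><button class="item_ord" onclick="demoA()">Information</button></li>
    <li class="item_ord"><span>Advanced</span><span>2</span><button class="item_ord" onclick="demoA()">Information</button></li>
  </ol>
</div>
"""


def parse_items(html: str) -> list[tuple[str, int]]:
    """Return (category, count) for every data row, skipping the header row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # 'li.item_ord' matches only data rows; the buttons share the class but are not <li>
    for li in soup.select("ol.list_ord > li.item_ord"):
        spans = li.find_all("span")
        category = spans[0].get_text(strip=True)
        count = int(spans[1].get_text(strip=True))
        rows.append((category, count))
    return rows


print(parse_items(SAMPLE_HTML))  # [('Beginner', 6), ('Advanced', 2)]
```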
Using the following Python code, I can get Playwright to click the button I want:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.firefox.launch()
    page = browser.new_page()
    page.goto('http://localhost:1234/SimpleListButtons.html')
    print("Opened content page")
    # Keep only the <li> that mentions 'Advanced' and contains a button
    item_locator = page.locator('li').filter(has_text='Advanced').filter(has=page.get_by_role('button'))
    print(item_locator.inner_html())
    # Click the button inside that list item
    item_locator.locator('button').click()
    print("Button clicked")
    browser.close()
My questions:
- The code above selects the button I want by chaining filter criteria. Is there a better way to do this?
- How can I iterate through the list items, e.g. in a for loop, and click each item's button (and then do further scraping)?