
My goal is to scrape the HTML of a dynamic web page using Playwright for Python. The page has an ordered list with multiple list items. Each list item contains several spans and a button (or link). Once the button has been clicked, I want to run further code and scrape the resulting HTML with BeautifulSoup.

Here is an example of the structure.

<script>
    function demoA () { alert("Button clicked"); }
</script>

<h2>Simple list with buttons</h2>
<div class="simlist">
    <ol class="list_ord">
        <li class="header-section_ord"><span class="item">Category </span><span class="item">Count </span></li>
        <li class="item_ord"><span>Beginner</span><span>6</span><button class="item_ord" onclick="demoA()">Information</button></li>
        <li class="item_ord"><span>Advanced</span><span>2</span><button class="item_ord" onclick="demoA()">Information</button></li>
    </ol>
</div>

I've been able to get Playwright to click on the button I want, using this Python code.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.firefox.launch()
    page = browser.new_page()
    page.goto('http://localhost:1234/SimpleListButtons.html')
    print("Opened content page")
    item_locator = page.locator('li').filter(has_text='Advanced').filter(has=page.get_by_role('button'))
    print(item_locator.inner_html())
    item_locator.locator('button').click()
    print("Button clicked")

My questions:

  1. With the above code, I can select the button I want to click by filtering the list items. Is there a better way to do it?
  2. How can I iterate through the list items, e.g. in a for loop, and click each list item's button (and subsequently do further scraping)?
teflonjon
  • OK, so maybe my first attempt at this question was not helpful in someone's opinion. However, I find it a bit "immature" to vote down my question and then just stay quiet. In my opinion, the question shows that I've been trying things out on my own and doing research. Perhaps others take the same approach? It seems a shame that one cannot really come here to learn. – teflonjon Oct 25 '22 at 14:42

2 Answers


I may be wrong on this, but let's say the button you are trying to click is actually a link that redirects you to another page or site, and you want to go there. In that case you can read the link's attribute value (e.g. its href) from the element and navigate to it directly instead of clicking. Check here for more details: https://playwright.dev/python/docs/api/class-page#page-get-attribute
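
A minimal sketch of that idea, assuming the element really is a link with an href (the button in the question's example uses onclick instead, so this would not apply to it as written). The helper names resolve_link and follow_link are made up for illustration; get_attribute, page.url and page.goto are the Playwright calls doing the work.

```python
from urllib.parse import urljoin


def resolve_link(base_url, href):
    """Turn a possibly relative href into an absolute URL."""
    return urljoin(base_url, href)


def follow_link(page, selector):
    """Read the href of the element matched by selector and navigate to it.

    `page` is a Playwright Page. Returns the URL that was visited,
    or None if the element has no href attribute.
    """
    href = page.locator(selector).get_attribute("href")
    if href is None:
        return None
    # The href may be relative, so resolve it against the current page URL.
    target = resolve_link(page.url, href)
    page.goto(target)
    return target
```

With an open page you would call something like follow_link(page, "li.item_ord a"), where the selector is a placeholder for whatever matches the link on your page.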

alfah
  • As it’s currently written, your answer is unclear. Please edit it to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. – user11717481 Oct 29 '22 at 19:38

I think this should do what you want:

# Import the needed library
from playwright.sync_api import sync_playwright

# Start Playwright and open the page
p = sync_playwright().start()
browser = p.chromium.launch(headless=False)
context = browser.new_context()
page = context.new_page()
page.goto('http://localhost:1234/SimpleListButtons.html')

# Count the buttons inside the list
count = page.locator("//div[@class='simlist']//button").count()

# For each button, read the label in the first span next to it, then click the button
for i in range(count):
    label = page.locator(f"(//div[@class='simlist']//button)[{i+1}]/../span[1]").inner_text()
    print(f"Clicking the button for item {label}")
    page.click(f"(//div[@class='simlist']//button)[{i+1}]")

# Close the browser and stop Playwright, since there is no "with" block to do it
browser.close()
p.stop()
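
The same loop can probably also be written with locators instead of indexed XPath. The following is only a sketch under a few assumptions: Playwright 1.29 or newer (for Locator.all()), the li.item_ord rows from the question's example, and two illustrative helper names (click_each_button, run) that are not part of Playwright. It returns the page HTML after each click so that it can be handed to BeautifulSoup.

```python
def click_each_button(page, row_selector="li.item_ord"):
    """Click the button in every matching list row and collect the page HTML.

    `page` is a Playwright Page. Returns a list of (label, html) tuples,
    one per row, in document order.
    """
    results = []
    # Locator.all() returns one Locator per matched element (Playwright 1.29+).
    for row in page.locator(row_selector).all():
        # The first span of the row holds the category label.
        label = row.locator("span").first.inner_text()
        row.get_by_role("button").click()
        # page.content() is the full HTML after the click; it can be passed
        # to BeautifulSoup(html, "html.parser") for further scraping.
        results.append((label, page.content()))
    return results


def run(url="http://localhost:1234/SimpleListButtons.html"):
    """Open the page from the question and click every row's button."""
    # Imported here so click_each_button can be used without a browser installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        results = click_each_button(page)
        browser.close()
    return results
```

Calling run() against the example page should give one (label, html) pair per data row; the header row is skipped because its class is header-section_ord. If a click navigates to another page, you would have to go back before handling the next row.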
Jaky Ruby