How to extract text from divs in Selenium using Python when new divs are added every approx 1 second?

Based on the above answer, I have the following code:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait as wait

chrome_path = r"C:\scrape\chromedriver.exe"

driver = webdriver.Chrome(chrome_path)
driver.get("https://website.com/")
# Get the divs currently on the page
messages = driver.find_elements_by_class_name('div_i_am_targeting')
# Print all current messages
for message in messages:
    print(message.text)

while True:
    try:
        # Wait up to a minute for a new message to appear
        wait(driver, 60).until(lambda driver: driver.find_elements_by_class_name('div_i_am_targeting') != messages)
        # Print only the new messages
        for message in [m.text for m in driver.find_elements_by_class_name('div_i_am_targeting') if m not in messages]:
            print(message)
        # Update the list of messages
        messages = driver.find_elements_by_class_name('div_i_am_targeting')
    except TimeoutException:
        # Stop once a minute passes with no new messages
        print('No new messages')
        break

This works fine: it captures every div matching the class div_i_am_targeting as it appears on the page.

The divs on this HTML page are generated dynamically, and a new div appears about once per second.

The actual structure on the page is like this:

<div class="div_i_am_targeting">
...
...
</div>
<div class="div_i_am_targeting">
...
...
</div>
<div class="div_i_am_targeting">
...
...
</div>
<div class="some_other_div">
...
...
</div>
<div class="div_i_am_targeting">
...
...
</div>
<div class="yet_another_div">
...
...
</div>
<div class="div_i_am_targeting">
...
...
</div>

So, in the dynamically created content, other divs appear between the divs I am currently targeting.

The frequency of divs on the page is variable.

I couldn't find any related questions here, or examples in the documentation.

How can I modify the code above so that it scrapes the text of more than one kind of div, e.g. all instances of both div_i_am_targeting and some_other_div in the example above?

Gary

1 Answer


You can try to replace

driver.find_elements_by_class_name('div_i_am_targeting')

with

driver.find_elements_by_css_selector('.div_i_am_targeting, .some_other_div')

everywhere it appears in your script, so that both kinds of div are matched.
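For illustration, here is a minimal sketch of how the whole loop might look with a grouped CSS selector. The helper names (build_selector, unseen_texts, watch) are hypothetical and not part of Selenium. It uses the generic find_elements(By.CSS_SELECTOR, ...) call, which is equivalent to find_elements_by_css_selector, and it assumes the elements stay attached to the page once they have appeared.

```python
def build_selector(class_names):
    """Join class names into one CSS selector list, e.g. '.a, .b'."""
    return ", ".join("." + name for name in class_names)


def unseen_texts(elements, seen):
    """Return the text of elements not yet in `seen` (in page order) and record them."""
    texts = []
    for element in elements:
        if element not in seen:
            seen.append(element)
            texts.append(element.text)
    return texts


def watch(driver, class_names, timeout=60):
    """Print matching divs as they appear; stop after `timeout` seconds of silence."""
    # Imported here so the two helpers above can be used without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    selector = build_selector(class_names)
    seen = []
    while True:
        # Print whatever has not been printed yet, in document order.
        for text in unseen_texts(driver.find_elements(By.CSS_SELECTOR, selector), seen):
            print(text)
        try:
            # Block until at least one element we have not seen is on the page.
            WebDriverWait(driver, timeout).until(
                lambda d: any(e not in seen for e in d.find_elements(By.CSS_SELECTOR, selector))
            )
        except TimeoutException:
            print("No new messages")
            break
```

After driver.get(...), this would be called as watch(driver, ["div_i_am_targeting", "some_other_div"]). Because a single selector matches both classes, the texts come back in the order the divs occur on the page, rather than grouped by class.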

JaSON
  • Thanks for the suggestion. I tried this, but received an error message: "selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: Compound class names not permitted" – Gary Dec 04 '18 at 21:49
  • @Gary, that's because you're still using `find_elements_by_class_name`, which accepts only a single class name; you should use `find_elements_by_css_selector` instead. – JaSON Dec 04 '18 at 21:50