I am using Python Selenium to scrape data because lxml is poorly documented when it comes to parsing HTML and querying it with XPath, and nothing I have tried with that library works.
I am having some success with Selenium like this (but not always, hence this question):
from selenium.webdriver.common.by import By
element = self.driver.find_element(By.XPATH, xpath)  # current form of find_element_by_xpath
print(element.text)
Problem:
If I have an HTML segment like this in an HTML document:
<strong>Address:</strong>
24 some street, CA
<strong>Company:</strong>
ACME Inc.
and I use Firefox or a Chrome plugin to get the XPath of '24 some street, CA', I cannot obtain it: neither tool gives me an XPath to that text.
I can only get the XPath of 'Address:', but I don't need that; I need the data after the closing </strong> tag.
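A likely reason neither tool offers an XPath for the address is that the text after </strong> is not inside any element: in the DOM it is a separate text node, a sibling of the <strong> element. A minimal sketch with Python's standard-library xml.etree.ElementTree shows this, assuming the snippet sits inside a <span> (as the XPath below suggests) and is well-formed XML. ElementTree, like lxml, exposes that text node as the `tail` of the preceding <strong> element. Real pages are often not well-formed XML, so this is an illustration rather than a drop-in scraper.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, wrapped in a <span> so it parses as XML.
snippet = """<span><strong>Address:</strong>
24 some street, CA
<strong>Company:</strong>
ACME Inc.
</span>"""

span = ET.fromstring(snippet)

# The <strong> element only holds 'Address:'; the address itself is the
# element's tail, i.e. the text that follows </strong>.
address_label = span.find("strong[.='Address:']")
print(repr(address_label.text))  # 'Address:'
print(repr(address_label.tail))  # '\n24 some street, CA\n'

# Pair every label with the text that directly follows it.
fields = {s.text: s.tail.strip() for s in span.iter("strong")}
print(fields)  # {'Address:': '24 some street, CA', 'Company:': 'ACME Inc.'}
```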
The XPath to the text 'Address:' might be something like:
/html/body/div[2]/div[4]/div[1]/span/strong[2]
What, then, is the XPath to the text after that closing </strong> tag that will give me everything up until the next opening <strong> tag?
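If other markup such as <br/> or <em> can appear between two labels, the text directly after </strong> stops at that markup, so "everything up until the next <strong>" means walking the following siblings. A minimal sketch, again using the standard-library xml.etree.ElementTree and assuming the markup is well-formed; `value_for_label` is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET


def value_for_label(parent, label):
    """Return the text between <strong>label</strong> and the next <strong>.

    Hypothetical helper: `parent` is the element containing the <strong>
    labels (the <span> in the question). Text pieces are joined with spaces
    and runs of whitespace are collapsed. Returns None if no label matches.
    """
    children = list(parent)
    for i, child in enumerate(children):
        if child.tag == "strong" and "".join(child.itertext()).strip() == label:
            parts = [child.tail or ""]  # text right after </strong>
            for sibling in children[i + 1:]:
                if sibling.tag == "strong":
                    break  # reached the next label
                parts.append("".join(sibling.itertext()))  # e.g. <em> content
                parts.append(sibling.tail or "")  # text after that element
            return " ".join(" ".join(parts).split())
    return None  # label not found


markup = ("<span><strong>Address:</strong> 24 some street,<br/>CA"
          "<strong>Company:</strong> <em>ACME</em> Inc.</span>")
span = ET.fromstring(markup)
print(value_for_label(span, "Address:"))  # 24 some street, CA
print(value_for_label(span, "Company:"))  # ACME Inc.
```

Joining the pieces with spaces keeps "street," and "CA" apart across the <br/>, but it would also split a single word whose halves sit in different tags.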
Update:
I'm sure the following is the correct XPath for the text after the <strong></strong> tags, but Selenium does not like it. When I use it with Selenium, it fails:
from selenium.webdriver.common.by import By
xpath_wo_num = '/html/body/div[2]/div[4]/div[1]/span/strong[1]/following-sibling::text()[1]'
element = self.driver.find_element(By.XPATH, xpath_wo_num)
It seems Selenium deliberately rejects this XPath because the result is a text node, and find_element can only return elements.
I get this error message:
Message: invalid selector: The result of the xpath expression "/html/body/div[2]/div[4]/div[1]/span/strong[1]/following-sibling::text()[1]" is: [object Text]. It should be an element.
(Session info: headless chrome=80.0.3987.132)
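One possible workaround is to let the browser evaluate the XPath itself through JavaScript and ask for a string result instead of a node, so WebDriver never has to hand back a text node. A minimal sketch, assuming `driver` is a Selenium WebDriver (only its `execute_script` method is used, so nothing is imported from Selenium here); `text_node_value` is a hypothetical helper name. `document.evaluate` with `XPathResult.STRING_TYPE` returns the string value of the first node the expression selects.

```python
# JavaScript run in the page: evaluate the XPath passed as arguments[0] and
# return its string value. With STRING_TYPE the first matching node (here a
# text node) is converted to a string, so no element is needed.
_XPATH_STRING_JS = """
return document.evaluate(
    arguments[0], document, null, XPathResult.STRING_TYPE, null
).stringValue;
"""


def text_node_value(driver, xpath):
    """Return the stripped string value of `xpath`, which may select a text node.

    Hypothetical helper; `driver` is assumed to be a Selenium WebDriver.
    If the XPath matches nothing, the browser returns "" and so does this.
    """
    value = driver.execute_script(_XPATH_STRING_JS, xpath)
    return value.strip() if value else ""
```

For example, `text_node_value(self.driver, xpath_wo_num)` should return '24 some street, CA' for the page in the question. A label-based expression such as `//strong[normalize-space()='Address:']/following-sibling::text()[1]` avoids depending on the absolute /html/body/... path.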