This is the HTML sample:
<div class="wpb_text_column">
<div class="wpb_wrapper">
<p style="text-align: center;"><a href="http://somepage1.com">First text part </a></p>
<p style="text-align: center;"><a href="http://somepage2.com">Second text part </a></p>
<p style="text-align: center;"><a href="http://somepage3.com">Third text part</a></p>
</div>
</div>
<div class="wpb_text_column">
<div class="wpb_wrapper">
<p style="text-align: center;"><a href="http://somepage4.com">First text part </a></p>
<p style="text-align: center;"><a href="http://somepage5.com">Second text part</a></p>
</div>
</div>
With the code below
from lxml import html

tree = html.fromstring(html_sample)  # html_sample holds the markup shown above
tree.xpath('//div[@class="wpb_text_column"]/div[@class="wpb_wrapper"]/p/a/text()')
I can get a list of the text values:
['First text part ', 'Second text part ', 'Third text part', 'First text part ', 'Second text part']
However, I want to get all the values from each div as a single string, like
['First text part Second text part Third text part', 'First text part Second text part']
The expression
//div[@class="wpb_text_column"]/div[@class="wpb_wrapper"]/normalize-space()
seems to be exactly the XPath that would solve the problem, but lxml doesn't support the trailing /normalize-space() syntax:
lxml.etree.XPathEvalError: Invalid expression
So how can I get the desired output in lxml?
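For reference, one possible direction is sketched below. lxml implements XPath 1.0, where a function call cannot be used as a path step, but normalize-space() can still be evaluated with each wrapper element as the context node. The sketch assumes html_sample contains the markup from the question (it is reproduced inline so the snippet runs on its own) and that whitespace between the p elements is enough to keep the text parts separated; the names wrappers and result are made up for illustration.

```python
from lxml import html

# Assumption: the same markup as in the question, inlined so the snippet is self-contained.
html_sample = """
<div class="wpb_text_column">
<div class="wpb_wrapper">
<p style="text-align: center;"><a href="http://somepage1.com">First text part </a></p>
<p style="text-align: center;"><a href="http://somepage2.com">Second text part </a></p>
<p style="text-align: center;"><a href="http://somepage3.com">Third text part</a></p>
</div>
</div>
<div class="wpb_text_column">
<div class="wpb_wrapper">
<p style="text-align: center;"><a href="http://somepage4.com">First text part </a></p>
<p style="text-align: center;"><a href="http://somepage5.com">Second text part</a></p>
</div>
</div>
"""

tree = html.fromstring(html_sample)

# Step 1: select the wrapper elements themselves rather than their text nodes.
wrappers = tree.xpath('//div[@class="wpb_text_column"]/div[@class="wpb_wrapper"]')

# Step 2: evaluate normalize-space() once per wrapper. With no argument it
# takes the string value of the context node (all descendant text joined),
# trims it, and collapses runs of whitespace into single spaces.
result = [wrapper.xpath('normalize-space()') for wrapper in wrappers]

print(result)
```

If the markup had no whitespace at all between adjacent text parts, the parts would run together, so joining the individual text() results with ' '.join(...) per wrapper may be the safer variant in that case.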