I have used XPath extensively in the past. Currently I am facing a problem that I am unable to solve.
Constraints
- pure XPath 1.0
- no aux-functions (e.g. no "concat()")
HTML-Markup
<span class="container">
Peter: Lorem Impsum
<i class="divider" role="img" aria-label="|"></i>
Paul Smith: Foo Bar BAZ
<i class="divider" role="img" aria-label="|"></i>
Mary: One Two Three
</span>
Challenge
I want to extract the three coherent strings:
- Peter: Lorem Impsum
- Paul Smith: Foo Bar BAZ
- Mary: One Two Three
XPath
The following XPath queries are the best I've come up with after hours of research:
XPath-query 1
//span[contains(@class, "container")]
=> Peter: Lorem ImpsumPaul Smith: Foo Bar BAZMary: One Two Three
XPath-query 2
//span[contains(@class, "container")]//text()
=> Peter: Lorem Impsum Paul Smith: Foo Bar BAZ Mary: One Two Three
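For reference, here is a minimal sketch of what the three text nodes look like when they are read one by one instead of as a single joined string. It is only an illustration under stated assumptions: it uses Python's standard-library xml.dom.minidom rather than PHP, it walks the DOM children of the span instead of running an XPath engine, and the helper name split_text_nodes is made up for this example. The child text nodes it collects should correspond to what the text() step selects directly under the span.

```python
from xml.dom import Node, minidom

# The markup from the question, unchanged.
MARKUP = """<span class="container">
Peter: Lorem Impsum
<i class="divider" role="img" aria-label="|"></i>
Paul Smith: Foo Bar BAZ
<i class="divider" role="img" aria-label="|"></i>
Mary: One Two Three
</span>"""


def split_text_nodes(markup):
    """Return the trimmed content of each direct child text node of the root element.

    Hypothetical helper for illustration only; it stands in for iterating
    the node set that a text() step would select under the span.
    """
    root = minidom.parseString(markup).documentElement
    chunks = []
    for child in root.childNodes:
        # Keep text nodes only; the <i class="divider"> elements are skipped.
        if child.nodeType == Node.TEXT_NODE:
            text = child.data.strip()
            if text:  # ignore whitespace-only nodes
                chunks.append(text)
    return chunks


if __name__ == "__main__":
    for chunk in split_text_nodes(MARKUP):
        print(chunk)
```

If this carries over to PHP, the equivalent would presumably be to loop over the DOMNodeList returned by DOMXPath::query() and read each node on its own, rather than taking the string value of the whole result. That is an assumption on my part, not something I have verified against my setup.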
Problem
Although I can post-process the resulting string with (PHP) string functions afterwards, I cannot reliably split it into the correct three chunks, because the individual strings themselves contain spaces. I need an XPath query that lets me distinguish the text nodes correctly.
Is it possible to integrate some "artificial separators" between the text-nodes?