I want to extract the elements of an HTML page that contain a given text, ignoring markup. For example, I want to extract the node containing the text "Run, Sarah, run!" from https://en.wiktionary.org/wiki/run. I know about the text() node test and the string() function. I tried them both:
As you can see, if I use string(), it returns too many nodes (the result also includes the ancestors of the node I need), and if I use text(), it returns nothing (because of the <b> tag).
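To make the difference concrete, here is a small sketch using only Python's standard library. xml.etree.ElementTree supports only a limited XPath subset, so the two helpers below emulate what text() and string(.) select. The fragment is a hypothetical, simplified stand-in for the Wiktionary markup, which may be structured differently; a real page would need an HTML parser (for example lxml.html) rather than ET.fromstring.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment; the real Wiktionary markup may differ.
FRAGMENT = "<html><body><div><ul><li><i><b>Run</b>, Sarah, <b>run</b>!</i></li></ul></div></body></html>"
TARGET = "Run, Sarah, run!"

def own_text_nodes(elem):
    """Text nodes that are direct children of elem (what text() selects)."""
    nodes = [elem.text] if elem.text else []
    nodes += [child.tail for child in elem if child.tail]
    return nodes

def string_value(elem):
    """All descendant text concatenated (what string(.) returns)."""
    return "".join(elem.itertext())

root = ET.fromstring(FRAGMENT)
all_elems = list(root.iter())  # document order, root first

# Like //*[text() = TARGET]: the sentence is split across several text nodes,
# so no single text node equals it.
by_text = [e.tag for e in all_elems if TARGET in own_text_nodes(e)]

# Like //*[contains(string(.), TARGET)]: every ancestor matches as well.
by_string = [e.tag for e in all_elems if TARGET in string_value(e)]

print(by_text)    # []
print(by_string)  # ['html', 'body', 'div', 'ul', 'li', 'i']
```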
How do I find the required nodes?
UPD: I want all the deepest nodes. That means that if the Wiktionary page contained this sentence twice, I would want to select two nodes.
Also, I don't know the node type (the element name) in advance.
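One possible way to express "deepest elements whose text contains the sentence" in XPath 1.0 is to keep an element only if its string value contains the text and none of its child elements' string values do: //*[contains(string(.), 'Run, Sarah, run!') and not(*[contains(string(.), 'Run, Sarah, run!')])]. With an XPath 1.0 engine (for example lxml's xpath() method) that expression can be used directly. The sketch below only mirrors its logic with the standard library so it runs without extra dependencies; the fragment, with the sentence appearing twice, is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment containing the sentence twice, under different tags.
FRAGMENT = """<html><body>
<ul><li><i><b>Run</b>, Sarah, <b>run</b>!</i></li></ul>
<p>Quote: <span><b>Run</b>, Sarah, run!</span></p>
</body></html>"""
TARGET = "Run, Sarah, run!"

def string_value(elem):
    """All descendant text concatenated, like XPath string(.)."""
    return "".join(elem.itertext())

def deepest_containing(root, target):
    """Mirror of //*[contains(string(.), T) and not(*[contains(string(.), T)])].

    Any tag matches (like *), and an element is kept only when none of its
    child elements contains the whole target on its own.
    """
    return [
        e for e in root.iter()
        if target in string_value(e)
        and not any(target in string_value(child) for child in e)
    ]

root = ET.fromstring(FRAGMENT)
matches = deepest_containing(root, TARGET)
print([e.tag for e in matches])  # ['i', 'span']
```

Because * matches any element, this does not depend on knowing the node type, and each separate occurrence of the sentence yields its own element. Recomputing the string value for every element repeats work for nested elements, which is fine for a single page but may be worth caching for large documents.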