Is there a way to show: (a) the full path to a located node? (b) show the attributes of the path nodes even if I don't know what those attributes might be called?
For example, given a page:
<!DOCTYPE html>
<HTML lang="en">
<HEAD>
<META name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.2.0">
<META charset="utf-8">
<TITLE>blah ombid lipsum</TITLE>
</HEAD>
<BODY>
<P>I'm the expected content</P>
<DIV unexpectedattribute="very unexpected">
<P>I'm wanted but not where you thought I'd be</P>
<P class="strangeParagraphType">I'm also wanted text but also mislocated</P>
</DIV>
</BODY>
</HTML>
I can find wanted text with
# Import Python libraries
import sys
from lxml import html
page = open( 'findme.html' ).read()
tree = html.fromstring(page)
wantedText = tree.xpath(
'//*[contains(text(),"wanted text")]' )
print( len( wantedText ), ' item(s) of wanted text found')
Having found it, however, I'd like to be able to print out the fact that the wanted text is located at:
/HTML/BODY/DIV/P
... or, even better, to show that it is located at /HTML/BODY/DIV/P[2]
... and much better, to show that it is located at that location with /DIV
having unexpectedattribute="very unexpected"
and the final <P>
having the class of strangeParagraphType
.