I'm trying to write a Python script that makes it easy to extract the XPath of elements in an XML or HTML file.

For instance, imagine we have the XML file below (test.xml), and we would like to get the XPath of the element whose text is "blue":

<root>
  <element>
    <name>Element1</name>
    <contains>
      <element>
        <name>color</name>
        <value-ref>/Colors/red</value-ref>
      </element>
    </contains>
  </element>
  <element>
    <name>Colors</name>
    <contains>
      <element>
        <name>red</name>
        <value>0xFF0000</value>
      </element>
      <element>
        <name>blue</name>
        <value>0x0000FF</value>
      </element>
    </contains>
  </element>
</root>

I tried to use lxml, but I'm a bit lost:

from lxml import etree
doc = etree.parse('test.xml')
tree = etree.ElementTree(doc.getroot())

How can I get the XPath of the element in tree with text="blue"?

Thank you, Thomas

Thomas
  • Possible duplicate of [How to get path of an element in lxml?](https://stackoverflow.com/questions/1577293/how-to-get-path-of-an-element-in-lxml) – Adam Aug 13 '18 at 13:55
  • You can take a look at the answers to this [question](https://stackoverflow.com/questions/1577293/how-to-get-path-of-an-element-in-lxml). Once you identify the element of interest you can use that question to find a solution. – Adam Aug 13 '18 at 13:55

1 Answer

I'm not so sure this is a duplicate of the cited question. That question and its answers appear to traverse the entire tree, visiting each text node, whereas I read this question as simply asking for the XPath of a specific node given a criterion (in this case the node's text()), without having to walk every node ourselves.

The first three lines given above are actually correct; you need only add one more to arrive at the simplest answer:

from lxml import etree
doc = etree.parse('test.xml')
tree = etree.ElementTree(doc.getroot())

print(tree.getpath(doc.xpath('//*[contains(text(), "blue")]')[0]))

That gives us the result:

(env) [tlum@localhost python-environments]$ python test.py
/root/element[2]/contains/element[2]/name
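
One caveat: contains() also matches elements whose text merely includes "blue" (for example "lightblue"). If an exact match is wanted, a variant could look like the sketch below. It inlines the sample XML so it runs on its own, and the variable name `wanted` is just an illustrative choice; lxml passes keyword arguments of xpath() through as XPath variables.

```python
from lxml import etree

# The sample document from the question, inlined so the sketch is self-contained.
XML = b"""<root>
  <element>
    <name>Element1</name>
    <contains>
      <element>
        <name>color</name>
        <value-ref>/Colors/red</value-ref>
      </element>
    </contains>
  </element>
  <element>
    <name>Colors</name>
    <contains>
      <element>
        <name>red</name>
        <value>0xFF0000</value>
      </element>
      <element>
        <name>blue</name>
        <value>0x0000FF</value>
      </element>
    </contains>
  </element>
</root>"""

root = etree.fromstring(XML)
tree = root.getroottree()

# Exact comparison instead of contains(); $wanted is bound by the keyword argument.
matches = tree.xpath('//*[text() = $wanted]', wanted='blue')
path = tree.getpath(matches[0])
print(path)
```

Binding the search text as a variable also avoids quoting problems when the text itself contains quote characters.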

Of course, if the criterion might not match at all, or might match more than once, we'd have a little more work to do, but I'll consider that beyond the scope of the question for now.
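
That said, a minimal sketch of the zero-or-many case might look like the following. The helper name `xpaths_for_text` is hypothetical (it is not part of lxml), and the sample XML is inlined only so the snippet runs without test.xml. It returns a list of paths, which is simply empty when nothing matches.

```python
from lxml import etree

# The sample document from the question, inlined so the sketch is self-contained.
XML = b"""<root>
  <element>
    <name>Element1</name>
    <contains>
      <element>
        <name>color</name>
        <value-ref>/Colors/red</value-ref>
      </element>
    </contains>
  </element>
  <element>
    <name>Colors</name>
    <contains>
      <element>
        <name>red</name>
        <value>0xFF0000</value>
      </element>
      <element>
        <name>blue</name>
        <value>0x0000FF</value>
      </element>
    </contains>
  </element>
</root>"""


def xpaths_for_text(tree, wanted):
    """Return the XPath of every element with a text node containing `wanted`.

    Hypothetical helper, not an lxml API. The predicate tests each text node
    of an element, whereas contains(text(), ...) in XPath 1.0 only looks at
    the first one.
    """
    matches = tree.xpath('//*[text()[contains(., $wanted)]]', wanted=wanted)
    return [tree.getpath(element) for element in matches]


tree = etree.fromstring(XML).getroottree()

blue_paths = xpaths_for_text(tree, 'blue')    # exactly one match
red_paths = xpaths_for_text(tree, 'red')      # more than one match
green_paths = xpaths_for_text(tree, 'green')  # no match at all

for label, paths in (('blue', blue_paths), ('red', red_paths), ('green', green_paths)):
    print(label, paths if paths else 'not found')
```

The caller can then decide what an empty or multi-element list should mean, instead of hitting an IndexError on `[0]`.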

tlum