3

Running Python 3.7.4 on Windows, I notice that XPath evaluation differs from the results of online evaluators, such as here or here.

Online evaluators allow entering a relative expression, which will be evaluated on the entire document. However, with lxml I get no matches on the element tree unless I make it an absolute expression by prepending a slash.

Python 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32
>>> import lxml.etree
>>> root = lxml.etree.fromstring('''
... <TestRootNode>
...   <person personID="person1">
...     <name>James</name>
...   </person>
...   <person personID="person2">
...     <name>Cathy</name>
...   </person>
... </TestRootNode>''')
>>> tree = root.getroottree()
>>> tree.xpath('/TestRootNode/person')
[<Element person at 0x2ceee1f4e88>, <Element person at 0x2ceee1ff048>]
>>> tree.xpath('string(/TestRootNode/person[1])')
'\n    James\n  '
>>> tree.xpath('TestRootNode/person')
[]
>>> tree.xpath('string(TestRootNode/person[1])')
''

I hvae two questions:

  1. Who is right, the online evaluators or lxml? Is it allowed to apply a relative expression in the context of the whole document?

  2. In case the online evaluators are right: Is there a simple way to make lxml behave in the same way? Simply putting a slash at the beginning of the string won't work, as you can see from my example with the string() function.

saeed foroughi
  • 1,662
  • 1
  • 13
  • 25
  • Which expression are you using with the online evaluators and what are the different outputs? – Jack Fleeting Jan 24 '20 at 14:43
  • 1
    If I remember correctly `tree` already has the context of `TestRootNode`. For an xpath relative to the current context, try `tree.xpath('person')` (or `tree.xpath('./person')`). – Daniel Haley Jan 24 '20 at 14:55
  • @JackFleeting The same expressions as in the Python statements above. In http://xpather.com both "/TestRootNode/person" and "TestRootNode/person" find 2 elements. And both string(/TestRootNode/person[1]) and string(TestRootNode/person[1]) find the string "James". – Daniel Herding Jan 24 '20 at 22:02
  • Clarification to question 2: The XPath expression strings are passed in by the user. My program could preprocess the strings before passing them to the lxml xpath() function (e.g. convert the relative expression to an absolute expression), but this preprocessing would have to be fully automated (e.g. the program would have to automatically detect the location where to insert a slash). – Daniel Herding Jan 24 '20 at 22:13

1 Answers1

4

Who is right, the online evaluators or lxml?

For relative expressions, it is not entirely clear in which context they should be evaluated. Different tools have different assumptions.

The online tools you tested probably evaluate relative expressions in the context of the document node (which represents the entire document). The document node has as its only child element the outermost element of the document.

lxml claims to follow the same convention:

For ElementTree, the xpath method performs a global XPath query against the document (if absolute) or against the root node (if relative)

Which is not entirely true, since root node is a special kind of node different from an element node, while lxml actually evaluates against the root element (as Daniel Haley has pointed out as well).

>>> root = lxml.etree.fromstring('<root><child/></root>')
>>> root.xpath("child")
[<Element child at 0x10c093fc8>]
Mathias Müller
  • 22,203
  • 13
  • 58
  • 75