
Is there a way to match multiple elements in a tree using .findall()?

I would like to do this:

trees = log.findall('element1' or 'element2')

This is my workaround (which works in my case because I never have both element1 and element2 in the same XML file):

trees = log.findall('element1')
if not trees:
    trees = log.findall('element2')

I am parsing XML files that have similar structures but different names. C# allows "element1 | element2" matching.

solbs
  • Given your workaround, what if there are elements in both element1 and element2? – Alex Reynolds Jul 21 '14 at 17:41
  • Good point. I edited the question to be clearer. I'm boilerplating code that works with either 1 or 2, never both. – solbs Jul 21 '14 at 17:49
  • @user3769076: Can you require `lxml` and use `lxml.etree` in place of the stdlib `xml.etree`? It often works as a drop-in replacement, and it offers a better answer here. – abarnert Jul 21 '14 at 17:50

1 Answer


No, you can't. C# appears to be using XPath expressions, but ElementTree supports only a limited subset of XPath, and that subset does not include the `|` union operator.

You can use `or` to fall back to the second search if the first is empty:

trees = log.findall('element1') or log.findall('element2')

This works because `findall()` returns a list, and an empty list is falsy.
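
As a minimal sketch of that fallback using the stdlib `xml.etree.ElementTree`: the `find_trees` helper and the sample XML strings below are made up for illustration and are not from the question.

```python
import xml.etree.ElementTree as ET

def find_trees(log):
    """Return element1 children if any exist, otherwise element2 children."""
    # findall() returns a list; an empty list is falsy, so `or` falls through
    # to the second search only when the first one found nothing.
    return log.findall('element1') or log.findall('element2')

# Hypothetical sample documents: each uses only one of the two names.
log_a = ET.fromstring('<log><element1>a</element1><element1>b</element1></log>')
log_b = ET.fromstring('<log><element2>c</element2></log>')

texts_a = [el.text for el in find_trees(log_a)]
texts_b = [el.text for el in find_trees(log_b)]
print(texts_a)  # ['a', 'b']
print(texts_b)  # ['c']
```

Note that when both names are present, this returns only the `element1` matches, so it fits the one-or-the-other case described in the question rather than a true union.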

The alternative is to use lxml, an ElementTree API implementation on top of libxml2, which supports all of the XPath 1.0 spec. Then you can do:

log.xpath('(.//element1|.//element2)')
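
If installing lxml is not an option, one possible stdlib-only approximation is to walk the tree with `Element.iter()` and filter on the tag name. This is a swapped-in technique, not the lxml XPath approach itself; the helper name `find_any` and the sample XML are hypothetical. Unlike `.//element1`, `iter()` also yields the starting element, so the sketch skips it.

```python
import xml.etree.ElementTree as ET

def find_any(root, tags):
    """Collect descendants of root whose tag is in tags, in document order."""
    wanted = set(tags)
    # iter() walks the whole subtree, root included; skip the root so the
    # result covers descendants only, like './/name' would.
    return [el for el in root.iter() if el is not root and el.tag in wanted]

# Hypothetical sample document containing both names at different depths.
log = ET.fromstring(
    '<log><element1>a</element1><group><element2>b</element2></group>'
    '<other>c</other></log>'
)
matches = [(el.tag, el.text) for el in find_any(log, ['element1', 'element2'])]
print(matches)  # [('element1', 'a'), ('element2', 'b')]
```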
Martijn Pieters
  • Does the `lxml` implementation of the etree API support `|` in XPath? If so, that may be an acceptable alternative for the OP. – abarnert Jul 21 '14 at 17:48
  • @abarnert: `lxml` supports all of XPath 1.0. – Martijn Pieters Jul 21 '14 at 17:48
  • Yeah, I'm just not sure whether it has full XPath in its etree `findall` or not (and for some reason it's not building on the machine I'm sitting at, so I can't test…). – abarnert Jul 21 '14 at 17:49
  • @abarnert: no, `.xpath()` is the method to use here; `.findall()` was kept bug-compatible with the original API implementation. – Martijn Pieters Jul 21 '14 at 17:50
  • Thank you, I couldn't find out from the webpage or any other explanation that you could use (a|b) with xpath. The documentation is horrible although lxml is one of the best tools. – CodeMonkey Apr 22 '16 at 12:21
  • FWIW, I had to use the XPath syntax `.xpath("//element1|//element2")`. I couldn't get `lxml` to accept the `//(element1|element2)` pattern. – Tom Johnson Jul 02 '22 at 13:28
  • @TomJohnson: ah, yes, that's my mistake. XPath 1.0 doesn't support `|` unions between relative location paths (the `element1` and `element2` strings), only between path expressions (which includes the `//` prefix). I'll correct my answer. – Martijn Pieters Jul 07 '22 at 12:28