
I'm trying to parse the content.xml inside an ODF file. I've read the file into a string and built a tree object with lxml.etree:

tree = etree.XML(string)

But now I need to find every subelement that is text:a OR text:h. I was told in a previous question that I could use XPath. I've tried, but I get stuck every single time; I can't even find one of those elements.

If I try:

elem = tree.xpath('//text:p')
I just get:
XPathEvalError: Undefined namespace prefix

So how do I get a list with BOTH of those subelements, in the right order, so I can iterate over them?

Fred Foo
Niclas Nilsson

1 Answer

That's because text is a namespace prefix, defined in the ODF schema, and XPath doesn't know which namespace URI it stands for unless you pass the mapping explicitly. Try

tree.xpath('//text:a | //text:h',
           namespaces={'text': 'urn:oasis:names:tc:opendocument:xmlns:text:1.0'})

| is the XPath set union operator. See also the lxml docs.
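To make this concrete, here is a minimal, self-contained sketch. The sample document is a hypothetical, heavily trimmed stand-in for a real content.xml, and the names SAMPLE and NS are just for illustration. It uses text:p and text:h (the pair asked about in the comments); swap in text:a if that is what you need. It also shows the equivalent query written with the boolean or operator inside a predicate.

```python
from lxml import etree

# Hypothetical, heavily trimmed stand-in for an ODF content.xml.
SAMPLE = b"""<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body><office:text>
    <text:h>Heading 1</text:h>
    <text:p>First paragraph</text:p>
    <text:h>Heading 2</text:h>
    <text:p>Second paragraph</text:p>
  </office:text></office:body>
</office:document-content>"""

# Map the prefix used in the XPath expression to the namespace URI.
# The prefix here is our own choice; only the URI has to match the document.
NS = {"text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0"}

tree = etree.XML(SAMPLE)

# Union of two node-sets: every text:p plus every text:h.
union = tree.xpath("//text:p | //text:h", namespaces=NS)

# Same selection with the boolean `or` inside a predicate.
predicate = tree.xpath("//*[self::text:p or self::text:h]", namespaces=NS)

texts = [el.text for el in union]
local_names = [etree.QName(el).localname for el in union]

for name, text in zip(local_names, texts):
    print(name, text)
```

With this sample, both queries should come back in document order (headings and paragraphs interleaved as they appear in the file) rather than grouped by tag, which is what you want for iterating over them in sequence.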

Fred Foo
  • Great! Thanks. Now, how do I get an OR statement in there so it fetches both text:p and text:h? – Niclas Nilsson Sep 14 '11 at 20:56
  • Ah yes, forgot that. Added it to the answer now. – Fred Foo Sep 14 '11 at 21:12
  • I could have found that just by googling "xpath operators". So sorry for not trying. But I was really frustrated last night. Thanks a lot anyway! :) – Niclas Nilsson Sep 15 '11 at 06:52
  • @larsmans: `|` is the XPath union operator, not an "or" operator. XPath has a separate `or` operator. `|` is a set operator (its arguments must be node-sets), while `or` is a boolean operator (its arguments must be booleans). Since almost any type can be converted to boolean, and the conversion is done automatically, `or` can be used with almost any type of argument, so an expression such as `$node-set1 or $node-set2` is legal; however, the result is just a boolean, `true()` or `false()`. `|` only operates on node-sets, and its result is a node-set. – Dimitre Novatchev Sep 15 '11 at 12:22
  • @DimitreNovatchev: sorry for the confusion, but I'm used to conflating set union and disjunction (studied too much logic in my life :). Fixed the answer now. – Fred Foo Sep 19 '11 at 21:49