
I have the following XML:

<test1>
    <test2>
       <text>This is a question on xpath
       </text>
    </test2>
    <test3>
        <test2>
            <text>Do not extract this
             </text>
        </test2>
    </test3>
</test1> 

I need to extract the text within test2/text, but not when that test2 is inside test3. How can this be done in XPath? I tried something like:

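text = ''
for p in lxml_tree.xpath('.//test2'):
    for q in p.iterancestors():
        # Problem: the else branch runs for every ancestor that is not test3,
        # so text can be added before test3 is reached, or added more than once
        if q.tag == "test3":
            break
        else:
            text += ''.join(t.text for t in p.xpath('.//text'))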

but this doesn't work. I suspect XPath has a better way to exclude these nodes in a single expression.
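For comparison, the loop-based approach can be repaired by checking all ancestors before deciding, for example with any(). This is a minimal sketch, assuming lxml is installed and the sample XML is parsed from a string; the XML variable name is illustrative, and strip() is added to remove the newline and indentation inside <text>.

```python
from lxml import etree

XML = """<test1>
    <test2>
       <text>This is a question on xpath
       </text>
    </test2>
    <test3>
        <test2>
            <text>Do not extract this
             </text>
        </test2>
    </test3>
</test1>"""

lxml_tree = etree.fromstring(XML)

text = ''
for p in lxml_tree.xpath('.//test2'):
    # Decide only after examining *all* ancestors, not one at a time
    if any(q.tag == 'test3' for q in p.iterancestors()):
        continue
    # strip() drops the trailing newline/indentation inside <text>
    text += ''.join(t.text.strip() for t in p.xpath('.//text'))

print(repr(text))  # 'This is a question on xpath'
```

This works, but it walks the ancestors in Python; an XPath predicate can do the same test inside the query.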

Expected output:

text = "This is a question on xpath"
Bhargav Rao
Hypothetical Ninja

1 Answer


Assuming that by "comes inside" you mean nested at any depth, you can combine not() with the ancestor axis to check that a node does not have a given ancestor:

//test2[not(ancestor::test3)]/text
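A minimal sketch of applying this expression with lxml to the question's XML, assuming lxml is installed; joining and stripping the element text is one way to get the expected string, not the only one.

```python
from lxml import etree

XML = """<test1>
    <test2>
       <text>This is a question on xpath
       </text>
    </test2>
    <test3>
        <test2>
            <text>Do not extract this
             </text>
        </test2>
    </test3>
</test1>"""

root = etree.fromstring(XML)

# Only <text> elements whose test2 has no test3 anywhere above it are selected
nodes = root.xpath('//test2[not(ancestor::test3)]/text')
text = ''.join(n.text.strip() for n in nodes)

print(text)  # This is a question on xpath
```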

If, however, you meant that only the immediate parent must not be test3, replace ancestor with parent:

//test2[not(parent::test3)]/text
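The two expressions only differ when test2 sits deeper than one level below test3. The sketch below uses a hypothetical <wrapper> element (not in the question's XML) to show that difference, assuming lxml is installed.

```python
from lxml import etree

# Hypothetical variant of the question's XML: <wrapper> is added for illustration
XML = """<test1>
  <test2><text>top level</text></test2>
  <test3>
    <test2><text>direct child of test3</text></test2>
    <wrapper>
      <test2><text>nested deeper in test3</text></test2>
    </wrapper>
  </test3>
</test1>"""

root = etree.fromstring(XML)

# ancestor:: excludes test2 at any depth below test3
by_ancestor = [t.text for t in root.xpath('//test2[not(ancestor::test3)]/text')]
# parent:: excludes only test2 elements whose immediate parent is test3
by_parent = [t.text for t in root.xpath('//test2[not(parent::test3)]/text')]

print(by_ancestor)  # ['top level']
print(by_parent)    # ['top level', 'nested deeper in test3']
```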
StuartLC
  • I'm no pythonista, but the result is a `nodeset`, and lxml seems a robust library, so I would imagine this can be used as `for p in lxml_tree.xpath('.//test2[not(ancestor::test3)]/text')` – StuartLC Dec 13 '14 at 09:34
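A sketch of the loop suggested in this comment, assuming lxml is installed. In lxml, an XPath expression that selects a node-set comes back as a plain Python list of elements, so it can be iterated directly; selecting text() instead returns the strings themselves.

```python
from lxml import etree

XML = """<test1>
    <test2>
       <text>This is a question on xpath
       </text>
    </test2>
    <test3>
        <test2>
            <text>Do not extract this
             </text>
        </test2>
    </test3>
</test1>"""

lxml_tree = etree.fromstring(XML)

# A node-set result is returned as a Python list of elements
result = lxml_tree.xpath('.//test2[not(ancestor::test3)]/text')

text = ''
for p in result:
    text += p.text.strip()

# Appending /text() selects the text nodes, returned as strings
strings = [s.strip() for s in lxml_tree.xpath('.//test2[not(ancestor::test3)]/text/text()')]
```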