12

I am trying to find a way to search for a string within nodes while excluding the content of some subelements of those nodes. Put simply, I want to search for a string in the paragraphs of a text, excluding the footnotes, which are child elements of the paragraphs.

For example, given this document:

<document>
   <p n="1">My text starts here/</p>
   <p n="2">Then it goes on there<footnote>It's not a very long text!</footnote></p>
</document>

When I search for "text", I would like the XPath / XQuery expression to retrieve the first p element, but not the second one (where "text" occurs only in the footnote subelement).

I have tried the contains() function, but it retrieves both p elements.

Any help would be much appreciated :)

Sofia
Hemka
  • Good question, +1. See my answer for a short and easy XPath 1.0 expression that selects the wanted text-nodes even in much more complex XML documents. :) – Dimitre Novatchev Jan 19 '11 at 14:15

4 Answers

14

I want to search for a string in paragraphs of a text, excluding the footnotes which are children elements of the paragraphs

An XPath 1.0-only solution:

Use:

//p//text()[not(ancestor::footnote) and contains(.,'text')]

Against the following XML document (derived from yours, with a p element added inside the footnote to make this more interesting):

<document>
    <p n="1">My text starts here/</p>
    <p n="2">Then it goes on there
        <footnote>It's not a very long text!
           <p>text</p>
        </footnote>
    </p>
</document>

this XPath expression selects exactly the wanted text node:

My text starts here/
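
For readers without an XPath engine at hand, the logic of that expression can be sketched in plain Python. This is only an illustration under stated assumptions, not an evaluation of the XPath itself: the standard-library `xml.etree.ElementTree` module supports only a small XPath subset (no `text()` node test, no `ancestor` axis), so the sketch walks the tree by hand. The names `matching_text_nodes`, `skip_tag` and `DOC` are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Sample document from the answer above (nested <p> inside the footnote).
DOC = """<document>
    <p n="1">My text starts here/</p>
    <p n="2">Then it goes on there
        <footnote>It's not a very long text!
           <p>text</p>
        </footnote>
    </p>
</document>"""


def matching_text_nodes(root, needle, skip_tag="footnote"):
    """Collect text nodes that contain `needle`, lie inside a <p>,
    and have no `skip_tag` ancestor (hypothetical helper)."""
    matches = []

    def walk(elem, inside_p):
        if elem.tag == skip_tag:
            # Everything below here has a footnote ancestor: skip the subtree.
            return
        inside_p = inside_p or elem.tag == "p"
        # elem.text is the text node before the first child element.
        if inside_p and elem.text and needle in elem.text:
            matches.append(elem.text)
        for child in elem:
            walk(child, inside_p)
            # child.tail follows the child's end tag, so it is a text node
            # of `elem`, not of the child (this holds for footnotes too).
            if inside_p and child.tail and needle in child.tail:
                matches.append(child.tail)

    walk(root, False)
    return matches


root = ET.fromstring(DOC)
print(matching_text_nodes(root, "text"))  # ['My text starts here/']
```

Note that the sketch returns text nodes, as the XPath expression does; to get the enclosing p elements instead, one would record the nearest p ancestor while walking.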
Dimitre Novatchev
4
//p[(.//text() except .//footnote//text())[contains(., 'text')]]
Michael Kay
1

/document/p[text()[contains(., 'text')]] should do.

Martin Honnen
  • Thanks Martin! The only problem with this one is that it selects 'text' in p while ignoring the content of *all* subelements. I only want to ignore the footnote elements. – Hemka Jan 19 '11 at 13:18
  • Can you update your question with some more representative XML sample so that it becomes clearer what the requirements are? Does `/document/p[descendant-or-self::*[not(self::footnote)]/text()[contains(., 'text')]]` suffice? – Martin Honnen Jan 19 '11 at 13:38
0

For the record, as a complement to the other answers, I've found this workaround that also seems to do the job:

//p[contains(child::text()|not(descendant::footnote), "text")]
Hemka
  • This isn't a valid XPath expression. The union operator (`|`) requires both of its operands to be nodes, but the return type of the `not()` function is xs:boolean -- any compliant XPath engine *must* raise an error. – Dimitre Novatchev Jan 19 '11 at 17:18
  • Ouch, you're right Dimitre, Oxygen raised an error. Weird, the expression worked in my PHP script! – Hemka Jan 22 '11 at 19:06