Hey, I'm trying to find the parent node of a node's text content.

example:

<div>
    <h1>Node to find</h1>
    <p>another node</p>
</div>

All my code knows is the text in the node, and my script needs to find out which node contains that text.

I have tried the following XPaths:

 1. //*[. = "'. $text .'"]
 2. //*[contains(., "'. $text .'")]

The first gives me an empty NodeList. The second gives me a lot of nodes, but it gives me all the parents containing the text; I want only the first parent.

Thanks for any help.

Elzo Valugi
Henriksjodahl
  • possible duplicate of [PHP/XPath: find text node that "starts with" a particular string?](http://stackoverflow.com/questions/4822469/php-xpath-find-text-node-that-starts-with-a-particular-string) – Gordon Feb 01 '11 at 10:53
  • `//*[contains(., "'. $text .'")][1]` or, depending on what you need, `//*[contains(text(), "'. $text .'")]` – biziclop Feb 01 '11 at 11:11
  • got what i need with the following: `//*[starts-with(., "'. $text .'")]` – Henriksjodahl Feb 01 '11 at 11:23
  • Check my answer for two correct XPath expressions –  Feb 01 '11 at 16:25

2 Answers

3

I'm not sure I understand the "'. $text .'" part of your answer... I guess that means some sample text, not an intended reference to a variable named text?

Anyway, when you use contains(., "foo") you are asking whether the current node's string value contains "foo". For an element, the string value is the concatenation of all descendant text nodes' string values. That is why //*[contains(., "foo")] returns a list of nodes: it matches every ancestor element of every text node containing "foo". (And it can be very inefficient, because that concatenation is performed for every element in the tree.)
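
To make the string-value point concrete, here is a minimal sketch in Python. It does not run an XPath engine: it assumes that ElementTree's itertext() is a fair stand-in for an element's XPath string value, and the sample markup and the string_value helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, modelled on the question, with "foo" as the search text.
DOC = """<div>
    <h1>Node to foo find</h1>
    <p>another node</p>
</div>"""


def string_value(element):
    # Stand-in for the XPath string value: all descendant text, in document order.
    return "".join(element.itertext())


root = ET.fromstring(DOC)

# Emulates //*[contains(., "foo")]: every element whose string value contains "foo".
matches = [el.tag for el in root.iter() if "foo" in string_value(el)]
print(matches)  # the <h1> holding the text, and its ancestor <div>
```

The <div> is reported only because its string value includes the text of its <h1> descendant, which is the "every ancestor" effect described above.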

The reason your starts-with() answer worked (sometimes) is that you got lucky: the parent node of the text node had other preceding siblings with their own text, so the grandparent node's text value started with something else. Also very inefficient...

If the text you're looking for will only be in one text node -- i.e. it will not be split up across multiple elements / comments / etc. -- then you can efficiently and accurately match only the element containing the text node, using [edited]:

//*[text()[contains(., "foo")]]

(similar to what @biziclop said).
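
As a rough illustration of what that predicate tests, the sketch below emulates it in Python rather than evaluating real XPath. It assumes an element's text-node children can be approximated by ElementTree's .text plus the .tail of each child element; child_text_nodes and parents_of_text are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def child_text_nodes(element):
    # Approximation of the element's text-node children: its leading text
    # plus the tail text that follows each child element.
    nodes = [element.text] + [child.tail for child in element]
    return [t for t in nodes if t]


def parents_of_text(root, needle):
    # Emulates //*[text()[contains(., needle)]]: elements with at least one
    # child text node containing the needle.
    return [el.tag for el in root.iter()
            if any(needle in t for t in child_text_nodes(el))]


whole = ET.fromstring("<div><h1>Node to find</h1><p>another node</p></div>")
split = ET.fromstring("<div><h1>Node <b>to</b> find</h1><p>another node</p></div>")

found_whole = parents_of_text(whole, "Node to find")  # text sits in one text node
found_split = parents_of_text(split, "Node to find")  # text is split by <b>
print(found_whole, found_split)
```

In the second document no single text node holds the whole phrase, so nothing is found. That is the limitation the "only in one text node" condition refers to.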

If the text you're looking for might be split up across multiple elements / comments / etc. -- then you can use this [edited, twice]:

//*[contains(., "foo") and not(*[contains(., "foo")])]
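
A minimal Python sketch of the same "innermost element" test, again emulating the predicate instead of calling an XPath engine. The helper names are hypothetical, and itertext() is assumed to approximate the XPath string value.

```python
import xml.etree.ElementTree as ET


def string_value(element):
    # Stand-in for the XPath string value: all descendant text, in document order.
    return "".join(element.itertext())


def innermost_matches(root, needle):
    # Emulates //*[contains(., needle) and not(*[contains(., needle)])]:
    # the element contains the needle, but none of its child elements does.
    return [el.tag for el in root.iter()
            if needle in string_value(el)
            and not any(needle in string_value(child) for child in el)]


# The phrase is split across a text node, a <b> element, and another text node.
split = ET.fromstring("<div><h1>Node <b>to</b> find</h1><p>another node</p></div>")
result = innermost_matches(split, "Node to find")
print(result)
```

Here the <div> is rejected because its <h1> child also contains the phrase, while the <h1> is kept because its only child element, <b>, does not.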

But that's fairly inefficient. The following is not guaranteed to work:

//*[contains(., "foo")][1]

It will give you [edited, twice], for each parent, the first of its child elements whose string value contains the text, i.e. the first child that either holds the text itself or is an ancestor of an element that does. (Or an empty node-set, if "foo" is not found.) I'm trusting @Alejandro on this one... I still have not internalized how to tell when [position() = x] applies to the most recent location step only. Regardless, this XPath expression is not guaranteed to give you the right result.
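
The per-parent behaviour of the [1] is easier to see in a small emulation. This sketch assumes that // expands to /descendant-or-self::node()/ so that the positional predicate is applied separately to each parent's children; the markup, the id attributes, and the helper names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: several elements contain "foo" at different depths.
DOC = """<div id="root">
  <ul id="a"><li id="a1">foo</li><li id="a2">foo</li></ul>
  <ul id="b"><li id="b1">bar</li><li id="b2">foo</li></ul>
</div>"""


def string_value(element):
    # Stand-in for the XPath string value: all descendant text, in document order.
    return "".join(element.itertext())


def first_matching_child_per_parent(root, needle):
    # Emulates //*[contains(., needle)][1]: for every parent, keep only the
    # first child element whose string value contains the needle.
    selected = []
    # The document element is the only element child of the document node.
    if needle in string_value(root):
        selected.append(root)
    for parent in root.iter():
        matching = [c for c in parent if needle in string_value(c)]
        if matching:
            selected.append(matching[0])
    return [el.get("id") for el in selected]


result = first_matching_child_per_parent(ET.fromstring(DOC), "foo")
print(result)
```

Several elements come back rather than one, and some elements that do contain the text (a2 and b) are skipped, which is why the expression cannot be relied on to find the parent of the text.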

LarsH
  • 27,481
  • 8
  • 94
  • 152
  • Thanks for that answer Lars. You really explained it all. Seems like i've got some alternatives on how i want it to work. Saving this one for future references. Thanks again. – Henriksjodahl Feb 01 '11 at 12:32
  • @LarsH: First one meaning: *every element such that **first** text node child contains "foo"*. Second one: If I'm containing "foo", my parent always contains "foo". Last one meaning: *every first child containing "foo"* –  Feb 01 '11 at 16:29
  • @Alej: ah, good point. I'd forgotten that, about contains() taking the first node in its first argument nodeset... May have to edit. In fact, were all three of my sample XPaths wrong?? :-p – LarsH Feb 02 '11 at 13:33
  • @Alej and @Henrik: updated my answer so it's hopefully correct now. – LarsH Feb 02 '11 at 14:02
  • @LarsH: Now, first one is fine. Second one has the implicit casting problem: take the "any child element" node set out of the function call. About third: I rephrase the meaning: *from those children containing "foo", the first one* –  Feb 02 '11 at 15:40
  • @Alej: bah... you're right about the 2nd one. My brain is not working well. Re: 3rd: ok. Will edit. (We have rolling blackouts here so my connection is on and off.) – LarsH Feb 02 '11 at 17:41
1

I'm trying to find the parent node of a node's text content.
[...] but it gives me all the parents containing the text; I want only the first parent.

The classic answer would be:

//*[text()[contains(.,$pText)]]

Meaning: any element having at least one text node child whose string value contains the string value of the $pText variable/parameter reference.

The possibility of a mixed content model was mentioned. I doubt this is a real consideration, but anyway, here is the answer:

//*[contains(.,$pText)][not(*[contains(.,$pText)])]

Meaning: any element containing $pText as part of its string value and not having any child element with $pText as part of its string value. In other words, the innermost element whose string value contains $pText.