How to get node value / innerHTML with XPath?

Question

I have an XPath that selects the class I want: //div[@class='myclass']. But it returns the whole div (including the <div class='myclass'> tag itself), and I would like to get only the contents of this tag, without the tag itself. How can I do that?

score 48 · Answer 1 · edited Jun 04 '20 at 23:47

node() = innerXml

text() = innerText

Both return node sets (effectively arrays), so text()[1] is the first child text node.
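As a rough illustration of the "both are arrays" point, here is a minimal Python sketch. It uses only the standard library's `xml.etree.ElementTree`, which supports just a subset of XPath and has no `text()` or `node()`, so the two helper functions below (`child_nodes` and `child_text_nodes`, both hypothetical names) emulate those node tests by hand. The sample markup is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with mixed content: text, an element, then more text.
div = ET.fromstring("<div>text1<span>text2</span>text3</div>")


def child_nodes(element):
    """Rough analogue of element/node(): child text and elements in document order."""
    nodes = []
    if element.text:
        nodes.append(element.text)    # text before the first child element
    for child in element:
        nodes.append(child)           # the child element itself
        if child.tail:
            nodes.append(child.tail)  # text that follows that child
    return nodes


def child_text_nodes(element):
    """Rough analogue of element/text(): only the direct child text nodes."""
    return [node for node in child_nodes(element) if isinstance(node, str)]


nodes = child_nodes(div)
texts = child_text_nodes(div)

print([n if isinstance(n, str) else f"<{n.tag}>" for n in nodes])
# ['text1', '<span>', 'text3']
print(texts)
# ['text1', 'text3']
```

Keep in mind that XPath positions start at 1 while Python lists start at 0, so `texts[0]` here corresponds to `text()[1]` in XPath.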

edited Jun 04 '20 at 23:47

Let Me Tink About It

answered Jun 05 '12 at 14:08

Nikola Bogdanović

What would multiple text nodes look like in XML? Would text() return a concatenation of all innerTexts that are children of the selected node? – CodeManX Dec 03 '13 at 21:43

@CoDEmanX: `<div>text1<span>text2</span>text3</div>` as I said, it is an **array**, so `div/node()[1] == div/text()[1] == text1` node, and `div/node()[2] == span` node, and `div/node()[3] == div/text()[2] == text3` node (XPath positions start at 1) - you would have to concatenate them yourself (by hand or with a helper function that accepts an array). – Nikola Bogdanović Dec 03 '13 at 22:08

score 36 · Accepted Answer · edited Jun 05 '20 at 01:38

With XPath, what you get back is the last thing in the path that is not a condition. What does that mean? Well, conditions are the stuff between []'s (but you already knew that), and yours reads like pathElement[that has a 'class' attribute with value 'myclass']. The pathElement comes directly before the [.

All the stuff outside of the []'s is then the path, so in `//a/b/c[@blah='bleh']/d`, a, b, c and d are all path elements, blah is an attribute and bleh is a literal value. If this path matches, it will return you a d, the last non-condition thing.

Your particular path returns a (series of) div, being the last thing in your xpath's path. This return value thus includes the top-level node(s), div in your case, and underneath it (them) all its (their) children. Nodes can be elements or text (or comments, processing instructions, ...).

Underneath a node there can be multiple text nodes, hence the array pOcHa talks about. x/text() returns all text nodes that are direct children of x; x/node() returns all child nodes, including text.
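To see the "last non-condition thing" rule in action, here is a small sketch using Python's standard `xml.etree.ElementTree`, whose limited XPath subset happens to cover this kind of path with an attribute predicate. The document and its element names are invented to match the `//a/b/c[@blah='bleh']/d` example; nothing here comes from the asker's real data.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped like the //a/b/c[@blah='bleh']/d example.
doc = ET.fromstring(
    "<root><a><b>"
    "<c blah='bleh'><d>first</d></c>"
    "<c blah='other'><d>second</d></c>"
    "</b></a></root>"
)

# The predicate only filters the c step; the result is the last step, d.
matches = doc.findall(".//a/b/c[@blah='bleh']/d")
print([m.tag for m in matches])   # ['d']
print([m.text for m in matches])  # ['first']

# Stopping the path at c returns the c element itself, children included.
parents = doc.findall(".//a/b/c[@blah='bleh']")
serialized = ET.tostring(parents[0], encoding="unicode")
print(serialized)
```

The second query is the same situation as in the question: the path ends at the element, so the element (tag and all) is what comes back.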

kjhughes · Answer 3 · 2020-01-10T16:40:05.867

(New answer to an old, frequently asked question.)

For this XML

<div class="myclass">content</div>

you can use XPath to select just content in one of two ways:

Text Node Selection

This XPath,
```
//div[@class='myclass']/text()
```
will select the text node children of the targeted div element, content, as requested.

String Value of an Element

This XPath,
```
string(//div[@class='myclass'])
```
will return the string-value of the targeted div element, content, again as requested.

Further information: Here's a note explaining the string-values of elements:

The string-value of an element node is the concatenation of the string-values of all text node descendants of the element node in document order.
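The two approaches only differ once the div contains nested elements. The sketch below tries to show that difference in Python. It assumes the standard library's `xml.etree.ElementTree`, which cannot evaluate `text()` or `string()` directly, so both are approximated by hand: `itertext()` stands in for the string-value and `.text`/`.tail` for the direct text node children. The sample markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the target div has text around a nested element.
doc = ET.fromstring(
    "<body><div class='myclass'>Hello <b>big</b> world</div></body>"
)
div = doc.find(".//div[@class='myclass']")

# Approximates string(//div[@class='myclass']):
# all descendant text nodes, concatenated in document order.
string_value = "".join(div.itertext())

# Approximates //div[@class='myclass']/text():
# only the text nodes that are direct children of the div.
candidates = [div.text] + [child.tail for child in div]
direct_text = [t for t in candidates if t]

print(string_value)  # Hello big world
print(direct_text)   # ['Hello ', ' world']
```

With a libxml2-based library the two XPath expressions could be evaluated as written; the emulation here is only meant to keep the example dependency-free.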

King ... You helped me with string(xpath) – chainstair Jun 03 '20 at 00:39

score 4 · Answer 4 · edited Jun 05 '20 at 01:02

You can try

//div[@class='myclass']/child::*

child::* selects all element children of the context node.
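A quick way to check what "element children" means in practice is the sketch below. It assumes Python's standard `xml.etree.ElementTree`, where the path `*` plays the role of `child::*`; the sample markup is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical div with text before, between and after its child elements.
div = ET.fromstring("<div class='myclass'>intro<p>one</p><p>two</p>outro</div>")

# "*" matches element children only, like child::* in XPath.
element_children = div.findall("*")
tags = [child.tag for child in element_children]

print(tags)  # ['p', 'p']
```

The text "intro" and "outro" does not appear in the result, so this expression fits when the wanted content is child elements rather than text.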

edited Jun 05 '20 at 01:02

Let Me Tink About It

answered Nov 14 '14 at 11:49

sajith
