<td>
<span>hi</span>
<a>re</a>
hello
</td>
Viewed 676 times
0

roeygol

user3247895
1 Answer
1
Looking at the documentation, `getTextContent` clearly says it returns the text of the element and its descendants, and I don't see any other method to return just the sum of the text nodes, so I think you need a loop. E.g., assuming `element` refers to the `td` element:
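    // Collect only the text nodes that are direct children of the element
    // (Node here is org.w3c.dom.Node).
    StringBuilder sb = new StringBuilder(/*some appropriate size*/);
    for (DomNode n : element.getChildNodes()) {
        if (n.getNodeType() == Node.TEXT_NODE) {
            sb.append(n.getTextContent());
        }
    }
    String text = sb.toString();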
Note that the sum of the text nodes in the structure you've quoted isn't just `"hello"`, it'll have whitespace both before and after that. If you just want `"hello"`, you'll need to trim that off.
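
To make the whitespace point concrete, here is a minimal self-contained sketch of the same loop. It is an illustration under assumptions, not the code from the question: it parses the quoted `td` with the JDK's built-in XML parser and the standard `org.w3c.dom` types instead of the asker's library, so it runs without extra dependencies. The class name `DirectTextDemo` and the helpers `directText` and `parse` are hypothetical names made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class DirectTextDemo {

    // Concatenates only the text nodes that are direct children of the element,
    // skipping the text inside child elements such as <span> and <a>.
    static String directText(Element element) {
        StringBuilder sb = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.TEXT_NODE) {
                sb.append(n.getTextContent());
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: parses a well-formed snippet with the JDK XML parser
    // (an assumption made for this sketch; the question uses a different DOM library).
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("snippet did not parse", e);
        }
    }

    public static void main(String[] args) {
        Element td = parse("<td>\n<span>hi</span>\n<a>re</a>\nhello\n</td>");
        String raw = directText(td);   // includes the line breaks between the children
        String text = raw.trim();      // strips the surrounding whitespace
        System.out.println("[" + text + "]");
    }
}
```

The untrimmed result contains the line breaks that sit between the child elements, which is why the `trim()` call is needed to get just `hello`.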

T.J. Crowder
- I guess `element.normalize()`, followed by `element.getLastChild().asText()` should do the trick, too. But I haven't tested it to make sure. – JB Nizet Nov 27 '16 at 10:19
- @JBNizet: Probably, for that *specific* structure, since in this specific case it's just the one text node they're interested in. – T.J. Crowder Nov 27 '16 at 10:20
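
A rough sketch of the idea from these two comments, again under assumptions: it uses the JDK's `org.w3c.dom` API, with `getTextContent()` standing in for the `asText()` call mentioned in the comment, and the names `LastTextNodeDemo`, `lastChildText`, and `parse` are hypothetical. It only illustrates why the shortcut depends on the wanted text being the last child.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class LastTextNodeDemo {

    // Normalizes the element, then returns the trimmed text of its last child
    // if that child is a text node; returns "" otherwise.
    static String lastChildText(Element element) {
        element.normalize(); // merges adjacent text nodes and drops empty ones
        Node last = element.getLastChild();
        if (last != null && last.getNodeType() == Node.TEXT_NODE) {
            return last.getTextContent().trim();
        }
        return "";
    }

    // Hypothetical helper: parses a well-formed snippet with the JDK XML parser.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("snippet did not parse", e);
        }
    }

    public static void main(String[] args) {
        // The quoted structure: the wanted text is the last child, so the shortcut works.
        System.out.println(lastChildText(parse("<td>\n<span>hi</span>\n<a>re</a>\nhello\n</td>")));
        // A different order: the last child is an element, so the shortcut finds nothing.
        System.out.println(lastChildText(parse("<td>hello<span>hi</span></td>")).isEmpty());
    }
}
```

With the text moved in front of the `span`, the last child is no longer a text node, which is the limitation the reply points out.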