How can we get text node using dom4j

Question

When we parse an XML document like

<entry>
  Sometext
</entry>

using javax.xml.parsers.DocumentBuilder, we can always get the text node with

Document doc = ...
Node entry = doc.getFirstChild();
// org.w3c.dom.Node has no item(); go through the child NodeList instead
Node textNode = entry.getChildNodes().item(0);
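
For reference, here is a minimal self-contained sketch of the JDK DOM lookup described above. The class name `DomTextNodeDemo` and the helper `firstTextNode` are made up for illustration, and it assumes the first child of the root element is the text node, as in the sample document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
class DomTextNodeDemo {

    // Parses the XML string and returns the first child of the root element.
    // Assumption: that first child is the text node, as in <entry>Sometext</entry>.
    static Node firstTextNode(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element entry = doc.getDocumentElement();
            return entry.getChildNodes().item(0);
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Node textNode = firstTextNode("<entry>\n  Sometext\n</entry>");
        // The child is a real DOM node of type TEXT_NODE, not a String.
        System.out.println(textNode.getNodeType() == Node.TEXT_NODE);
        // The raw value keeps the surrounding whitespace, so trim it for display.
        System.out.println(textNode.getNodeValue().trim());
    }
}
```

Note that the node value includes the line breaks and indentation around `Sometext`, which is why the sketch trims it before printing.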

However, I wonder how we can get the text node using dom4j. It seems dom4j doesn't expose the text as a text node.

score 0 · Answer 1 · answered Mar 04 '18 at 07:59


See dom4j.github.io; it might look something like this:

List<Node> list = document.selectNodes("//entry");
for (Node node : list) {
    // "//entry" matches elements, so casting to Attribute would fail;
    // read the text from the Node itself
    String text = node.getText();
}

Alternatively, to select only a single node:

Node node = document.selectSingleNode("//entry");

Martin Zeitler


You are right, but what I want here is the text as a `Node` rather than a `String`. For example, when I evaluate an XQuery expression like `//entry/text()`, I'd like to get back a list of `Node` rather than a list of `String`. Thanks! – Dickens LI Mar 04 '18 at 08:11
The first line actually obtains such a list of nodes, selected by XPath. You might only need to adjust the selector; you can copy it from, or validate it with, the browser's F12 developer tools. – Martin Zeitler Mar 04 '18 at 08:26

score 0 · Answer 2 · answered Mar 05 '18 at 13:07

Given this:

    String xml = "<root><entry>one</entry><entry>two</entry></root>";
    Document doc = DocumentHelper.parseText(xml);
    doc.selectNodes("//entry")
        .forEach(n -> System.out.printf("%s -> %s\n", n.getClass().getSimpleName(), n.getStringValue()));
    doc.selectNodes("//entry/text()")
        .forEach(n -> System.out.printf("%s -> %s\n", n.getClass().getSimpleName(), n.getStringValue()));

The first selectNodes call prints

DefaultElement -> one
DefaultElement -> two

while the other prints

DefaultText -> one
DefaultText -> two
