
When we parse an XML document like

<entry>
  Sometext
</entry>

using javax.xml.parsers.DocumentBuilder, we can always get the text node with

Document doc = ...
Node entry = doc.getFirstChild();
Node textNode = entry.getChildNodes().item(0); // or entry.getFirstChild()
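For reference, here is a minimal sketch of that DocumentBuilder approach as a self-contained program, using only the JDK's built-in javax.xml.parsers and org.w3c.dom APIs. The class name `DomTextNodeDemo` and the helper `firstChildOfRoot` are hypothetical. Note that the text node's data keeps the newline and indentation around "Sometext", which is why the example trims it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DomTextNodeDemo {
    // Hypothetical helper: parses the XML string and returns the
    // first child node of the root element.
    static Node firstChildOfRoot(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element entry = doc.getDocumentElement();   // the <entry> element
        return entry.getChildNodes().item(0);       // the text inside <entry>
    }

    public static void main(String[] args) throws Exception {
        Node textNode = firstChildOfRoot("<entry>\n  Sometext\n</entry>");
        // In the W3C DOM, text content is a separate node of type TEXT_NODE
        System.out.println(textNode.getNodeType() == Node.TEXT_NODE);   // true
        // The node's value still contains the surrounding whitespace
        System.out.println("[" + textNode.getNodeValue().trim() + "]"); // [Sometext]
    }
}
```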

However, I wonder how we can get the text node using dom4j. It seems dom4j doesn't treat text as a text node.

Dickens LI

2 Answers


See dom4j.github.io; it might be something like:

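import java.util.Iterator;
import java.util.List;
import org.dom4j.Element;
import org.dom4j.Node;

List<Node> list = document.selectNodes("//entry");
for (Iterator<Node> iter = list.iterator(); iter.hasNext();) {
    Element entry = (Element) iter.next(); // "//entry" selects elements, not attributes
    String text = entry.getText();
}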

Alternatively, to select only a single node in the document:

Node node = document.selectSingleNode("//entry");
Martin Zeitler
  • You are right, but what I want here is a text `Node` rather than a `String`. For example, when I need to evaluate an XPath expression like `//entry/text()`, I'd like to get back a list of `Node`s rather than a list of `String`s. Thanks! – Dickens LI Mar 04 '18 at 08:11
  • The first line actually obtains such a list of nodes, selected by XPath; you might only need to adjust the selector. You can copy a selector from, or validate it with, the browser's F12 developer tools. – Martin Zeitler Mar 04 '18 at 08:26
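If you want dom4j `Node` objects for the text itself without writing a `text()` XPath step, one possible approach is to walk each element's child list and keep the children whose node type is `Node.TEXT_NODE`. This is a minimal sketch assuming dom4j 2.x (where `Element.content()` and `Element.elements(String)` return generic lists); the class `Dom4jTextNodes` and method `entryTextNodes` are hypothetical names. It uses `elements("entry")` rather than `selectNodes`, so it does not need Jaxen on the classpath, but it only finds `<entry>` elements that are direct children of the root.

```java
import java.util.ArrayList;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.Text;

public class Dom4jTextNodes {
    // Hypothetical helper: collects the Text children of every <entry>
    // element directly under the root element.
    static List<Text> entryTextNodes(String xml) throws DocumentException {
        Document doc = DocumentHelper.parseText(xml);
        List<Text> result = new ArrayList<>();
        // No XPath here, so Jaxen is not required
        for (Element entry : doc.getRootElement().elements("entry")) {
            // content() lists every child node: elements, text, comments, ...
            for (Node child : entry.content()) {
                // Keep only the text nodes, which dom4j models as org.dom4j.Text
                if (child.getNodeType() == Node.TEXT_NODE) {
                    result.add((Text) child);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws DocumentException {
        String xml = "<root><entry>one</entry><entry>two</entry></root>";
        for (Text t : entryTextNodes(xml)) {
            System.out.println(t.getClass().getSimpleName() + " -> " + t.getText());
        }
    }
}
```

The `getNodeType()` check is what separates text from any child elements or comments inside `<entry>`; the returned objects are real dom4j nodes, so they can be passed to anything that accepts a `Node`.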

Given this:

    String xml = "<root><entry>one</entry><entry>two</entry></root>";
    Document doc = DocumentHelper.parseText(xml);
    doc.selectNodes("//entry")
        .forEach(n -> System.out.printf("%s -> %s\n", n.getClass().getSimpleName(), n.getStringValue()));
    doc.selectNodes("//entry/text()")
        .forEach(n -> System.out.printf("%s -> %s\n", n.getClass().getSimpleName(), n.getStringValue()));

The first selectNodes call prints

DefaultElement -> one
DefaultElement -> two

while the second prints

DefaultText -> one
DefaultText -> two
forty-two