Possible way to parse the text alone from an xml document using java dom

Question

I need to extract only the text from an XML file. To read a specific tag I use the code below, but I am not sure how to get all of the text from the XML. The XML files differ, and I don't know their root or child nodes in advance, but I need only the text they contain.

try {

        DocumentBuilderFactory dbFactory = DocumentBuilderFactory
                .newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(streamLimiter.getFile());
        doc.getDocumentElement().normalize();

        System.out.println("Root element :"
                + doc.getDocumentElement().getNodeName());
        NodeList nList = doc.getElementsByTagName("employee");
        System.out.println("-----------------------");

        for (int temp = 0; temp < nList.getLength(); temp++) {

            Node nNode = nList.item(temp);
            if (nNode.getNodeType() == Node.ELEMENT_NODE) {

                Element eElement = (Element) nNode;

                NodeList nlList = eElement.getElementsByTagName("firstname")
                        .item(0).getChildNodes();

                Node nValue = (Node) nlList.item(0);

                System.out.println("First Name : "
                        + nValue.getNodeValue());

            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

score 0 · Answer 1 · edited May 23 '17 at 11:48

Quoting jsight's reply in this post: Getting XML Node text value with Java DOM

import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;


class Test {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws Exception {
        String xml = "<add job=\"351\">\n"
                + "    <tag>foobar</tag>\n"
                + "    <tag>foobar2</tag>\n"
                + "</add>";
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        ByteArrayInputStream bis = new ByteArrayInputStream(xml.getBytes());
        org.w3c.dom.Document doc = db.parse(bis);
        // In this sample the document's first child is the root element <add>.
        Node n = doc.getFirstChild();
        NodeList nl = n.getChildNodes();
        Node an, an2;

        for (int i = 0; i < nl.getLength(); i++) {
            an = nl.item(i);
            // Skip the whitespace-only text nodes between the <tag> elements.
            if (an.getNodeType() == Node.ELEMENT_NODE) {
                NodeList nl2 = an.getChildNodes();

                for (int i2 = 0; i2 < nl2.getLength(); i2++) {
                    an2 = nl2.item(i2);
                    // DEBUG PRINTS
                    System.out.println(an2.getNodeName() + ": type (" + an2.getNodeType() + "):");
                    // Text nodes have no children, so this branch is skipped for them.
                    if (an2.hasChildNodes()) {
                        System.out.println(an2.getFirstChild().getTextContent());
                        System.out.println(an2.getFirstChild().getNodeValue());
                    }
                    // For a text node both calls return the text itself.
                    System.out.println(an2.getTextContent());
                    System.out.println(an2.getNodeValue());
                }
            }
        }
    }
}

Output:

#text: type (3):
foobar
foobar
#text: type (3):
foobar2
foobar2

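Adapt this code to your problem and it should work. Because you don't know the element names in advance, walk the child nodes with getChildNodes(), as above, instead of looking elements up by name with getElementsByTagName().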
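For XML whose structure is not known in advance, one possible adaptation is to walk the tree recursively and keep every non-blank text node. The sketch below is only an illustration under a few assumptions: the class name AllText and the helpers collectText and extractText are made up for this example, the input is an in-memory string rather than your file, and whitespace-only text nodes (indentation between elements) are treated as noise and dropped.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class AllText {

    // Hypothetical helper: depth-first walk that collects trimmed, non-blank text.
    static void collectText(Node node, List<String> out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.add(text);
            }
            return;
        }
        // Elements are descended into; comments and similar nodes have no children.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), out);
        }
    }

    // Hypothetical helper: parses an XML string and returns its text in document order.
    static List<String> extractText(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> out = new ArrayList<>();
        // getDocumentElement() returns the root element whatever its name is.
        collectText(doc.getDocumentElement(), out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<add job=\"351\">\n"
                + "    <tag>foobar</tag>\n"
                + "    <tag>foobar2</tag>\n"
                + "</add>";
        for (String s : extractText(xml)) {
            System.out.println(s);
        }
    }
}
```

If one concatenated string is enough, doc.getDocumentElement().getTextContent() returns the text of all descendants in a single call, including the whitespace between elements. To read from a file as in the question, pass the File to DocumentBuilder.parse instead of the ByteArrayInputStream.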
