9

How do I parse XML that has data included in <![CDATA[---]...? How can we parse the XML and get the data included in the CDATA?

Mohammad Faisal
GOK

5 Answers

9
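import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public static void main(String[] args) throws Exception {
  File file = new File("data.xml");
  DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
  // if you are using this code for BlackBerry XML parsing
  // (setCoalescing is a method of the factory, not the builder;
  // it converts CDATA nodes to text and merges them with adjacent text)
  factory.setCoalescing(true);
  DocumentBuilder builder = factory.newDocumentBuilder();
  Document doc = builder.parse(file);

  NodeList nodes = doc.getElementsByTagName("topic");
  for (int i = 0; i < nodes.getLength(); i++) {
    Element element = (Element) nodes.item(i);
    NodeList title = element.getElementsByTagName("title");
    Element line = (Element) title.item(0);
    System.out.println("Title: " + getCharacterDataFromElement(line));
  }
}

// Returns the data of the element's first child if it is text or CDATA, else "".
public static String getCharacterDataFromElement(Element e) {
  Node child = e.getFirstChild();
  if (child instanceof CharacterData) {
    CharacterData cd = (CharacterData) child;
    return cd.getData();
  }
  return "";
}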

( http://www.java2s.com/Code/Java/XML/GetcharacterdataCDATAfromxmldocument.htm )
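If coalescing is left off, the CDATA section stays a separate node, and the first child of the element may be a whitespace text node rather than the CDATA itself. A minimal sketch of one way to handle that, assuming a hypothetical helper `cdataOf` and an inline sample document (neither is from the answer above):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CdataChildren {

    // Hypothetical helper: concatenates the data of every CDATA child of the element.
    static String cdataOf(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof CDATASection) {
                sb.append(((CDATASection) n).getData());
            }
        }
        return sb.toString();
    }

    // Parses an XML string; coalescing keeps its default (false),
    // so CDATA sections remain distinct nodes in the DOM.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Sample input (an assumption): whitespace surrounds the CDATA section.
        String xml = "<topic><title>\n  <![CDATA[Java & XML]]>\n</title></topic>";
        Document doc = parse(xml);
        Element title = (Element) doc.getElementsByTagName("title").item(0);
        System.out.println("Title: " + cdataOf(title));
    }
}
```

The whitespace text nodes around the CDATA section are skipped because only `CDATASection` children are appended.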

BSKANIA
Thargor
  • I would rather do something like: `if (child != null && (child instanceof CharacterData)) { return ((CharacterData) child).getData(); } else { return e.getNodeValue(); }` in order to seamlessly handle the presence/absence of a CDATA block. – Raphael Jolivet Jun 27 '12 at 10:12
  • Can you please provide some text to describe what you are doing and why you would use the `DocumentBuilderFactory`? – Gray Nov 01 '15 at 23:50
  • In current Java DOM implementation you can access CDATA simply as text data using `e.getTextContent()`. [See example](http://stackoverflow.com/questions/42802202) without type check, cast, `e.getData()`. – jschnasse Mar 28 '17 at 14:16
4

All previous answers use a DOM-based approach. This is how to parse CDATA with a stream-based approach using StAX.

Use the following pattern:

  // r is an XMLStreamReader
  switch (r.getEventType()) {
      case XMLStreamConstants.CHARACTERS:
      case XMLStreamConstants.CDATA:
          System.out.println(r.getText());
          break;
      default:
          break;
  }

Complete sample:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public void readCDATAFromXMLUsingStax() {
    String yourSampleFile = "/path/toYour/sample/file.xml";
    XMLStreamReader r = null;
    try (InputStream in =
            new BufferedInputStream(new FileInputStream(yourSampleFile));) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        r = factory.createXMLStreamReader(in);
        while (r.hasNext()) {
            switch (r.getEventType()) {
            case XMLStreamConstants.CHARACTERS:
            case XMLStreamConstants.CDATA:
                System.out.println(r.getText());
                break;
            default:
                break;
            }
            r.next();
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    } finally {
        if (r != null) {
            try {
                r.close();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }
}

With /path/toYour/sample/file.xml

 <data>
    <![CDATA[ Sat Nov 19 18:50:15 2016 (1672822)]]>
    <![CDATA[Sat, 19 Nov 2016 18:50:14 -0800 (PST)]]>
 </data>

Gives:

 Sat Nov 19 18:50:15 2016 (1672822)                             
 Sat, 19 Nov 2016 18:50:14 -0800 (PST)       
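
When only the text of one element is needed, `XMLStreamReader.getElementText()` can replace the event switch, since it gathers character and CDATA content up to the matching end tag. A rough sketch, assuming a hypothetical helper `firstElementText` and an in-memory string instead of a file:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxElementText {

    // Hypothetical helper: trimmed text of the first element called `name`, or null if absent.
    static String firstElementText(String xml, String name) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && name.equals(r.getLocalName())) {
                    // Works only for text-only elements; a nested child element
                    // makes getElementText() throw an XMLStreamException.
                    return r.getElementText().trim();
                }
            }
            return null;
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample input (an assumption), modeled on the file shown above.
        String xml = "<data>\n  <![CDATA[Sat, 19 Nov 2016 18:50:14 -0800 (PST)]]>\n</data>";
        System.out.println(firstElementText(xml, "data"));
    }
}
```

The `trim()` call drops the whitespace around the CDATA section, which the event-based loop would otherwise print as separate character events.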
jschnasse
2

CDATA just tells the parser to treat the enclosed data as literal text, so markup characters inside it do not need to be escaped. So just take the element's text: an XML parser should return the plain data without the CDATA markers.
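
As an illustration of that point, here is a small sketch using the DOM method `getTextContent()`; the helper name `textOf` and the sample document are assumptions made for the example. The CDATA element and the escaped element should yield the same string:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataTextContent {

    // Hypothetical helper: text content of the first element with the given tag name.
    static String textOf(String xml, String tag) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        // getTextContent() concatenates text and CDATA descendants alike.
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Same content written twice: once in CDATA, once with escaped entities.
        String xml = "<topic>"
                + "<title><![CDATA[5 < 7 & more]]></title>"
                + "<note>5 &lt; 7 &amp; more</note>"
                + "</topic>";
        System.out.println(textOf(xml, "title"));
        System.out.println(textOf(xml, "note"));
    }
}
```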

AlexR
0

Here, `r.get().getResponseBody()` is the response body as a String:

Document doc = getDomElement(r.get().getResponseBody());
NodeList nodes = doc.getElementsByTagName("Title");
for (int i = 0; i < nodes.getLength(); i++) {
    Element element = (Element) nodes.item(i);
    NodeList title = element.getElementsByTagName("Child tag where cdata present");
    Element line = (Element) title.item(0);
    System.out.println("Title: " + getCharacterDataFromElement(line));
}

// Parses an XML string into a DOM Document; returns null if parsing fails.
public static Document getDomElement(String xml) {
    Document doc = null;
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    // Convert CDATA nodes to text and merge them with adjacent text nodes.
    dbf.setCoalescing(true);
    dbf.setNamespaceAware(true);
    try {
        DocumentBuilder db = dbf.newDocumentBuilder();
        InputSource is = new InputSource();
        is.setCharacterStream(new StringReader(xml));
        doc = db.parse(is);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return doc;
}

// Returns the data of the element's first child if it is text or CDATA, else "".
public static String getCharacterDataFromElement(Element e) {
    Node child = e.getFirstChild();
    if (child instanceof CharacterData) {
        CharacterData cd = (CharacterData) child;
        return cd.getData();
    }
    return "";
}
Mohammad Faisal
fresher
0

Below is a sample XML file and the code to retrieve the XML embedded in the CDATA section within the main XML.

<envelope>
 <Header>
  <id>123</id>
  <name>abc</name>
 </Header>
 <payload>
  <![CDATA[<?xml> <Document><validXML></validXML></Document>]]>
</payload>
</envelope>

The XPath to get the CDATA XML in the above example would be

/envelope/payload/text()

So, once you have the root Document of the above XML, you can fetch the XML embedded in the CDATA with the given XPath.

Below is the utility method for the same.

public String getSubDocument(Document rootDocument, String xPathString) throws Exception {
    XPath xPath = XPathFactory.newInstance().newXPath();
    // Evaluate as a string: text() on the payload element yields the CDATA content.
    String xmlString = (String) xPath.compile(xPathString).evaluate(rootDocument, XPathConstants.STRING);
    return xmlString;
}
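
To show the whole round trip, the following sketch extracts the embedded XML with the same XPath and then parses it as a document of its own. The class and helper names are hypothetical, coalescing is switched on so the payload is a single text node, and the sample payload omits the `<?xml>` token from the sample above, which would not re-parse as written:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EmbeddedXml {

    // Parses an XML string; coalescing merges CDATA into ordinary text nodes.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: evaluates the XPath as a string and trims surrounding whitespace.
    static String embeddedXml(Document root, String xPathString) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        String text = (String) xPath.evaluate(xPathString, root, XPathConstants.STRING);
        return text.trim();
    }

    public static void main(String[] args) throws Exception {
        // Sample envelope (an assumption), shaped like the one above.
        String envelope = "<envelope><payload>\n"
                + "  <![CDATA[<Document><validXML/></Document>]]>\n"
                + "</payload></envelope>";
        Document root = parse(envelope);
        String inner = embeddedXml(root, "/envelope/payload/text()");
        System.out.println(inner);

        // The extracted string is itself XML, so it can be parsed again.
        Document innerDoc = parse(inner);
        System.out.println(innerDoc.getDocumentElement().getNodeName());
    }
}
```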

Sanjay Bharwani