After upgrading from Java 1.2 with the Apache Xerces DOMParser to Java 1.7 with the Xerces JAXP DocumentBuilder, parsing completes without errors but does not “unwrap” CDATA sections, even though the DocumentBuilderFactory is initialized with “setCoalescing(true);”.
That is, input XML elements such as <ITEMDESC><![CDATA[ Sales Bom Material,Dist]]></ITEMDESC>
are returned unmodified.
The code is shown below.
I’m new to XML parsing, so it’s likely that I’m missing something quite basic.
Our input XML has literally hundreds of different tags, so we’d like a solution that works without changing each element “get”.
Are there other requirements/hints/tips/tricks for getting “setCoalescing(true);” to work?
Thanks in advance for any suggestions.
Code:
DocumentBuilderFactory aDocBuilderFactory = DocumentBuilderFactory.newInstance();
aDocBuilderFactory.setValidating(m_dtdValidate);
// Convert CDATA sections to text nodes and merge them with any adjacent text nodes
aDocBuilderFactory.setCoalescing(true);
// Make sure that entity references are expanded; this includes the replacements for the reserved markup
// characters
aDocBuilderFactory.setExpandEntityReferences(true);
// Ignore comments as they won't contain information to be processed
aDocBuilderFactory.setIgnoringComments(true);
// Get a document builder
m_documentBuilder = aDocBuilderFactory.newDocumentBuilder();
// Install entity resolver if required
m_documentBuilder.setEntityResolver(new DocumentEntityResolver());
m_document = m_documentBuilder.parse(pSource);
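
To narrow this down, here is a minimal, self-contained check of what setCoalescing(true) changes in the parsed DOM, using the sample element from above. It is only a sketch: the class name CoalescingCheck and the helpers firstChildType and rootText are hypothetical and not part of our code, and it assumes the default JAXP parser bundled with the JDK. It compares the node type of the element's first child with coalescing off and on, and reads the text through getTextContent().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class CoalescingCheck {

    // Parses an XML string with the requested coalescing setting.
    static Document parse(String xml, boolean coalescing) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(coalescing);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the DOM node type of the first child of the root element.
    static short firstChildType(String xml, boolean coalescing) throws Exception {
        return parse(xml, coalescing).getDocumentElement().getFirstChild().getNodeType();
    }

    // Returns the concatenated text content of the root element.
    static String rootText(String xml, boolean coalescing) throws Exception {
        return parse(xml, coalescing).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ITEMDESC><![CDATA[ Sales Bom Material,Dist]]></ITEMDESC>";

        // Without coalescing, the CDATA section is kept as its own node type.
        System.out.println("coalescing=false, CDATA node: "
                + (firstChildType(xml, false) == Node.CDATA_SECTION_NODE));

        // With coalescing, the same content is exposed as a plain Text node.
        System.out.println("coalescing=true, Text node: "
                + (firstChildType(xml, true) == Node.TEXT_NODE));

        // The character data itself is available through getTextContent().
        System.out.println("text: [" + rootText(xml, true) + "]");
    }
}
```

If this check shows a Text node in our environment, the factory setting is taking effect, and the place to look next would be how the element values are read or written back out, rather than the parser configuration.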