While parsing XML with a SAX parser in Java, I am not able to get the data exactly as it appears in the XML. The problem occurs when a node contains text with some Unicode characters.
The call node.getTextContent() splits the content at the Unicode characters and drops the whitespace between two Unicode characters.
Suppose the node contains the text oro-maxilo-facială și implantologie. Note the space between ă and și.
The method node.getTextContent() returns the string as oro-maxilo-facialăși implantologie (no whitespace).
Below is the code I tried.
    private String getNodeContent(Element nodeToSerialize) {
        StringBuffer sb = new StringBuffer();
        if (nodeToSerialize.hasChildNodes()) {
            NodeList nodeList = nodeToSerialize.getChildNodes();
            for (int x = 0; x < nodeList.getLength(); x++) {
                Node node = nodeList.item(x);
                sb.append(node.getTextContent());
            }
        }
        return sb.toString();
    }
The XML content is:
    <record>
        <isbn>1234-5689</isbn>
        <titles>
            <title>Revista de chirurgie oro-maxilo-facială și implantologie</title>
        </titles>
        <number>16</number>
    </record>
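
One way to narrow this down is a minimal standalone check that parses the same record and reads the title text, with nothing else from the application involved. The sketch below is only an illustration under stated assumptions: the class name TitleTextCheck and the helper readTitle are hypothetical, the XML is inlined as a string and fed to the parser as UTF-8 bytes, and the DOM is built with the JDK's default DocumentBuilderFactory rather than a custom SAX handler.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper class, not part of the original code.
class TitleTextCheck {
    // \u0103 is "a with breve" and \u0219 is "s with comma below"; the escapes
    // keep the literal independent of the source file's encoding.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<record><isbn>1234-5689</isbn><titles>"
        + "<title>Revista de chirurgie oro-maxilo-facial\u0103 \u0219i implantologie</title>"
        + "</titles><number>16</number></record>";

    // Parses the given XML as UTF-8 and returns the text of the first <title> element.
    static String readTitle(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Node title = doc.getElementsByTagName("title").item(0);
        return title.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String title = readTitle(XML);
        System.out.println(title);
        // Reports whether the space between the two non-ASCII characters survived.
        System.out.println("space preserved: " + title.contains("\u0103 \u0219i"));
    }
}
```

If the space survives in this isolated check but not in the real application, the difference probably lies in how the document is read or built there (for example the input encoding, or how a SAX handler collects character data) rather than in getTextContent() itself.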