
I need to convert a CSV file into XML and then write it to an OutputStream. The rule is to convert " into &quot; in my code.

Input CSV row:

{"Test":"Value"}

Expected output:

<root>
<child>{&quot;Test&quot;:&quot;Value&quot;}</child>
</root>

Current output:

<root>
<child>{&amp;quot;Test&amp;quot;:&amp;quot;Value&amp;quot;}</child>
</root>

Code:

File file = new File(FilePath);
BufferedReader reader = null;

DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder domBuilder = domFactory.newDocumentBuilder();

Document newDoc = domBuilder.newDocument();
Element rootElement = newDoc.createElement("root");
newDoc.appendChild(rootElement);

reader = new BufferedReader(new FileReader(file));
String text = null;

while ((text = reader.readLine()) != null) {
    Element rowElement = newDoc.createElement("child");
    rootElement.appendChild(rowElement);
    text = StringEscapeUtils.escapeXml(text);
    rowElement.setTextContent(text);
}

ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Source xmlSource = new DOMSource(newDoc);
Result outputTarget = new StreamResult(outputStream);
TransformerFactory.newInstance().newTransformer().transform(xmlSource, outputTarget);
System.out.println(new String(outputStream.toByteArray()));

Could you please help? What am I missing, and at what point is & converted to &amp;?

  • You are double-escaping. DOM will escape for you, but you escape too. Remove call to `StringEscapeUtils.escapeXml(text)`. – Andreas Sep 13 '16 at 00:55
  • I've read about this. The strange thing is that after removing the escaping, no escaping happens at all. – user3305630 Sep 13 '16 at 07:10
  • Because you only need to escape `"` in attributes with values quoted by `"`, e.g. this is valid XML: `he'd said: "Hello"`. The characters `<` and `&` must always be escaped (except in CDATA), while `>` only needs escaping when following `]]` (as in the CDATA terminator `]]>`), but `>` is usually escaped too. – Andreas Sep 13 '16 at 16:14

1 Answer


The XML library automatically escapes any characters that need to be XML-escaped, so you don't need to escape manually with StringEscapeUtils.escapeXml. Simply remove that line and you should get exactly what you're looking for: properly escaped XML.
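
To make the double escaping visible, here is a minimal sketch that pushes the same row through the DOM twice: once raw and once pre-escaped. The class name EscapeDemo and the helper serialize are hypothetical, and the plain replace call is only a stand-in for StringEscapeUtils.escapeXml so that the example needs nothing beyond the JDK.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EscapeDemo {
    // Hypothetical helper: wraps the text in <root><child>...</child></root>
    // and lets the serializer do whatever escaping it considers necessary.
    static String serialize(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        Element child = doc.createElement("child");
        child.setTextContent(text);
        root.appendChild(child);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Leave out the <?xml ...?> header so only the elements are returned.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        String row = "{\"Test\":\"Value\"}";
        // Stand-in for StringEscapeUtils.escapeXml (assumption: quotes only).
        String preEscaped = row.replace("\"", "&quot;");

        System.out.println(serialize(row));        // quotes stay literal
        System.out.println(serialize(preEscaped)); // each & becomes &amp;
    }
}
```

In the second call the DOM receives the literal characters &quot; as text, so the serializer escapes the & and the result contains &amp;quot;, which is the output shown in the question.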

XML doesn't require " characters to be escaped everywhere, only within attribute values. So this is valid XML already:

<root>
<child>{"Test":"Value"}</child>
</root>

You would escape the quotes if you had an attribute that contained a quote, such as: <child attr="properly &quot;ed"/>
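
The attribute case can be sketched the same way. The class name AttributeQuoteDemo, the helper withAttribute, and the attribute value are made up for illustration; the point is only that the value is passed in raw and the serializer is left to quote it.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AttributeQuoteDemo {
    // Hypothetical helper: builds <child attr="..."/> with a raw attribute value.
    static String withAttribute(String value) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element child = doc.createElement("child");
        child.setAttribute("attr", value); // raw value, no manual escaping
        doc.appendChild(child);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // The double quotes inside the value are escaped by the serializer,
        // because the attribute itself is delimited by double quotes.
        System.out.println(withAttribute("say \"hi\""));
    }
}
```

Here the quotes inside the value come out as &quot; without any manual escaping, because leaving them literal would end the attribute early.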

This is one of the main reasons to use an XML library: the subtleties of quoting are already handled for you. No need to read the XML spec to make sure you got the quoting rules correct.

Jason Hoetger