32

I have an XML file stored as a DOM Document and I would like to pretty-print it to the console, preferably without using an external library. I am aware that this question has been asked multiple times on this site; however, none of the previous answers have worked for me. I am using Java 8, so perhaps this is where my code differs from previous questions? I have also tried to set the transformer manually using code found on the web, but this just caused a "not found" error.

Here is my code, which currently just outputs each XML element on a new line, flush against the left edge of the console with no indentation.

import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;


public class Test {
    public Test(){
        try {
            //java.lang.System.setProperty("javax.xml.transform.TransformerFactory", "org.apache.xalan.xsltc.trax.TransformerFactoryImpl");

            DocumentBuilderFactory dbFactory;
            DocumentBuilder dBuilder;
            Document original = null;
            try {
                dbFactory = DocumentBuilderFactory.newInstance();
                dBuilder = dbFactory.newDocumentBuilder();
                original = dBuilder.parse(new InputSource(new InputStreamReader(new FileInputStream("xml Store - Copy.xml"))));
            } catch (SAXException | IOException | ParserConfigurationException e) {
                e.printStackTrace();
            }
            StringWriter stringWriter = new StringWriter();
            StreamResult xmlOutput = new StreamResult(stringWriter);
            TransformerFactory tf = TransformerFactory.newInstance();
            //tf.setAttribute("indent-number", 2);
            Transformer transformer = tf.newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.transform(new DOMSource(original), xmlOutput);
            java.lang.System.out.println(xmlOutput.getWriter().toString());
        } catch (Exception ex) {
            throw new RuntimeException("Error converting to String", ex);
        }
    }

    public static void main(String[] args){
        new Test();
    }

}
loopbackbee
Hungry

7 Answers

56

In reply to Espinosa's comment, here is a solution when "the original xml is not already (partially) indented or contain new lines".

Background

Excerpt from the article (see References below) inspiring this solution:

Based on the DOM specification, whitespaces outside the tags are perfectly valid and they are properly preserved. To remove them, we can use XPath’s normalize-space to locate all the whitespace nodes and remove them first.
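To see what the excerpt means in practice, here is a minimal sketch that parses a small document and counts the whitespace-only text nodes directly under the root element. The class and method names are made up for illustration and are not part of the original answer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, only meant to illustrate the excerpt above.
final class WhitespaceNodesDemo {

    // Counts the whitespace-only text nodes directly under the root element.
    static int blankTextChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            int blank = 0;
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.TEXT_NODE
                        && child.getNodeValue().trim().isEmpty()) {
                    blank++;
                }
            }
            return blank;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The line breaks before and after <name/> each become a text node.
        System.out.println(blankTextChildren("<root>\n  <name/>\n</root>"));
        // No whitespace between the tags, so there are no blank text nodes.
        System.out.println(blankTextChildren("<root><name/></root>"));
    }
}
```

These blank text nodes are the ones the XPath expression in the code below selects and removes before the document is handed to the Transformer.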

Java Code

public static String toPrettyString(String xml, int indent) {
    try {
        // Turn xml string into a document
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));

        // Remove whitespaces outside tags
        document.normalize();
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",
                                                      document,
                                                      XPathConstants.NODESET);

        for (int i = 0; i < nodeList.getLength(); ++i) {
            Node node = nodeList.item(i);
            node.getParentNode().removeChild(node);
        }

        // Setup pretty print options
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        transformerFactory.setAttribute("indent-number", indent);
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        // Return pretty print xml string
        StringWriter stringWriter = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(stringWriter));
        return stringWriter.toString();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Sample usage

String xml = "<root>" + //
             "\n   "  + //
             "\n<name>Coco Puff</name>" + //
             "\n        <total>10</total>    </root>";

System.out.println(toPrettyString(xml, 4));

Output

<root>
    <name>Coco Puff</name>
    <total>10</total>
</root>

References

Community
Stephan
  • This is actually pretty similar to the code which I ended up using :). – Hungry Nov 05 '15 at 10:55
  • 1
    @btrs20 The difference lies in the whitespace removal. – Stephan Nov 05 '15 at 11:36
  • 1
    I ended up doing something similar: a simple recursion looking for whitespace-only text nodes, no XPath. Your code is shorter. Nice example of advanced XPath. Thanks. – Espinosa Nov 07 '15 at 19:51
  • 1
    This works perfectly. But if you get exceptions about the `indent-number` attribute not being supported, the solution is to check the classpath for classes implementing TransformerFactory. I had the library `net.sf.saxon:Saxon-HE` on the classpath, which defined an additional TransformerFactory. – raisercostin Apr 17 '16 at 22:04
  • 1
    Removal of the whitespace is important. The transformer doesn't work if your String has whitespace between lines. – Display name Oct 28 '16 at 19:45
  • 1
    Note that this does not play well with XHTML DOCTYPE declarations (it tries to fetch them); once removed, this solution works very well. Also note that because of other imports I had to use org.w3c.dom.Document and org.w3c.dom.Node explicitly instead of Document and Node, and instead of the ByteArrayInputStream you can use InputSource inputSource = new InputSource(new StringReader(code)); (passing in inputSource to DocumentBuilder.parse()). – Andrew Mar 22 '17 at 13:53
  • could someone resolve the problem with the indent-number not being interpreted? also the XML declaration is on the same line as the root element (don't want to omit it). – benez Dec 27 '17 at 21:47
  • @benez can you please post your problem in a new question? – Stephan Jan 27 '18 at 17:14
  • @Stephan ehm no. i don't see this as a different topic. feel free to create your own questions. the indent is surely part of what most developers expect from a "pretty" print. none of the solutions posted here have solved the indent yet. – benez Jan 29 '18 at 23:58
  • @benez I'm sorry, the "indent-number not interpreted" problem is totally unclear. Further comments won't clarify it. – Stephan Jan 30 '18 at 10:53
  • @Stephan i am using jdk8u162. executing the above code with `transformerFactory.setAttribute("indent-number", indent);` simply does not add any indent to the output of the method. i expect to see spaces at the beginning of any inner xml-tag. seems like this is ignored. – benez Jan 31 '18 at 01:36
  • @benez I wasn't able to reproduce your issue. Please detail it in a new question and feel free to post a link back here in a comment. – Stephan Jan 31 '18 at 16:12
  • I like this answer except that it adds a line break at the end. – Marteng Jan 09 '20 at 10:23
  • 2
    @Marteng You may try underscore-java library and U.formatXml(xml) method. – Valentyn Kolesnikov Feb 09 '20 at 10:04
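Several comments above report that the `indent-number` factory attribute is rejected or ignored. As a hedged sketch, the variant below sets the `{http://xml.apache.org/xslt}indent-amount` output property on the Transformer instead (the same property the question and another answer already use), and reads the string through a StringReader as suggested in one of the comments. The class name `PrettyXml` is made up for illustration, and whether the property is honoured still depends on which TransformerFactory implementation is on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; a sketch, not a drop-in replacement for the answer.
final class PrettyXml {

    static String format(String xml, int indent) {
        try {
            // Parse from a Reader so no byte encoding step is involved.
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Remove whitespace-only text nodes, as in the answer above.
            NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//text()[normalize-space()='']", document, XPathConstants.NODESET);
            for (int i = 0; i < blanks.getLength(); i++) {
                Node node = blanks.item(i);
                node.getParentNode().removeChild(node);
            }

            // Configure indentation on the Transformer, not on the factory.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount",
                    Integer.toString(indent));

            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not pretty-print XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>\n   \n<name>Coco Puff</name>\n        <total>10</total>    </root>";
        System.out.println(format(xml, 4));
    }
}
```

With the transformer bundled in the JDK this should indent the child elements by the requested number of spaces; with another implementation on the classpath (Saxon was mentioned above) the property may be ignored.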
10

I guess that the problem is related to blank text nodes (i.e. text nodes containing only whitespace) in the original file. You should try to remove them programmatically just after parsing, using the following code. If you don't remove them, the Transformer is going to preserve them.

original.getDocumentElement().normalize();
XPathExpression xpath = XPathFactory.newInstance().newXPath().compile("//text()[normalize-space(.) = '']");
NodeList blankTextNodes = (NodeList) xpath.evaluate(original, XPathConstants.NODESET);

for (int i = 0; i < blankTextNodes.getLength(); i++) {
     blankTextNodes.item(i).getParentNode().removeChild(blankTextNodes.item(i));
}
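A comment on the XPath-based answer mentions doing the same clean-up with plain recursion instead of XPath. A minimal sketch of that idea, with the class name `BlankTextStripper` and both method names being hypothetical, could look like this:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper; a recursive alternative to the XPath expression above.
final class BlankTextStripper {

    // Removes every whitespace-only text node below the given node.
    static void strip(Node parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            // Remember the sibling first, because the child may be removed.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                parent.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                strip(child);
            }
            child = next;
        }
    }

    // Convenience method for trying it out on a string.
    static Document parseAndStrip(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            strip(document);
            return document;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Text nodes that contain anything other than whitespace are left untouched, including their leading and trailing spaces.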
Stephan
Aldo
5

This works on Java 8:

public static void main (String[] args) throws Exception {
    String xmlString = "<hello><from>ME</from></hello>";
    DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
    Document document = documentBuilder.parse(new InputSource(new StringReader(xmlString)));
    pretty(document, System.out, 2);
}

private static void pretty(Document document, OutputStream outputStream, int indent) throws Exception {
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    if (indent > 0) {
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", Integer.toString(indent));
    }
    Result result = new StreamResult(outputStream);
    Source source = new DOMSource(document);
    transformer.transform(source, result);
}
Tom
  • Hmmm, that also works for me so I guess the problem must be in the way I read the xml file. – Hungry Sep 16 '14 at 10:15
  • 4
    Warning, this solution only works when the original xml is not already (partially) indented or contain new lines. That is, it will work for "<hello><from>ME</from></hello>" but NOT for "<hello>\n<from>ME</from>\n</hello>" – Espinosa Oct 13 '15 at 00:33
  • 1
    To casual readers, here is a solution for @Espinosa's warning: http://stackoverflow.com/a/33541820/363573 – Stephan Nov 05 '15 at 10:15
2

I've written a simple class for removing whitespace in documents. It supports the command line and does not use DOM / XPath.

Edit: Come to think of it, the project also contains a pretty-printer which handles existing whitespace:

PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().ignoreWhitespace().build();
ThomasRS
1

Underscore-java has a static method U.formatXml(string). I am the maintainer of the project. Live example

import com.github.underscore.U;

public class MyClass {
    public static void main(String args[]) {
        String xml = "<root>" + //
             "\n   "  + //
             "\n<name>Coco Puff</name>" + //
             "\n        <total>10</total>    </root>";

        System.out.println(U.formatXml(xml));
    }
}

Output:

<root>
   <name>Coco Puff</name>
   <total>10</total>
</root>
Valentyn Kolesnikov
0

I didn't like any of the common XML formatting solutions because they all remove more than 1 consecutive new line character (for some reason, removing spaces/tabs and removing new line characters are inseparable...). Here's my solution, which was actually made for XHTML but should do the job with XML as well:

public String GenerateTabs(int tabLevel) {
  char[] tabs = new char[tabLevel * 2];
  Arrays.fill(tabs, ' ');

  //Or:
  //char[] tabs = new char[tabLevel];
  //Arrays.fill(tabs, '\t');

  return new String(tabs);
}

public String FormatXHTMLCode(String code) {
  // Split on new lines.
  String[] splitLines = code.split("\\n", 0);

  int tabLevel = 0;

  // Go through each line.
  for (int lineNum = 0; lineNum < splitLines.length; ++lineNum) {
    String currentLine = splitLines[lineNum];

    if (currentLine.trim().isEmpty()) {
      splitLines[lineNum] = "";
    } else if (currentLine.matches(".*<[^/!][^<>]+?(?<!/)>?")) {
      splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];

      ++tabLevel;
    } else if (currentLine.matches(".*</[^<>]+?>")) {
      --tabLevel;

      if (tabLevel < 0) {
        tabLevel = 0;
      }

      splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
    } else if (currentLine.matches("[^<>]*?/>")) {
      splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];

      --tabLevel;

      if (tabLevel < 0) {
        tabLevel = 0;
      }
    } else {
      splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
    }
  }

  return String.join("\n", splitLines);
}

It makes one assumption: that there are no <> characters except for those that comprise the XML/XHTML tags.

Andrew
  • 1
    this snippet is incomplete, since the codeGenerator variable cannot be resolved. is the corresponding class written in java? since java method names do have a different naming convention. – benez Dec 27 '17 at 21:51
  • @benez Sorry about that, and thanks for informing me. I didn't realize there was external code being utilized. Try that, I think it will work; can't test it right now. – Andrew Dec 28 '17 at 19:08
-3

Creating the input from the XML file like this:

new FileInputStream("xml Store - Copy.xml"); // resulting XML file format is incorrect!

Instead, parse the content of the given input source as an XML document and return a new DOM object:

Document original = null;
...
original = dBuilder.parse("data.xml"); // input source as an XML document
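As a hedged illustration of parsing the file directly: the question wraps the FileInputStream in an InputStreamReader without naming a charset, so the bytes are decoded with the default charset before the parser sees them. Handing the parser a File lets it read the raw bytes and honour the encoding declared in the XML prolog. Whether this is related to the asker's formatting problem is an assumption; the class name `ParseFromFile` is made up for this sketch.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper showing DocumentBuilder.parse(File).
final class ParseFromFile {

    // Parses the file directly so the parser can detect the encoding itself.
    static Document load(File file) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(file);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse " + file, e);
        }
    }

    public static void main(String[] args) {
        // "data.xml" is the file name used in the answer above.
        Document original = load(new File("data.xml"));
        System.out.println(original.getDocumentElement().getNodeName());
    }
}
```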
iCrazybest