1

I have a method that returns a String containing formatted XML. The method reads the XML from a file on the server into the string.

Essentially, what the method currently does is:

  private ServletConfig config;

  // Inside the method: read the XML resource into a String exactly as stored.
  InputStream xmlIn = config.getServletContext().getResourceAsStream(filename + ".xml");
  String xml = IOUtils.toString(xmlIn);
  IOUtils.closeQuietly(xmlIn);
  return xml;

What I need to do is add a new input argument, and based on that value, continue returning the formatted xml, or return unformatted xml.

What I mean by formatted XML is something like:

<xml>
  <root>
    <elements>
       <elem1/>
       <elem2/>
    </elements>
  </root>
</xml>

And what I mean by unformatted XML is something like:

<xml><root><elements><elem1/><elem2/></elements></root></xml>

or:

<xml>
<root>
<elements>
<elem1/>
<elem2/>
</elements>
</root>
</xml>

Is there a simple way to do this?

Rob Hruska
Fernando Moyano

7 Answers

2

Strip all newline characters with String xml = IOUtils.toString(xmlIn).replace("\n", ""). Or strip "\t" instead to keep the line breaks but remove tab indentation.
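
A minimal sketch of how the new input argument could drive this, assuming the XML is already in a String. The class and method names (XmlText, render) are hypothetical. Note that it removes every line break and tab, including any inside element text, and that indentation made of spaces is left in place.

```java
final class XmlText {
    // Hypothetical helper: returns the XML unchanged when formatted is true,
    // otherwise removes all CR, LF and tab characters from the whole string.
    static String render(String xml, boolean formatted) {
        if (formatted) {
            return xml;
        }
        return xml.replace("\r", "").replace("\n", "").replace("\t", "");
    }
}
```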

tobiasbayer
  • It didn't work, I'm still getting the formatted xml into "xml" variable :S, wonder why – Fernando Moyano Dec 12 '11 at 15:13
  • Try to strip "\r" as well. Might be Windows line breaks. `String xml = IOUtils.toString(xmlIn).replace("\n", "").replace("\r", "")` – tobiasbayer Dec 12 '11 at 15:23
  • 6
    This will mangle any \r\n that is present in the tag value itself. Not all carriage returns are for formatting; some can be part of the data itself – Newtopian Dec 12 '11 at 15:45
  • Thanks man, it finally worked; the debugger was not showing me reality :). You have helped me a lot. Thanks to everybody else who helped with this issue. – Fernando Moyano Dec 12 '11 at 18:51
  • 1
    @Newtopian: Yes. But for OP's problem it seems simple and effective enough. – tobiasbayer Dec 13 '11 at 07:48
  • @CodeBrickie The possibility of side effects is too much for me but it is evidently sufficient for him :-). Then again not all problems require stainless steel airtight bunker code. – Newtopian Dec 13 '11 at 08:03
2

If you are sure that the formatted XML looks like:

<xml>
  <root>
    <elements>
       <elem1/>
       <elem2/>
    </elements>
  </root>
</xml>

you can replace every match of group 1 in ^(\s*)< with "", using multiline mode so that ^ matches at the start of each line. This way, the text content of the XML is not changed.
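
A minimal Java sketch of this replacement, assuming the XML is already in a String. The class and method names are hypothetical, and the pattern uses \s+ in place of the capture group, which has the same effect here. Line breaks are kept; only the whitespace in front of a tag at the start of a line is removed.

```java
import java.util.regex.Pattern;

final class XmlUnindent {
    // (?m) enables MULTILINE so ^ matches at the start of every line.
    // Only whitespace that directly precedes a '<' at a line start is matched.
    private static final Pattern LEADING_WS = Pattern.compile("(?m)^\\s+<");

    // Hypothetical helper: removes indentation in front of tags.
    static String stripIndentation(String xml) {
        return LEADING_WS.matcher(xml).replaceAll("<");
    }

    public static void main(String[] args) {
        String formatted = "<xml>\n  <root>\n    <elem1/>\n  </root>\n</xml>";
        System.out.println(stripIndentation(formatted));
    }
}
```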

Kent
1

Try something like the following:

TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(
    new StreamSource(new StringReader(
        "<xsl:stylesheet version=\"1.0\"" +
        "   xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" + 
        "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>" +
        "  <xsl:strip-space elements=\"*\"/>" + 
        "  <xsl:template match=\"@*|node()\">" +
        "   <xsl:copy>" +
        "    <xsl:apply-templates select=\"@*|node()\"/>" +
        "   </xsl:copy>" +
        "  </xsl:template>" +
        "</xsl:stylesheet>"
    ))
);
Source source = new StreamSource(new StringReader("xml string here"));
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);

The second source (the XML input) does not have to be a StreamSource. If you have an in-memory Document, for example because you want to modify the DOM before saving, it can be a DOMSource instead:

DOMSource source = new DOMSource(document);

To read an XML file into a Document object:

File file = new File("c:\\MyXMLFile.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
doc.getDocumentElement().normalize();
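
The strip-space stylesheet above can also be wrapped in a method that returns a String instead of printing to System.out. This is a minimal sketch under that assumption; the class and method names (XmlCompactor, compact) are hypothetical. xsl:strip-space only drops whitespace-only text nodes, so text that contains other characters is copied unchanged.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XmlCompactor {
    // Identity transform plus strip-space, same idea as the stylesheet above.
    private static final String STRIP_SPACE_XSLT =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\" indent=\"no\"/>"
        + "<xsl:strip-space elements=\"*\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical helper: returns the XML without whitespace-only text nodes.
    static String compact(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STRIP_SPACE_XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not transform XML", e);
        }
    }
}
```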

Enjoy :)

Chris Dennett
1

Use an empty (identity) transformer and set its indent output property from a parameter, like so:

public static String getStringFromDocument(Document dom, boolean indented) {
    String signedContent = null;
    try {
        StringWriter sw = new StringWriter();
        DOMSource domSource = new DOMSource(dom);
        // Use the JAXP factory lookup instead of a vendor-specific TransformerFactoryImpl.
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer trans = tf.newTransformer();
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        trans.setOutputProperty(OutputKeys.INDENT, indented ? "yes" : "no");

        trans.transform(domSource, new StreamResult(sw));
        signedContent = sw.toString();
    } catch (TransformerException e) {
        e.printStackTrace();
    }
    return signedContent;
}

This works for me.

The key lies in this line:

 trans.setOutputProperty(OutputKeys.INDENT, indented ? "yes" : "no");
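
One caveat when the input is already formatted: INDENT set to "no" only tells the serializer not to add whitespace, so whitespace-only text nodes that the parser kept from the original indentation are written out unchanged. The sketch below is one way to handle that, assuming the XML arrives as a String; the class and method names (XmlSerializer, serialize) are hypothetical. It removes whitespace-only text nodes before serializing, which is only safe when such nodes carry no meaning in the document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlSerializer {
    // Hypothetical helper: re-serializes the XML with or without indentation.
    static String serialize(String xml, boolean indented) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            // Drop whitespace-only text nodes left over from the original indentation.
            NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
            for (int i = 0; i < blanks.getLength(); i++) {
                Node n = blanks.item(i);
                n.getParentNode().removeChild(n);
            }

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.INDENT, indented ? "yes" : "no");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }
}
```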
Newtopian
0

Kotlin.

Indentation usually follows a newline and consists of one or more spaces. Hence, to collapse the indentation, replace each newline that is followed by one or more spaces with a single space:

xmlTag = xmlTag.replace("(\n +)".toRegex(), " ")
Oz Shabat
0

You can:

1) Remove all consecutive whitespace (but not single whitespace characters) and then replace every >(whitespace)< with ><. This is applicable only if the useful content does not contain multiple consecutive significant whitespace characters.

2) Read it into a DOM tree and serialize it using a non-pretty (compact) serialization, for example with dom4j:

    SAXReader reader = new SAXReader();
    Reader r = new StringReader(data);
    Document document = reader.read(r);
    OutputFormat format = OutputFormat.createCompactFormat();
    StringWriter sw = new StringWriter();
    XMLWriter writer = new XMLWriter(sw, format);
    writer.write(document);
    writer.flush();
    // Read the result from the StringWriter; writer.toString() would not return the XML.
    String string = sw.toString();

3) Use canonicalization (but you must somehow tell it that the whitespace you want to remove is insignificant).

Alpedar
  • Let me see if I understand. First of all, thanks for your answer :). About point 1, removing whitespace does not seem good, because some of the values in the XML file may contain spaces. Why do you think I should remove consecutive spaces? About point 2, what do you mean by "nonpretty" serialization? (Sorry I don't get you, but I'm kind of new to it :).) About point 3, what do you mean by canonicalization? Again, thanks for your help – Fernando Moyano Dec 12 '11 at 15:17
0

If you fancy trying your hand with JAXB then the marshaller has a handy property for setting whether to format (use new lines and indent) the output or not.

JAXBContext jc = JAXBContext.newInstance(packageName);
Marshaller m = jc.createMarshaller();
// Boolean.TRUE writes line breaks and indentation; pass Boolean.FALSE for unformatted output.
m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
m.marshal(element, outputStream);

It is quite an overhead to get to that stage, though. It is perhaps a good option if you already have a solid XSD.

Edd