
I have an XML String that is not formatted properly, and I would like to indent it properly using Java. There are a lot of answers on SO regarding this problem. One of the most widely accepted and brilliant answers is this one:

Pretty print XML in java 8

The problem with this answer is that the code always needs a root element, whereas the XML tags in my String look like this:

<person>
<address>New York</address>
</person>
<person>
<address>Ottawa</address>
</person>

As you can see there is no root element here. Just multiple tags of person.

I have looked for suitable methods in the libraries used in the above answer, but to no avail.

I don't know if there is a way out. But if someone can think of something I would really appreciate it.

P.S. Please please before you mark this as Duplicate, read the question. I know there are similar questions on SO but my problem is quite different.

john.p.doe
  • If your XML does not have a root element, then it is invalid. http://www.xmlvalidation.com/index.php?id=1&L=0 – W.K.S Feb 23 '16 at 07:38
  • Add a root element by string manipulation, indent using the method and remove it again by string manipulation? – Joachim Isaksson Feb 23 '16 at 07:38
  • The first question here should not be "how to do it" but "why you need it"? Pretty printing is usually not needed especially when XML is being processed by programs (which it usually is). – Seelenvirtuose Feb 23 '16 at 07:40
  • @Seelenvirtuose The XML does not need to be indented for processing; it needs to be indented so that I can write it to a file in a prettier, more readable form. – john.p.doe Feb 23 '16 at 07:51
  • @JoachimIsaksson I do like the idea, but I have more than 100K person nodes. I will end up running out of memory by doing so many String operations. – john.p.doe Feb 23 '16 at 07:52
  • Then do the pretty printing outside of your program. Notepad++ for example has an XML plugin that is able to do that. – Seelenvirtuose Feb 23 '16 at 07:52
  • @Seelenvirtuose I cannot ask every user to download Notepad++ and download that plug-in and indent it. – john.p.doe Feb 23 '16 at 07:55
  • @JoachimIsaksson I am already running on borderline with Heap Space. This would definitely create a bottleneck for me. :-( – john.p.doe Feb 23 '16 at 08:07
  • @john.p.doe Added a better description as an answer since it's hard to describe in a comment. If you want to save memory you'll have to let us know in what form you have the data right now, string, file, ...? – Joachim Isaksson Feb 23 '16 at 08:16
  • Are the tags already aligned as shown? So no closing `` is on the same line as another start or closing tag? – SubOptimal Feb 23 '16 at 08:20
  • You are returning an _invalid_ XML file to your users? – Seelenvirtuose Feb 23 '16 at 08:25
  • @SubOptimal No the tags are not aligned always as shown, they are quite random. – john.p.doe Feb 23 '16 at 08:26
  • @Seelenvirtuose No sir, I plan to hard-code the closing tags later on after the parsing i.e. proper indentation of the child tags. – john.p.doe Feb 23 '16 at 08:26

2 Answers


You can use string manipulation to get close.

Let's say you have your structure in a string called xml:

<person>
<address>New York</address>
</person>
<person>
<address>Ottawa</address>
</person>

Then add a root element:

xml = "<myDummyRoot>" + xml + "</myDummyRoot>";

which gives the structure

<myDummyRoot>
<person>
<address>New York</address>
</person>
<person>
<address>Ottawa</address>
</person>
</myDummyRoot>

This is a valid XML document that should be possible to indent using the method linked, giving something like:

<myDummyRoot>
    <person>
        <address>New York</address>
    </person>
    <person>
        <address>Ottawa</address>
    </person>
</myDummyRoot>

A simple string replaceAll can then remove the root element again:

xml = xml.replaceAll("</?myDummyRoot>", "");

...which should leave you with a readable XML document (although indented with some extra spacing).
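
The steps above can be put together roughly as follows. This is a minimal sketch, not a definitive implementation: the class name `FragmentPrettyPrinter` and the method `prettyPrint` are made up for illustration, the `indent-amount` property is specific to the JDK's built-in transformer, and the code assumes that whitespace between tags is insignificant and that the name `myDummyRoot` never occurs in the real data.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class FragmentPrettyPrinter {
    // Hypothetical wrapper name; it must not occur in the real data.
    private static final String ROOT = "myDummyRoot";

    static String prettyPrint(String fragment) {
        // Assumption: whitespace between tags is insignificant, so drop it
        // to keep the old line breaks from ending up in the output.
        String compact = fragment.trim().replaceAll(">\\s+<", "><");
        // Wrap the fragment so that the parser sees a single root element.
        String wrapped = "<" + ROOT + ">" + compact + "</" + ROOT + ">";
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Implementation-specific property understood by the JDK's default transformer.
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(wrapped)),
                                  new StreamResult(out));
            // Remove the wrapper lines again, including their line breaks.
            return out.toString().replaceAll("(?m)^</?" + ROOT + ">\\R?", "");
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Fragment is not well-formed XML", e);
        }
    }
}
```

Calling `FragmentPrettyPrinter.prettyPrint(xml)` returns the indented fragment without the dummy root. The remaining elements keep one extra level of indentation, and the exact whitespace may differ between JDK versions.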

Joachim Isaksson
  • Actually I can get the objects in chunks and do the dummy append per chunk: take 1000 at a time, prettify them, and repeat that 100 times. That could actually work. I am not sure if you were suggesting the same, but your idea should work. Thanks a lot buddy. – john.p.doe Feb 23 '16 at 08:17

It seems that your main issue isn't prettifying the XML, but memory management. Joachim has given you a perfectly good approach for prettifying it, but you don't have enough memory to implement it.

Assuming that your string is coming from a file, you should use file manipulation techniques that don't require you to load the entire file at once. You can use FileInputStream and FileOutputStream to create a temporary copy of the file that has the root opening and closing elements. You can then use XMLStreamReader to go through the <person> tags one at a time.

If your string is not in a file, then get it out of memory by dumping it in a temporary file, and proceed as above.
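
A streaming variant of this idea might look like the sketch below. It swaps the temporary file copy for a `SequenceInputStream` that puts the dummy root around the original stream, then walks the events with `XMLStreamReader` and writes the indentation by hand. The class name `StreamingFragmentIndenter`, the method `indent`, and the wrapper name `myDummyRoot` are hypothetical. The sketch also assumes UTF-8 input without namespaces, comments, or mixed content, so treat it as a starting point only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StreamingFragmentIndenter {
    private static final String ROOT = "myDummyRoot"; // hypothetical wrapper name

    static void indent(InputStream fragment, Writer out) {
        // Put the dummy root around the stream without copying the data.
        InputStream wrapped = new SequenceInputStream(Collections.enumeration(
                Arrays.<InputStream>asList(
                        new ByteArrayInputStream(("<" + ROOT + ">").getBytes(StandardCharsets.UTF_8)),
                        fragment,
                        new ByteArrayInputStream(("</" + ROOT + ">").getBytes(StandardCharsets.UTF_8)))));
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_COALESCING, true);
            XMLStreamReader reader = factory.createXMLStreamReader(wrapped, "UTF-8");
            int depth = -1;             // the dummy root sits at -1, real elements start at 0
            boolean lastWasEnd = false; // true if the previous tag written was a closing tag
            boolean first = true;
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (depth >= 0) {
                            if (!first) out.write("\n");
                            first = false;
                            writeIndent(out, depth);
                            out.write("<" + reader.getLocalName());
                            for (int i = 0; i < reader.getAttributeCount(); i++) {
                                out.write(" " + reader.getAttributeLocalName(i) + "=\""
                                        + escape(reader.getAttributeValue(i)) + "\"");
                            }
                            out.write(">");
                        }
                        depth++;
                        lastWasEnd = false;
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        // Skip whitespace-only text such as the old line breaks.
                        if (!reader.isWhiteSpace()) out.write(escape(reader.getText()));
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        depth--;
                        if (depth >= 0) {
                            // A closing tag gets its own line only after child elements.
                            if (lastWasEnd) {
                                out.write("\n");
                                writeIndent(out, depth);
                            }
                            out.write("</" + reader.getLocalName() + ">");
                        }
                        lastWasEnd = true;
                        break;
                    default:
                        break; // comments, processing instructions etc. are ignored here
                }
            }
        } catch (IOException | XMLStreamException e) {
            throw new IllegalStateException("Could not indent XML fragment", e);
        }
    }

    private static void writeIndent(Writer out, int depth) throws IOException {
        for (int i = 0; i < depth; i++) out.write("    ");
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Because only one event is held at a time, memory use should stay roughly constant regardless of how many `<person>` elements there are. The dummy root is never written to the output, so the top-level elements start at column 0 and no string replacement is needed afterwards.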

Mr Lister
Ken Clubok
  • Actually the data isn't coming from a file. It's coming from the DB and going back into the DB as a concatenated String. I know it's weird, but that's how the application is meant to work. – john.p.doe Feb 23 '16 at 08:18