
I have a String that contains XML data. After removing some nodes and adding a few others, the XML contains a lot of whitespace (created during node removal).

<A>
<B>
</B>

<!-- some node i deleted and lot of white spaces -->



<c>
</c>


<!-- some more node i deleted and lot of white spaces -->




<E>
</E>
</A>

Desired output after string manipulation:

<A>
<B>
</B>
<c>
</c>
<E>
</E>
</A>

I can use replaceAll("\\s", ""), but this also removes the newline characters, which breaks the line structure of the XML when it is displayed in the UI.

Is there a way to trim the whitespace without removing the newline characters?

Edit: This XML data is part of an OMElement.

Dheeraj Joshi
  • How about replacing two consecutive newlines with one? – Thilo Sep 12 '12 at 05:43
  • Have you tried `str.replaceAll("\n+", "\n")`? – obataku Sep 12 '12 at 05:43
  • You can use `replaceAll(" ", "")` to remove all spaces in the document, or something like `replaceAll(">\\s+<", ">\n<")` to replace all whitespace (spaces, tabs, and newlines) between tags with a single newline. – michael Sep 12 '12 at 05:46
  • Yes, I tried `str.replaceAll("\n+", "\n")`, but it is not giving me the desired result. `String strXmlData = xmlLoadData.toString(); strXmlData = strXmlData.replaceAll("\n+", "\n");` The original strXmlData and the string after the replace are the same. – Dheeraj Joshi Sep 12 '12 at 05:56

5 Answers


Can you clarify what you mean? If you mean whitespace other than newlines, try the following.

str = str.replaceAll("[ \\t\\x0B\\f\\r]", "");

... or, do you instead mean you want to remove extraneous new lines?

str = str.replaceAll("\n{2,}", "\n");

... or do you want to remove only literal ' ' spaces?

str = str.replace(" ", "");
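
The comments under the question report that collapsing `\n+` left the string unchanged. One possible cause (an assumption, since the actual data is not shown) is that the "empty" lines still contain spaces or tabs, or that the lines end in `\r\n`. A minimal sketch that drops every whitespace-only line under that assumption; the class and method names are made up for illustration:

```java
class BlankLineStripper {
    // Hypothetical helper: removes lines that are empty or hold only
    // spaces/tabs. (?m) makes ^ match at the start of every line, and
    // \r?\n accepts both Unix and Windows line endings.
    static String stripBlankLines(String xml) {
        return xml.replaceAll("(?m)^[ \\t]*\\r?\\n", "");
    }

    public static void main(String[] args) {
        String xml = "<A>\n<B>\n</B>\n\n  \n\r\n<c>\n</c>\n\n\n<E>\n</E>\n</A>";
        System.out.println(stripBlankLines(xml));
    }
}
```

Lines that contain anything other than spaces or tabs are left as they are, including their indentation.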
obataku

Try someString.replaceAll("\\u0020", ""). \u0020 is the Unicode code point of the space character, so this should do the job.

Edit: if you need other whitespace characters, take a look at this question. You will find more of them in tchrist's answer.

Matthias Kricke

I suggest using the regex str.replaceAll("(</[^>]+>)\\s+(<[^>]+>)","$1\n$2"), which detects the whitespace between a closing tag and the tag that follows it and replaces it with a single newline.
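
A small sketch of how this behaves on the sample from the question. Because the pattern consumes the tag that follows the whitespace, two gaps in a row after closing tags (for example `</c>`, blank lines, `</E>`, blank lines, `</A>`) are not both collapsed in a single pass. The second method below is an alternative using lookarounds, not the regex from this answer; the class and method names are illustrative only:

```java
class TagGapCollapser {
    // The regex from the answer: whitespace after a closing tag and
    // before the next tag becomes one newline.
    static String closingTagRegex(String xml) {
        return xml.replaceAll("(</[^>]+>)\\s+(<[^>]+>)", "$1\n$2");
    }

    // Alternative sketch: lookbehind/lookahead match only the whitespace
    // between '>' and '<', so neighbouring tags are never consumed.
    // Caveat: this also touches whitespace between tags in mixed content.
    static String betweenTags(String xml) {
        return xml.replaceAll("(?<=>)\\s+(?=<)", "\n");
    }

    public static void main(String[] args) {
        String xml = "<A>\n<B>\n</B>\n\n\n<c>\n</c>\n\n\n\n<E>\n</E>\n</A>";
        System.out.println(closingTagRegex(xml));
        System.out.println(betweenTags(xml));
    }
}
```

Text content such as `<a>  text  </a>` is untouched by either method, because the whitespace there is not directly between two tags.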

Gaim

If you are using DocumentBuilder to modify the XML, you can also make use of the method below.

DocumentBuilderFactory.setIgnoringElementContentWhitespace

Specifies that the parsers created by this factory must eliminate whitespace in element content (sometimes known loosely as 'ignorable whitespace')

factory.setValidating(true);
factory.setIgnoringElementContentWhitespace(true);
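
Per the Javadoc, this setting relies on the content model, so it only takes effect when the parser is validating against a DTD. When no DTD is available, a different technique is to parse normally and remove whitespace-only text nodes by hand. The following is a sketch of that alternative, not of the factory setting itself; the class and method names are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceNodeStripper {
    // Parses an XML string into a DOM document (no validation).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Recursively removes text nodes that contain only whitespace.
    static void stripWhitespaceNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespaceNodes(child);
            }
            child = next;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<A>\n<B>\n</B>\n\n\n<c>\n</c>\n</A>");
        stripWhitespaceNodes(doc.getDocumentElement());
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

After stripping, the document can be serialized again with whatever indentation the UI needs. Note that this removes whitespace-only text everywhere, which may not be wanted for mixed content.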
Amit Deshpande

There is a costly way of doing this.

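// requires: import java.util.Scanner;
Scanner scanner = new Scanner(str);
StringBuilder strBuff = new StringBuilder();
while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    // Keep only lines that contain something other than whitespace.
    if (!line.trim().isEmpty()) {
        strBuff.append(line).append("\n");
    }
}
scanner.close();
String result = strBuff.toString();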

When the loop ends, the buffer holds the XML without the empty lines, and the XML is still well formed. As you can see, this is not ideal for large XML documents, since a lot of intermediate String objects are created internally.

Dheeraj Joshi