I am extracting data from an XML document using Axiom.
But I'm getting the above error because the XML contains CTRL-CHARs (e.g. â, €, ¢, “, ”, ™, ’, –, etc.).
Can anybody help me replace all the CTRL-CHARs to avoid this error?
Asked by ironwood
The CTRL-CHAR doesn't refer to the characters you've listed, but to non-printable control characters below U+0020, which (with a few exceptions, notably CR, LF and tab) are not allowed in XML 1.0 documents. If your source documents contain such characters, they're not well-formed XML. – Ian Roberts Sep 14 '12 at 12:03
@Ian: Yes, but the exception reported them as CTRL-CHARs, didn't it? When I simply replace the detected characters one after another, it works fine. But I need a convenient and robust method for this. – ironwood Sep 14 '12 at 12:07
The exception says "code 15", i.e. U+000F. – Ian Roberts Sep 14 '12 at 12:52
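To see which characters the parser is actually objecting to, here is a minimal sketch (the class and method names XmlCharCheck, isXml10Char and findInvalidChars are hypothetical, not part of Axiom). It checks each code point against the XML 1.0 Char production and reports the offending ones in U+ notation. The exception's decimal "code 15" then shows up as U+000F, while â, €, ™ and ’ pass as valid characters.

```java
import java.util.ArrayList;
import java.util.List;

public class XmlCharCheck {

    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Lists every character not allowed in XML 1.0 as "index:U+XXXX".
    // Iterates by code point, so valid surrogate pairs are checked as one character;
    // an unpaired surrogate is reported, because it is not a legal XML character.
    public static List<String> findInvalidChars(String s) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXml10Char(cp)) {
                found.add(i + ":" + String.format("U+%04X", cp));
            }
            i += Character.charCount(cp);
        }
        return found;
    }

    public static void main(String[] args) {
        // € and ™ are legal XML characters; only the trailing U+000F is not
        String sample = "price \u20AC5 \u2122 ok\u000F";
        System.out.println(findInvalidChars(sample)); // prints [13:U+000F]
    }
}
```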
1 Answer
Currently I'm using the following method, but I think there must be a better way.
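public static String removeNonUtf8CompliantCharacters( final String inString ) {
    if (inString == null) return null;
    // Work on chars rather than inString.getBytes(): Java bytes are signed, so every
    // byte of a multi-byte character (â, €, ™, ...) is negative and would be blanked,
    // and getBytes()/new String(byte[]) silently use the platform default charset.
    char[] chars = inString.toCharArray();
    for (int i = 0; i < chars.length; i++) {
        char ch = chars[i];
        // Replace control characters below U+0020 (XML 1.0 allows only tab, LF and CR)
        // and also '&' and '#'.
        boolean isControl = ch < 0x20 && ch != '\t' && ch != '\n' && ch != '\r';
        if (isControl || ch == '&' || ch == '#') {
            chars[i] = ' ';
        }
    }
    return new String(chars);
}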

Answered by ironwood
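A possibly more robust variant of the method above, sketched under the assumption that the goal is simply XML 1.0 well-formedness: walk the string by code point (so surrogate pairs such as emoji survive) and replace only the characters outside the XML 1.0 Char production. XmlSanitizer and replaceInvalidXmlChars are hypothetical names for illustration, not an Axiom API.

```java
public class XmlSanitizer {

    // XML 1.0 "Char" production (tab, LF, CR and the allowed Unicode ranges).
    private static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns a copy of {@code in} in which every character that is not allowed
     * in XML 1.0 is replaced by {@code replacement}. Characters such as â, €, ™
     * or ’ are legal XML characters and are left untouched. Unpaired surrogates
     * are treated as invalid and replaced.
     */
    public static String replaceInvalidXmlChars(String in, String replacement) {
        if (in == null) return null;
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(replacement);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "<a>caf\u00E9 \u2019quoted\u2019\u000F</a>";
        // Strips the U+000F control character, keeps é and the curly quotes
        System.out.println(replaceInvalidXmlChars(dirty, ""));
    }
}
```

Passing " " as the replacement mimics the answer's behaviour of blanking characters, while "" removes them outright. Unlike the answer's method, this leaves '&' and '#' alone, since both are legal XML characters; the cleaned string is what would then be handed to the parser instead of the raw input.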