1

I have some trouble with JDOM2, which I use to work with XML files. I want to convert an XML file to a string without any manipulation or optimization.

This is my Java code to do that:

    SAXBuilder builder = new SAXBuilder();
    File xmlFile = f;

    try {
        Document document = builder.build(xmlFile);
        xml = new XMLOutputter().outputString(document);
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }

    return xml;

But when I compare my string with the original XML file I notice some changes.

The original:

<?xml version="1.0" encoding="windows-1252"?>
<xmi:XMI xmi:version="2.1" xmlns:uml="http://schema.omg.org/spec/UML/2.0" xmlns:xmi="http://schema.omg.org/spec/XMI/2.1" xmlns:thecustomprofile="http://www.sparxsystems.com/profiles/thecustomprofile/1.0" xmlns:SoaML="http://www.sparxsystems.com/profiles/SoaML/1.0">

And the string:

<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmlns:xmi="http://schema.omg.org/spec/XMI/2.1" xmlns:SoaML="http://www.sparxsystems.com/profiles/SoaML/1.0" xmlns:thecustomprofile="http://www.sparxsystems.com/profiles/thecustomprofile/1.0" xmlns:uml="http://schema.omg.org/spec/UML/2.0" xmi:version="2.1">

All umlauts (ä, ö, ü) are changed too: I get something like '�' instead of 'ä'.

Is there any way to stop that behavior?

Alucard
  • 2
    If you just want the string just read the file to string like you would read any other file to a string. No need for JDOM2 or any other framework. – Dave Sep 04 '15 at 10:27
  • 2
    If you're going to parse and reserialize the XML then you're going to lose "non-information-bearing" details such as the encoding, the order of attributes, or whitespace delimiters within tags. But there's nothing in your stated requirement that explains why you want to parse and reserialize the XML. – Michael Kay Sep 04 '15 at 12:54

3 Answers

7

Firstly, as others have stated, you shouldn't use any XML processing. Just read the file as a text file.

Secondly, your umlaut characters showing up as '�' is due to an incorrect charset (encoding) being used. The charset error may be in your code, or it may be the XML file.
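To illustrate the charset mismatch, here is a minimal sketch. The class and method names (`MojibakeDemo`, `roundTrip`) are made up for this example, and it assumes the file's bytes really are windows-1252, as the XML declaration claims. It encodes a string containing 'ä' as windows-1252 and then decodes the same bytes once with the right charset and once as UTF-8:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    /** Encodes text with one charset, then decodes the resulting bytes with another. */
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // "Käse", written with an escape so the source file encoding does not matter
        String original = "K\u00E4se";

        // Same charset on both sides: the umlaut survives.
        System.out.println(roundTrip(original, cp1252, cp1252).equals(original));

        // windows-1252 bytes decoded as UTF-8: the byte 0xE4 is not a valid
        // UTF-8 sequence on its own, so the decoder substitutes U+FFFD ('�').
        String broken = roundTrip(original, cp1252, StandardCharsets.UTF_8);
        System.out.println(broken.indexOf('\uFFFD') >= 0);
    }
}
```

If the second check prints `true` for your data as well, the '�' most likely comes from decoding with the wrong charset rather than from JDOM2 itself.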

The original XML file contains encoding="windows-1252", but it's unusual for XML to be encoded in anything other than UTF-8, so I suspect the file is really a UTF-8 file and the encoding it claims to use is not correct.

Try forcing UTF-8 when reading the file. It's good practice, regardless, to specify the charset when converting bytes to text:

String xml = new String(
    Files.readAllBytes(xmlFile.toPath()), StandardCharsets.UTF_8);
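As a self-contained sketch of the same idea: the helper below reads a file as plain text with an explicitly chosen charset and never involves an XML parser, so the declaration, the attribute order and the whitespace stay exactly as they are in the file. The class name `XmlTextReader`, the method `readAsText` and the temporary sample file are assumptions made for this example, not part of the original code.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class XmlTextReader {
    /** Reads the whole file and decodes it with the given charset, without parsing it. */
    static String readAsText(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        Charset cp1252 = Charset.forName("windows-1252");
        String content = "<?xml version=\"1.0\" encoding=\"windows-1252\"?>\n"
                + "<a b=\"2\" a=\"1\">\u00E4</a>\n";

        // Hypothetical sample file, written in the charset its declaration names.
        Path sample = Files.createTempFile("sample", ".xml");
        Files.write(sample, content.getBytes(cp1252));

        // The text comes back unchanged because nothing re-serializes it.
        String xml = readAsText(sample, cp1252);
        System.out.println(xml.equals(content));

        Files.delete(sample);
    }
}
```

The charset is a parameter on purpose: pass whichever one the file is actually encoded in, which may or may not match what its declaration says.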
VGR
  • Thanks for your reply. I had some trouble with UTF_8 too, but xml = new String(Files.readAllBytes(f.toPath()), StandardCharsets.ISO_8859_1); works for me. – Alucard Sep 07 '15 at 06:11
  • You may want to replace StandardCharsets.ISO_8859_1 with `Charset.forName("windows-1252")`, since your file does say it's encoded in 1252, which is effectively a superset of ISO-8859-1. Using ISO-8859-1 means eventually you may encounter bytes in the 0x80 to 0x9F range, which are printable characters in 1252 (the euro sign and curly quotes, for example) but which ISO-8859-1 decodes as C1 control characters, so they end up invisible or garbled. – VGR Sep 07 '15 at 16:42
  • Thanks for your improvements! You are right. There are such cases, which are fixed by your suggestion. I changed it to: `xml = new String(Files.readAllBytes(f.toPath()), Charset.forName("windows-1252"));` – Alucard Sep 08 '15 at 08:52
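The difference between the two charsets discussed in these comments can be checked with a small sketch. The class `Cp1252VsLatin1` and its `decode` helper are hypothetical names for this example. It decodes single bytes with both charsets: for 0xE4 ('ä') they agree, but for a byte in the 0x80 to 0x9F range, such as 0x80, windows-1252 yields the euro sign while Java's ISO-8859-1 decoder yields the control character with the same code point.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Cp1252VsLatin1 {
    /** Decodes a single byte value (0 to 255) with the given charset. */
    static String decode(int byteValue, Charset charset) {
        return new String(new byte[] { (byte) byteValue }, charset);
    }

    /** Formats the first character of a string as a U+XXXX code point label. */
    static String codePoint(String s) {
        return String.format("U+%04X", (int) s.charAt(0));
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        Charset latin1 = StandardCharsets.ISO_8859_1;

        // 0xE4 is 'ä' in both charsets; 0x80 is where they diverge.
        for (int b : new int[] { 0xE4, 0x80 }) {
            System.out.println(String.format("byte 0x%02X: windows-1252 -> %s, ISO-8859-1 -> %s",
                    b, codePoint(decode(b, cp1252)), codePoint(decode(b, latin1))));
        }
    }
}
```

So for files that contain only umlauts either charset happens to work, and the choice starts to matter once characters such as '€' or curly quotes appear.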
0

See if this works for you.

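// filename is the file path as a String
// Caveats: FileReader uses the platform default charset, readLine() drops
// line breaks, and trim() strips indentation, so the result is not identical
// to the text in the file.
StringBuilder sb = new StringBuilder();
try (BufferedReader br = new BufferedReader(new FileReader(new File(filename)))) {
    String line;
    while ((line = br.readLine()) != null) {
        sb.append(line.trim());
    }
}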
Pang
Droy
0

Try this:

String xmlToString = FileUtils.readFileToString(new File("/file/path/file.xml"));

You need the Commons IO jar for this.