For a NetBeans plugin, I want to replace the content of a file (which is open in the NetBeans editor) with a specific String in a specific charset. To do that, I open the file (a DataObject) with an EditorCookie and then change the content by inserting a different string into the StyledDocument of my data object.

However, I have a feeling that the file is always saved as UTF-8, even if I write a byte order mark (BOM) into the file. Am I doing something wrong?

This is my code:

...

EditorCookie cookie = dataObject.getLookup().lookup(EditorCookie.class);
String utf16be = new String("\uFEFFHello World!".getBytes(StandardCharsets.UTF_16BE));

NbDocument.runAtomic(cookie.getDocument(), () -> {
  try {
    StyledDocument document = cookie.openDocument();
    document.remove(0, document.getLength());
    document.insertString(0, utf16be, null);
    cookie.saveDocument();
  } catch (BadLocationException | IOException ex) {
    Exceptions.printStackTrace(ex);
  }
});

I have also tried this approach, which doesn't work either:

... 

EditorCookie cookie = dataObject.getLookup().lookup(EditorCookie.class); 

NbDocument.runAtomic(cookie.getDocument(), () -> {
  try {
    StyledDocument doc = cookie.openDocument();

    String utf16be = "\uFEFFHello World!";
    InputStream is = new ByteArrayInputStream(utf16be.getBytes(StandardCharsets.UTF_16BE));

    FileObject fileObject = dataObject.getPrimaryFile();
    String mimePath = fileObject.getMIMEType();
    Lookup lookup = MimeLookup.getLookup(MimePath.parse(mimePath));
    EditorKit kit = lookup.lookup(EditorKit.class);

    try {
      kit.read(is, doc, doc.getLength());
    } catch (IOException | BadLocationException ex) {
      Exceptions.printStackTrace(ex);
    } finally {
      is.close();
    }

    cookie.saveDocument();
  } catch (Exception ex) {
    Exceptions.printStackTrace(ex);
  }
});
Benny Code

1 Answer

Your problem is probably here:

String utf16be = new String("\uFEFFHello World!".getBytes(StandardCharsets.UTF_16BE));

This won't do what you think it does. It converts your string to a byte array using the UTF-16 big-endian encoding and then creates a String from those bytes using the JRE's default encoding.
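To see the effect outside NetBeans, here is a minimal sketch. The class and method names are made up for illustration, and the decoding charset is passed in explicitly as a stand-in for the JRE default so that the result does not depend on the machine it runs on.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Encodes the text as UTF-16BE, then decodes those bytes with another charset.
    // new String(bytes) does the same thing, with the platform default charset.
    static String encodeThenDecode(String text, Charset decodeWith) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_16BE);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "\uFEFFHello World!";

        // Decoding with a charset other than the one used for encoding mangles the text.
        String mangled = encodeThenDecode(original, StandardCharsets.UTF_8);
        System.out.println(original.equals(mangled)); // false

        // Only decoding with the same charset gives the original characters back.
        String intact = encodeThenDecode(original, StandardCharsets.UTF_16BE);
        System.out.println(original.equals(intact)); // true
    }
}
```

The mangled string is what ends up in the document in the question's first snippet, so the saved file cannot contain the intended text regardless of the charset used for saving.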

So, here's the catch:

A String has no encoding.

The fact that in Java a String is a sequence of chars does not matter. Replace 'char' with 'carrier pigeon' and the net effect is the same.

If you want to write a String to a byte stream with a given encoding, you need to specify the encoding you need on the Writer object you create. Similarly, if you want to read a byte stream into a String using a given encoding, it is the Reader which you need to configure to use the encoding you want.
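As a rough sketch of that idea using only the JDK: the charset is given to the OutputStreamWriter and the InputStreamReader, never to the String. The class name, the method names and the in-memory streams are illustrative assumptions; in a real plugin the streams would come from the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetIo {

    // Writes the text to a byte stream; the Writer decides the encoding.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    // Reads a byte stream back into a String; the Reader decides the decoding.
    static String read(byte[] bytes, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String original = "\uFEFFHello World!";
        byte[] bytes = write(original, StandardCharsets.UTF_16BE);
        // The leading U+FEFF character is encoded as the bytes FE FF.
        System.out.printf("%02X %02X%n", bytes[0], bytes[1]);
        System.out.println(original.equals(read(bytes, StandardCharsets.UTF_16BE)));
    }
}
```

Note that the same unmodified String can be written as UTF-16BE, UTF-16LE, UTF-8 or latin1 just by passing a different Charset to write().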

But the method on your StyledDocument object is named .insertString(). You should .insertString() your String object as is; don't transform it the way you do, since that is misguided, as explained above.
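A small sketch of inserting the string unchanged, using the JDK's DefaultStyledDocument as a stand-in for the document NetBeans hands out (an assumption made so the example runs without the NetBeans APIs; the class and method names are hypothetical). It only shows that the document stores characters. Which charset NetBeans applies when saving is a separate matter that this sketch does not cover.

```java
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.StyledDocument;

public class InsertAsIs {

    // Replaces the whole document content with the given text, untransformed,
    // and returns what the document now holds.
    static String replaceContent(StyledDocument document, String text) {
        try {
            document.remove(0, document.getLength());
            document.insertString(0, text, null);
            return document.getText(0, document.getLength());
        } catch (BadLocationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        StyledDocument document = new DefaultStyledDocument();
        String stored = replaceContent(document, "\uFEFFHello World!");

        // The document holds exactly the characters that were inserted.
        System.out.println(stored.equals("\uFEFFHello World!")); // true

        // An encoding only comes into play when the characters are turned into bytes.
        byte[] bytes = stored.getBytes(StandardCharsets.UTF_16BE);
        System.out.println(bytes.length); // 26: 13 chars, 2 bytes each
    }
}
```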

fge
  • Thank you for your great explanation! But if I use `insertString`, then the default encoding of the JRE is taken, right? So maybe I should tinker around with the `EditorKit` to see if I can change the encoding for the `Reader` which is used by the `EditorKit`. – Benny Code Nov 23 '14 at 19:02
  • Not sure; have you tried to just insert the string as is? Also, why the BOM at the beginning? – fge Nov 23 '14 at 19:06
  • I tried to insert the String as it is. Looks good, but if I open the file in an editor other than NetBeans, that editor cannot recognize the file as UTF-16-BE. That's why I want to write the BOM at the beginning, so that other editors can easily detect the charset of my saved files. – Benny Code Nov 23 '14 at 19:12
  • And how is your file created in the first place? The only package I know which writes text files in UTF-16 by default is PowerShell; can't you have the source write them in UTF-8 instead? – fge Nov 23 '14 at 19:25
  • I could. But the thing is that I am writing a NetBeans plugin to support http://editorconfig.org/, so I need to make sure that the plugin can write files in latin1, utf-8, utf-8-bom, utf-16be or utf-16le which can be detected by other IDEs and editors. So let me try the EditorKit approach and then I will tell you the results. :) If you are interested in my plugin, then you can find it here: https://github.com/welovecoding/editorconfig-netbeans – Benny Code Nov 23 '14 at 19:36
  • I used the EditorKit read method with the following reader: `InputStreamReader reader = new InputStreamReader(in, StandardCharsets.UTF_16BE);` - But as a result I also got a UTF-8 file, so it looks like it's not possible to write files in a different charset. :-( – Benny Code Nov 24 '14 at 01:05