Why does Groovy's file write with UTF-16LE produce a BOM char?

Question

Do you have any idea why the first and second lines below do not write a BOM to the file but the third line does? I thought UTF-16LE was the correct encoding name, and that this encoding does not automatically write a BOM at the beginning of the file.

new File("foo-wo-bom.txt").withPrintWriter("utf-16le") { it << "test" }             // no BOM
new File("foo-bom1.txt").withPrintWriter("UnicodeLittleUnmarked") { it << "test" }  // no BOM
new File("foo-bom.txt").withPrintWriter("UTF-16LE") { it << "test" }                // BOM
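To rule out the JDK itself, here is a small Java sketch (the class name CharsetNameCheck is just for illustration) that resolves the three names through Charset.forName. The JDK documents charset names as case-insensitive and lists UnicodeLittleUnmarked as an alias of UTF-16LE, so all three should resolve to the same charset. If so, the difference between the first and third lines has to come from something Groovy does with the name string, not from the encoder.

```java
import java.nio.charset.Charset;

// Resolves each charset name used in the question through the JDK and
// prints the canonical charset it maps to.
class CharsetNameCheck {

    // Charset.forName matches names and aliases case-insensitively.
    static String canonical(String name) {
        return Charset.forName(name).name();
    }

    public static void main(String[] args) {
        String[] names = {"utf-16le", "UnicodeLittleUnmarked", "UTF-16LE"};
        for (String n : names) {
            System.out.println(n + " -> " + canonical(n));
        }
    }
}
```

Each line should end in "-> UTF-16LE", i.e. the JDK sees one and the same charset.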

Another example:

new File("foo-bom.txt").withPrintWriter("UTF-16LE") {it << "test"}
new File("foo-bom.txt").getBytes().each {System.out.format("%02x ", it)}

prints

ff fe 74 00 65 00 73 00 74 00
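The leading ff fe is U+FEFF written little-endian, i.e. the UTF-16LE byte-order mark. Per the JDK Charset documentation, the UTF-16LE decoder does not consume it but decodes it as a ZERO WIDTH NO-BREAK SPACE character, so a reader using UTF-16LE would see an extra character at the start. A minimal sketch (class and method names are illustrative) for detecting and stripping it:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Minimal helper for recognising and dropping a UTF-16LE byte-order mark.
class Utf16LeBom {

    // U+FEFF encoded little-endian is the byte pair FF FE.
    static boolean hasBom(byte[] data) {
        return data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE;
    }

    // Returns the bytes without a leading FF FE pair, if there is one.
    static byte[] stripBom(byte[] data) {
        return hasBom(data) ? Arrays.copyOfRange(data, 2, data.length) : data;
    }

    public static void main(String[] args) {
        // The bytes printed by the Groovy example above.
        byte[] groovy = {(byte) 0xFF, (byte) 0xFE, 0x74, 0x00, 0x65, 0x00, 0x73, 0x00, 0x74, 0x00};
        System.out.println(hasBom(groovy));
        System.out.println(new String(stripBom(groovy), StandardCharsets.UTF_16LE));
    }
}
```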

and with Java:

        PrintWriter w = new PrintWriter("foo.txt","UTF-16LE");
        w.print("test");
        w.close();
        FileInputStream r = new FileInputStream("foo.txt");
        int c;
        while ((c = r.read()) != -1) {
            System.out.format("%02x ",c);
        }
        r.close();

prints

74 00 65 00 73 00 74 00

With Java there is no BOM, but with Groovy there is.
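For reference, a self-contained version of the Java snippet above (class and helper names are illustrative). It writes to a temporary file instead of foo.txt, closes the writer with try-with-resources, and deletes the file afterwards; for "UTF-16LE" it should print 74 00 65 00 73 00 74 00, matching the output above.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Writes text with PrintWriter(File, String) and returns the file's bytes as hex.
class PrintWriterDump {

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    static String writeAndDump(String text, String charset) {
        try {
            File file = File.createTempFile("foo", ".txt");
            try {
                // Closing the writer flushes the encoded bytes to the file.
                try (PrintWriter w = new PrintWriter(file, charset)) {
                    w.print(text);
                }
                return hex(Files.readAllBytes(file.toPath()));
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndDump("test", "UTF-16LE"));
    }
}
```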

Welcome to StackOverflow. I would think that the charset name would be case-insensitive (it should be in Java), but without any documentation to confirm, I can only assume that `utf-16le` (lowercase) tells `withPrintWriter()` not to emit a BOM, and `UTF-16LE` (uppercase) tells it to emit a BOM. That is the only difference in this example. `UnicodeLittleUnmarked` forces a BOM to be skipped, and `UnicodeLittle` forces a BOM, but maybe `utf-16le`/`UTF-16LE` is more ambiguous in Groovy? — Remy Lebeau, May 29 '15 at 22:41
I tested that with Java and PrintWriter too, and none of these encodings produces a BOM. I think that is correct: if I specify LE or BE, there is no need to write a BOM. If I use plain UTF-16, Java writes the file big-endian with a BOM. In Groovy, it seems that both utf-16 and UTF-16 produce a BOM. — JukkaU, May 30 '15 at 20:37
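What the JDK encoders emit on their own can be checked directly. The Charset documentation states that the UTF-16 charset writes a big-endian byte-order mark when encoding, while UTF-16BE and UTF-16LE never write one. A minimal sketch (names are illustrative) that encodes "test" with several UTF-16 variants:

```java
import java.nio.charset.Charset;

// Encodes the same text with several UTF-16 variants and prints the bytes,
// to see which JDK encoders emit a byte-order mark by themselves.
class Utf16Encoders {

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    static String encode(String text, String charsetName) {
        return hex(text.getBytes(Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        // UTF-16: big-endian with a BOM; UTF-16LE and UTF-16BE: no BOM;
        // UnicodeLittle: little-endian with a BOM.
        for (String name : new String[] {"UTF-16", "UTF-16LE", "UTF-16BE", "UnicodeLittle"}) {
            System.out.println(name + ": " + encode("test", name));
        }
    }
}
```

So the BOM seen with plain UTF-16 comes from the JDK encoder, whereas for UTF-16LE the encoder itself adds nothing.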

Keegan · Answer 1 · 2015-06-11T13:22:55.937

There appears to be a difference in behavior between withPrintWriter and the plain PrintWriter constructor. Try this out in your GroovyConsole:

File file = new File("tmp.txt")
try {
    String text = " "
    String charset = "UTF-16LE"

    file.withPrintWriter(charset) { it << text }
    println "withPrintWriter"
    file.getBytes().each { System.out.format("%02x ", it) }

    PrintWriter w = new PrintWriter(file, charset)
    w.print(text)
    w.close()
    println "\n\nnew PrintWriter"
    file.getBytes().each { System.out.format("%02x ", it) }
} finally {
    file.delete()
}

It outputs

withPrintWriter
ff fe 20 00 

new PrintWriter
20 00

This is because calling new PrintWriter calls the Java constructor, but calling withPrintWriter eventually calls org.codehaus.groovy.runtime.ResourceGroovyMethods.writeUTF16BomIfRequired(), which writes the BOM.
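That explains why a BOM appears at all, but not why lowercase utf-16le avoids it. One plausible explanation, stated here as an assumption and not verified against the Groovy source: the helper decides whether to write a BOM by comparing the charset name string exactly (case-sensitively) against "UTF-16LE" and "UTF-16BE". The Java sketch below models that assumption; the class and method names are hypothetical, not Groovy's API. Under it, the three lines from the question produce exactly the observed bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// HYPOTHETICAL model of a "write BOM if required" helper that checks the
// charset *name* with exact, case-sensitive string equality.
class BomIfRequiredModel {

    static void writeBomIfRequired(String charset, OutputStream out) throws IOException {
        if ("UTF-16BE".equals(charset)) {
            out.write(0xFE);
            out.write(0xFF);
        } else if ("UTF-16LE".equals(charset)) {
            out.write(0xFF);
            out.write(0xFE);
        }
    }

    // Mimics the assumed withPrintWriter flow: BOM check first, then encode.
    static String writeLikeWithPrintWriter(String text, String charset) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeBomIfRequired(charset, bytes);
            try (Writer w = new OutputStreamWriter(bytes, charset)) {
                w.write(text);
            }
            return hex(bytes.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"utf-16le", "UnicodeLittleUnmarked", "UTF-16LE"}) {
            System.out.println(name + ": " + writeLikeWithPrintWriter("test", name));
        }
    }
}
```

If the assumption holds, only the exact string "UTF-16LE" triggers the BOM, even though the JDK treats all three names as the same charset.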

I'm uncertain whether this difference in behavior is intentional. I was curious about this, so I asked on the mailing list. Someone there should know the history behind the design.

Edit: GROOVY-7465 was created out of the aforementioned discussion.

Thank you. Those examples helped clarify your question. I've edited my answer. — Keegan, Jun 07 '15 at 15:16
