
This code writes two strings to a file channel:

final byte[] title = "Title: ".getBytes("UTF-16");
final byte[] body = "This is a string.".getBytes("UTF-16");
ByteBuffer titlebuf = ByteBuffer.wrap(title);
ByteBuffer bodybuf = ByteBuffer.wrap(body);
FileChannel fc = FileChannel.open(p, READ, WRITE, TRUNCATE_EXISTING);
fc.position(title.length); // second string written first, but not relevant to the problem
while (bodybuf.hasRemaining()) fc.write(bodybuf);
fc.position(0);
while (titlebuf.hasRemaining()) fc.write(titlebuf);

Each string is prefixed by a BOM.

[Title: ?T]  *254 255* 0 84 0 105 0 116 0 108 0 101 0 58 0 32 *254 255* 0 84

While it is fine to have one at the beginning of the file, a BOM in the middle of the stream causes a problem.

How can I prevent this from happening?

mins

1 Answer


The BOM bytes are inserted when you call getBytes with the "UTF-16" charset, which encodes with a BOM:

final byte[] title = "Title: ".getBytes("UTF-16");

Check title.length and you will find it includes 2 additional bytes for the BOM.
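A minimal sketch to check this, comparing the generic UTF-16 charset with an explicit-endianness one. The class and method names (BomLengthDemo, withBom, withoutBom) are hypothetical and only serve the illustration:

```java
import java.nio.charset.StandardCharsets;

class BomLengthDemo {
    // Generic UTF-16: Java's encoder prepends a big-endian BOM (FE FF).
    static byte[] withBom(String s) {
        return s.getBytes(StandardCharsets.UTF_16);
    }

    // Explicit byte order: no BOM is written.
    static byte[] withoutBom(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] a = withBom("Title: ");
        byte[] b = withoutBom("Title: ");
        System.out.println(a.length); // 16 = 7 chars * 2 bytes + 2-byte BOM
        System.out.println(b.length); // 14 = 7 chars * 2 bytes
        System.out.printf("%02X %02X%n", a[0] & 0xFF, a[1] & 0xFF); // FE FF
    }
}
```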

So you could process these arrays and remove the BOM before wrapping them in a ByteBuffer, or you could skip the BOM when you write the ByteBuffer to the file.
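One possible way to skip the BOM is to wrap only the part of the array that follows it, so the channel never sees those two bytes. This is a sketch, not the only approach; wrapWithoutBom is a hypothetical helper name, and it assumes the array was produced by a UTF-16 encoder:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BomSkipDemo {
    // Hypothetical helper: wraps the array, starting after a leading UTF-16 BOM if one is present.
    static ByteBuffer wrapWithoutBom(byte[] bytes) {
        boolean hasBom = bytes.length >= 2
                && ((bytes[0] == (byte) 0xFE && bytes[1] == (byte) 0xFF)    // big-endian BOM
                 || (bytes[0] == (byte) 0xFF && bytes[1] == (byte) 0xFE));  // little-endian BOM
        int offset = hasBom ? 2 : 0;
        // The buffer's position starts at offset, so writes begin after the BOM.
        return ByteBuffer.wrap(bytes, offset, bytes.length - offset);
    }

    public static void main(String[] args) {
        byte[] body = "This is a string.".getBytes(StandardCharsets.UTF_16);
        ByteBuffer bodybuf = wrapWithoutBom(body);
        System.out.println(body.length);         // 36 = 17 chars * 2 bytes + 2-byte BOM
        System.out.println(bodybuf.remaining()); // 34 = BOM skipped
    }
}
```

Only the string written in the middle of the file needs this treatment; the first string can keep its BOM.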

Another solution: use an explicit-endianness charset, UTF-16LE or UTF-16BE, neither of which writes a BOM:

final byte[] title = "Title: ".getBytes("UTF-16LE"); 
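If the file should still start with a single BOM, one option is to keep "UTF-16" for the first string and use an explicit byte order for the rest. Since Java's "UTF-16" encoder writes a big-endian BOM, the BOM-less part presumably needs UTF-16BE rather than UTF-16LE to stay consistent. The sketch below follows the write order from the question; the class and method names and the temporary file are assumptions made for the example:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SingleBomWriteDemo {
    // Writes body first, then title, as in the question; only the title carries a BOM.
    static void writeTitleAndBody(Path p, String title, String body) throws IOException {
        ByteBuffer titlebuf = ByteBuffer.wrap(title.getBytes(StandardCharsets.UTF_16));  // BOM + big-endian
        ByteBuffer bodybuf = ByteBuffer.wrap(body.getBytes(StandardCharsets.UTF_16BE));  // no BOM
        int titleLength = titlebuf.remaining();
        try (FileChannel fc = FileChannel.open(p, StandardOpenOption.READ,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            fc.position(titleLength);
            while (bodybuf.hasRemaining()) fc.write(bodybuf);
            fc.position(0);
            while (titlebuf.hasRemaining()) fc.write(titlebuf);
        }
    }

    // Writes to a temporary file, then decodes the whole file as UTF-16.
    static String roundTrip(String title, String body) {
        try {
            Path p = Files.createTempFile("bom-demo", ".txt");
            try {
                writeTitleAndBody(p, title, body);
                return new String(Files.readAllBytes(p), StandardCharsets.UTF_16);
            } finally {
                Files.deleteIfExists(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Title: ", "This is a string.")); // Title: This is a string.
    }
}
```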

Or you can use UTF-8 if UTF-16 is not required:

final byte[] title = "Title: ".getBytes("UTF-8");
Wajdy Essam
  • Thanks, all options work. I also read elsewhere that a BOM is written for UTF-16 only because there are two possible byte orders to distinguish, and, as you said, if one is specified, no BOM is written. – mins Jul 06 '14 at 21:10