What is the fastest way to convert a java.nio.ByteBuffer a into a (newly created) CharBuffer b or char[] b?

When doing this, it is important that a[i] == b[i]. That is, a[i] and a[i+1] should not be combined into a single value b[j], which is what getChar(i) would do; instead, the values should be "spread" out, one char per byte.

byte a[] = { 1,2,3, 125,126,127, -128,-127,-126 }; // each element is a byte (signed)
char b[] = { 1,2,3, 125,126,127,  128, 129, 130 }; // each element is a char (unsigned)

Note that byte -128 has the same (lower 8) bits as char 128. Therefore I assume the "best" interpretation is the one shown above, because the bits are the same.

I also need the reverse translation: the most efficient way to get a char[] or java.nio.CharBuffer back into a java.nio.ByteBuffer.
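At the level of a single element, the mapping described above is a masked widening in one direction and a plain narrowing cast in the other. A minimal sketch of both (the class and method names are illustrative, not taken from the question), assuming, as the question does, that chars above 0xFF never need to be converted back:

```java
import java.util.Arrays;

public class ByteCharMapping {
    // Widen a signed byte to a char with the same low 8 bits (0..255).
    // The & 0xFF mask is essential: (char) b alone sign-extends -128 to 0xFF80.
    static char toChar(byte b) {
        return (char) (b & 0xFF);
    }

    // Narrow a char back to a byte. Only the low 8 bits are kept,
    // so a char above 0xFF silently loses its high byte.
    static byte toByte(char c) {
        return (byte) c;
    }

    public static void main(String[] args) {
        byte[] a = { 1, 2, 3, 125, 126, 127, -128, -127, -126 };
        int[] codes = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            codes[i] = toChar(a[i]); // store numeric values; most of these chars are unprintable
        }
        System.out.println(Arrays.toString(codes)); // prints [1, 2, 3, 125, 126, 127, 128, 129, 130]
        System.out.println((int) (char) a[6]);       // prints 65408 (0xFF80): the unmasked cast
    }
}
```

Because toByte(toChar(x)) == x for every byte x, the round trip is lossless as long as the char values stay in 0..255.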

towi
  • What do you want to do with this char buffer? What should happen if you put in a 2-byte char? What should the translated byte array look like then? Depending on what you need, the most efficient way might be to not convert at all. – Ishtar May 09 '11 at 12:06
  • In other words, you're looking to interpret the contents of the `ByteBuffer` as a sequence of chars encoded using ISO-8859-1? – Anon May 09 '11 at 12:34
  • The assumption is that there is no value greater than `0xff` in the `char[]` buffer for the back-translation. Any behavior/crash would be fine ("unspecified"). ISO-8859-1? As far as I know, there are bytes that cannot be translated by any codepage (e.g. `\0`)? I think many codepage-conversion libraries do not take it nicely if you dump an 8-bit data stream in. But I don't know about Java; I will look it up. Imagine I have picture/image data. – towi May 09 '11 at 12:49
  • Huh? If you have picture/image data, then why do you care about characters? Perhaps if you described what you're trying to accomplish, rather than your desired approach to accomplishing it, you would get more relevant answers. – Anon May 09 '11 at 13:02
  • "...then why do you care about characters..." An image manipulation algorithm that has intermediate results greater than 255. – towi May 09 '11 at 15:24
  • @towi - then the data type you want is called `short` – Anon May 09 '11 at 20:41
  • We are talking about a purely binary conversion between byte arrays and char arrays, right? Is it possible that everybody happily assumes that a binary (bit-for-bit) translation from Java's char array to a byte array is equivalent to *encoding* Java's char array into a byte array using UTF-16? Well, I must disappoint you. – java.is.for.desktop Jul 22 '20 at 23:02

3 Answers

So, what you want is to convert using the ISO-8859-1 encoding, which maps each byte value 0x00-0xFF to the char with the same value, U+0000-U+00FF.

I don't claim anything about efficiency, but at least it is quite short to write:

CharBuffer result = Charset.forName("ISO-8859-1").decode(byteBuffer);

The other direction would be:

ByteBuffer result = Charset.forName("ISO-8859-1").encode(charBuffer);

Please measure this against other solutions. (To be fair, the Charset.forName lookup should not be included in the measurement; it should also be done only once, not again for each buffer.)

Since Java 7 there is also the java.nio.charset.StandardCharsets class with pre-instantiated Charset instances, so you can use

CharBuffer result = StandardCharsets.ISO_8859_1.decode(byteBuffer);

and

ByteBuffer result = StandardCharsets.ISO_8859_1.encode(charBuffer);

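instead. (These lines do the same as the ones before; the lookup is just simpler, there is no risk of mistyping the charset name, and there is no need to think about the exceptions Charset.forName can throw, which cannot actually occur for ISO-8859-1 because every Java platform must support it.)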
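Put together, the ISO-8859-1 approach gives a lossless round trip for all 256 byte values, including `\0`. A minimal sketch (the class and method names are illustrative); note that Charset.decode advances the position of the input buffer, and Charset.encode replaces chars above U+00FF with '?' instead of throwing:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1Convert {
    // Each byte 0x00-0xFF becomes the char with the same value.
    // Consumes the input: bytes.position() ends up at bytes.limit().
    static CharBuffer toChars(ByteBuffer bytes) {
        return StandardCharsets.ISO_8859_1.decode(bytes);
    }

    // Reverse direction. Chars above U+00FF are unmappable in ISO-8859-1
    // and are replaced by the byte '?' (0x3F).
    static ByteBuffer toBytes(CharBuffer chars) {
        return StandardCharsets.ISO_8859_1.encode(chars);
    }

    public static void main(String[] args) {
        byte[] a = { 1, 2, 3, 125, 126, 127, -128, -127, -126 };
        CharBuffer chars = toChars(ByteBuffer.wrap(a)); // position 0, limit 9

        int[] codes = new int[chars.remaining()];
        for (int i = 0; i < codes.length; i++) {
            codes[i] = chars.get(i); // absolute get: does not move the position
        }
        System.out.println(Arrays.toString(codes)); // prints [1, 2, 3, 125, 126, 127, 128, 129, 130]

        ByteBuffer back = toBytes(chars);
        // Copy exactly remaining() bytes; the encoder's backing array may be larger.
        byte[] roundTrip = new byte[back.remaining()];
        back.get(roundTrip);
        System.out.println(Arrays.equals(a, roundTrip)); // prints true
    }
}
```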

Paŭlo Ebermann
  • java.nio.charset.StandardCharsets.ISO_8859_1 and its peers provide a simple reference to the character set without a string lookup or throwing exceptions. – davenpcj Sep 21 '13 at 21:23

I agree with @Ishtar's suggestion to avoid converting to a new structure at all and only convert as you need it.

However, if you have a heap ByteBuffer (one backed by an accessible array, i.e. bb.hasArray() returns true), you can do:

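ByteBuffer bb = ...; // must be a heap buffer: bb.hasArray() must be true
byte[] array = bb.array();
// the remaining bytes start at arrayOffset() + position() in the backing array
int offset = bb.arrayOffset() + bb.position();
char[] chars = new char[bb.remaining()];
for (int i = 0; i < chars.length; i++)
    chars[i] = (char) (array[offset + i] & 0xFF); // mask so -128..-1 become 128..255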
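The loop above covers only the ByteBuffer-to-char[] direction. A minimal sketch of both directions (the class and method names are illustrative), assuming, as the question allows, that no char exceeds 0xFF on the way back. It also falls back to absolute get() calls when the buffer has no accessible array (a direct or read-only buffer), and honors arrayOffset() so sliced buffers work:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class SpreadConversion {
    // ByteBuffer -> char[]: one char per remaining byte, same low 8 bits.
    // Does not change bb's position.
    static char[] toChars(ByteBuffer bb) {
        char[] chars = new char[bb.remaining()];
        if (bb.hasArray()) {
            byte[] array = bb.array();
            int offset = bb.arrayOffset() + bb.position();
            for (int i = 0; i < chars.length; i++) {
                chars[i] = (char) (array[offset + i] & 0xFF);
            }
        } else {
            // No accessible backing array: use absolute reads instead.
            for (int i = 0; i < chars.length; i++) {
                chars[i] = (char) (bb.get(bb.position() + i) & 0xFF);
            }
        }
        return chars;
    }

    // char[] -> ByteBuffer: keeps only the low 8 bits of each char.
    static ByteBuffer toByteBuffer(char[] chars) {
        byte[] bytes = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            bytes[i] = (byte) chars[i];
        }
        return ByteBuffer.wrap(bytes); // position 0, limit == chars.length
    }

    public static void main(String[] args) {
        // A slice starting at index 1 of its backing array, so arrayOffset() == 1.
        ByteBuffer slice = ByteBuffer.wrap(new byte[] { 42, 1, -128, -1 }, 1, 3).slice();
        char[] chars = toChars(slice);

        int[] codes = new int[chars.length];
        for (int i = 0; i < chars.length; i++) {
            codes[i] = chars[i];
        }
        System.out.println(Arrays.toString(codes)); // prints [1, 128, 255]

        ByteBuffer back = toByteBuffer(chars);
        System.out.println(Arrays.toString(back.array())); // prints [1, -128, -1]
    }
}
```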
Peter Lawrey
  • OK, that's generic and safe. Thanks. But I'd hoped there might be an API call for that? – towi May 09 '11 at 12:55
  • With this loop you can be sure the conversion works the way you want. You could try "US-ASCII", but I don't know whether it works for all values 0-255. – Peter Lawrey May 09 '11 at 12:59
  • No, US-ASCII only covers 0-127. In Java, when not using the Charset API for finer control, other bytes are decoded to the replacement character U+FFFD, and other chars are encoded to `(byte)'?'`. Use ISO-8859-1 for complete coverage of the 8-bit range, i.e. to do what your loop does. – Paŭlo Ebermann May 09 '11 at 19:28
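The difference between US-ASCII and ISO-8859-1 for bytes 0x80-0xFF can be checked directly. A small sketch (the class and method names are illustrative) using the String constructor and getBytes overloads that take a Charset, which always substitute the charset's replacement instead of throwing. Decoding with US-ASCII turns high bytes into U+FFFD, not '?'; encoding an unmappable char yields the byte '?':

```java
import java.nio.charset.StandardCharsets;

public class AsciiVsLatin1 {
    // Bytes 0x80-0xFF are malformed in US-ASCII and become U+FFFD.
    static String decodeAscii(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // ISO-8859-1 maps all 256 byte values to U+0000-U+00FF.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] high = { (byte) 0x80, (byte) 0xFF };
        String ascii = decodeAscii(high);
        String latin1 = decodeLatin1(high);
        System.out.printf("US-ASCII:   %04X %04X%n", (int) ascii.charAt(0), (int) ascii.charAt(1));   // FFFD FFFD
        System.out.printf("ISO-8859-1: %04X %04X%n", (int) latin1.charAt(0), (int) latin1.charAt(1)); // 0080 00FF

        // Encoding direction: a char outside US-ASCII becomes the replacement byte '?'.
        byte[] encoded = "\u00E9".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encoded[0] == (byte) '?'); // prints true
    }
}
```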

Aside from deferring the creation of a CharBuffer, you may be able to get by without one. If the code that uses the data as characters does not strictly need a CharBuffer or char[], just do a simple on-the-fly conversion: use ByteBuffer.get() (relative or absolute), convert the result to char, and use that. Note that, as pointed out above, you unfortunately MUST mask explicitly (b & 0xFF); otherwise the values 128-255 are sign-extended to the incorrect values 0xFF80-0xFFFF. Masking is not needed for 7-bit ASCII.
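The on-the-fly approach amounts to two tiny helpers, one per flavor of ByteBuffer.get(). A minimal sketch (the class and method names are illustrative); the absolute version leaves the buffer's position alone, the relative one advances it by one byte:

```java
import java.nio.ByteBuffer;

public class OnTheFly {
    // Absolute read: index counts from the start of the buffer (0..limit-1),
    // not from its current position. The position is not changed.
    static char charAt(ByteBuffer bb, int index) {
        return (char) (bb.get(index) & 0xFF);
    }

    // Relative read: consumes one byte and advances the position.
    static char nextChar(ByteBuffer bb) {
        return (char) (bb.get() & 0xFF);
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.wrap(new byte[] { 65, -128, -1 });
        System.out.println((int) charAt(bb, 1));    // prints 128
        System.out.println((int) (char) bb.get(1)); // prints 65408: unmasked, sign-extended to 0xFF80

        int sum = 0;
        while (bb.hasRemaining()) {
            sum += nextChar(bb);
        }
        System.out.println(sum); // prints 448 (65 + 128 + 255)
    }
}
```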

StaxMan