
I want to write a CharSequence to an OutputStream using a specified Charset. Basically, I want what a Writer initialized with the same Charset does when write(String) is called.

The catch is that there are many CharSequences to be written, and some are pretty large. To complicate matters further, everything may be written to multiple OutputStreams. I can easily implement that like this (in fact, that is how it is currently implemented):

byte[] rawBytes = charSequence.toString().getBytes(charset);
for (OutputStream out : outputTargets) {
    out.write(rawBytes);
}

But obviously the String is a totally unwanted garbage object here, as is the byte[] array. I'm looking for a way to do the encoding directly, without intermediate objects. Surprisingly, this seems to be impossible: everywhere I looked in the JRE where a CharSequence is accepted, it is quickly converted into a String before any work is done.

Most (all?) of the conversion work for the Charset seems to be done in non-public classes, so I haven't found any way to access it in a transparent and legal way.

How can the garbage be avoided, i.e. how can the JRE's Charset encoding facilities be used directly?

Durandal

2 Answers


You can use Charset to encode a CharSequence to a byte array:

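import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

private static byte[] encodeUtf8(CharSequence cs) {
    // CharBuffer.wrap gives a read-only view of cs without copying it into a String.
    ByteBuffer bb = StandardCharsets.UTF_8.encode(CharBuffer.wrap(cs));
    byte[] result = new byte[bb.remaining()];
    bb.get(result);
    return result;
}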

If, instead of an OutputStream, you're using a WritableByteChannel, its write method takes a ByteBuffer directly, so you don't even need to copy the buffer into a byte array first.
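A minimal sketch of the channel route, assuming the targets are available as WritableByteChannel instances; the helper name writeToAll is hypothetical. An OutputStream can be adapted with Channels.newChannel(OutputStream), but that adapter copies through its own internal byte array, so the saving only really materializes with genuine channels. Because a single WritableByteChannel.write call may write fewer bytes than remain, the loop repeats until the buffer is drained, and each target gets a duplicate() view so the first target does not consume the shared buffer's position.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class ChannelWriteSketch {

    // Hypothetical helper: encodes cs once and writes the same bytes to every channel.
    static void writeToAll(CharSequence cs, Charset charset,
                           List<? extends WritableByteChannel> targets) throws IOException {
        ByteBuffer encoded = charset.encode(CharBuffer.wrap(cs));
        for (WritableByteChannel ch : targets) {
            // duplicate() shares the bytes but has its own position and limit,
            // so every channel starts at the beginning of the encoded data.
            ByteBuffer view = encoded.duplicate();
            while (view.hasRemaining()) {
                ch.write(view); // a single call may write only part of the buffer
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream a = new ByteArrayOutputStream();
        ByteArrayOutputStream b = new ByteArrayOutputStream();
        // Channels.newChannel adapts an OutputStream (it copies through its own array).
        writeToAll("h\u00e9llo", StandardCharsets.UTF_8,
                Arrays.asList(Channels.newChannel(a), Channels.newChannel(b)));
        System.out.println(a.size() + " " + b.size()); // prints "6 6"
    }
}
```

This still materializes the whole encoded sequence in one ByteBuffer, so it addresses the extra copy, not the memory concern raised in the comments below.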

C. K. Young
  • I believe the OP wants to avoid creating an in-memory byte array for the entire sequence. Imagine that the CharSequence is 10 times larger than available RAM. In that case this method won't work, right? – Keith Aug 29 '13 at 15:32
  • That is a fair point, and a good use case for your solution (+1). – C. K. Young Aug 29 '13 at 15:34
  • Well, my CharSequences are usually not all that large (a few K), but they're frequent and generate a lot of extra garbage. There is also a hard cap, since length() returns an int, which prevents representing a big text file as a CharSequence. While I won't be using the CharBuffer.wrap() idea for this particular problem, it may be helpful in other situations. – Durandal Aug 29 '13 at 16:58
  • And how would you do the reverse conversion, i.e. from a byte array to, say, a char array? – ivan.ukr Apr 11 '21 at 19:12
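For the reverse direction asked about in the last comment, Charset also has a public decode(ByteBuffer) method that returns a CharBuffer. A minimal sketch, where decodeToChars is a hypothetical name: the chars are copied out with get(char[]) because the returned buffer's backing array can be larger than its content, and, like new String(byte[], Charset), Charset.decode replaces malformed input instead of throwing.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeSketch {

    // Hypothetical helper: bytes -> chars using the given charset.
    static char[] decodeToChars(byte[] bytes, Charset charset) {
        // ByteBuffer.wrap does not copy; decode replaces malformed input.
        CharBuffer cb = charset.decode(ByteBuffer.wrap(bytes));
        char[] result = new char[cb.remaining()];
        cb.get(result); // copy only the decoded content, not the whole backing array
        return result;
    }

    public static void main(String[] args) {
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        char[] chars = decodeToChars(utf8, StandardCharsets.UTF_8);
        System.out.println(chars.length); // prints 5
    }
}
```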

Iterate over the characters of the sequence and write them to a writer.

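OutputStream outputStream = ....
CharSequence charSequence = ....
Charset charset = ....

Writer writer = new OutputStreamWriter(outputStream, charset);

// Write one char at a time, so no String copy of the whole sequence is made.
for (int i = 0; i < charSequence.length(); i++) {
    writer.write(charSequence.charAt(i));
}
// OutputStreamWriter buffers encoded bytes internally; push them to the stream.
writer.flush();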
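The same idea, made self-contained and extended to the question's several targets. This is a sketch under stated assumptions: one OutputStreamWriter per target is created once and reused (so the streams are not closed here), and writeToAll is a hypothetical name. The explicit flush() matters because OutputStreamWriter keeps encoded bytes in an internal buffer. Writer.append(CharSequence) is avoided on purpose, since the base Writer implementation turns the sequence into a String via String.valueOf.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CharLoopSketch {

    // Hypothetical helper: writes cs to every writer one char at a time,
    // so no String or byte[] copy of the whole sequence is built.
    static void writeToAll(CharSequence cs, List<Writer> writers) throws IOException {
        for (int i = 0; i < cs.length(); i++) {
            char c = cs.charAt(i);
            for (Writer w : writers) {
                w.write(c);
            }
        }
        // OutputStreamWriter holds encoded bytes in an internal buffer until flushed.
        for (Writer w : writers) {
            w.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        Charset charset = StandardCharsets.UTF_8;
        List<OutputStream> targets = new ArrayList<>();
        targets.add(new ByteArrayOutputStream());
        targets.add(new ByteArrayOutputStream());

        // One long-lived Writer per target, created once rather than per call.
        List<Writer> writers = new ArrayList<>();
        for (OutputStream out : targets) {
            writers.add(new OutputStreamWriter(out, charset));
        }
        writeToAll(new StringBuilder("h\u00e9llo"), writers);
        System.out.println(((ByteArrayOutputStream) targets.get(0)).size()); // prints 6
    }
}
```

Note that each character is encoded once per target, so the encoding work grows with the number of OutputStreams.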
Keith
  • I was reluctant to do this (as it requires some design changes on my side), but after some thought this seems to be the simplest, yet reasonably effective method (provided the OutputStreams are buffered). – Durandal Aug 29 '13 at 16:53
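If both the intermediate String and the per-character Writer calls are a concern, the public java.nio.charset.CharsetEncoder class can encode incrementally between caller-supplied buffers. The following is a minimal sketch, not a tuned implementation: the class and method names are hypothetical, the 8192-byte buffer size is an arbitrary choice, malformed or unmappable input is replaced rather than reported, and an instance must not be shared between threads because CharsetEncoder is stateful. The sequence is encoded once, chunk by chunk, into a reused buffer, and each chunk is written to every target; apart from the small CharBuffer.wrap view, no String or byte[] is created per call.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class EncoderSketch {

    private final CharsetEncoder encoder;
    // Reused for every call; a heap buffer, so array() can be passed to OutputStream.write.
    private final ByteBuffer bytes = ByteBuffer.allocate(8192);

    EncoderSketch(CharsetEncoder encoder) {
        // Assumption: replace bad input instead of failing, as String.getBytes does.
        this.encoder = encoder
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    // Encodes cs once, chunk by chunk, and writes every chunk to all targets.
    void writeToAll(CharSequence cs, List<? extends OutputStream> targets) throws IOException {
        CharBuffer in = CharBuffer.wrap(cs); // read-only view, the chars are not copied
        encoder.reset();
        CoderResult r;
        do {
            r = encoder.encode(in, bytes, true); // true: cs is the complete input
            if (r.isError()) {
                r.throwException();
            }
            drain(targets); // OVERFLOW means the buffer filled up; empty it and continue
        } while (r.isOverflow());
        do {
            r = encoder.flush(bytes); // emit any final bytes the encoder still holds
            drain(targets);
        } while (r.isOverflow());
    }

    // Writes the buffered bytes to every target, then empties the buffer for reuse.
    private void drain(List<? extends OutputStream> targets) throws IOException {
        bytes.flip();
        for (OutputStream out : targets) {
            out.write(bytes.array(), bytes.arrayOffset() + bytes.position(), bytes.remaining());
        }
        bytes.clear();
    }

    public static void main(String[] args) throws IOException {
        EncoderSketch sketch = new EncoderSketch(StandardCharsets.UTF_8.newEncoder());
        ByteArrayOutputStream a = new ByteArrayOutputStream();
        ByteArrayOutputStream b = new ByteArrayOutputStream();
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            big.append('\u00e9'); // two bytes each in UTF-8
        }
        sketch.writeToAll(big, Arrays.asList(a, b));
        System.out.println(a.size() + " " + b.size()); // prints "20000 20000"
    }
}
```

Encoding once and fanning out the bytes keeps the encoding cost independent of the number of targets; the trade-off is that every target receives the same chunks in lockstep, so a slow stream holds up the others.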