
I need to convert a stream of chars into a stream of bytes, i.e. I need an adapter from the java.io.Writer interface to a java.io.OutputStream, supporting any valid Charset, which I will receive as a configuration parameter.

However, the java.io.OutputStreamWriter class has a hidden secret: the sun.nio.cs.StreamEncoder object it delegates to underneath creates an 8192-byte (8 KB) buffer, even if you don't ask for one.

The problem is that at the OutputStream end I have inserted a wrapper that counts the bytes being written, so that it can immediately stop execution of the source system once a specific number of bytes has been output. If OutputStreamWriter creates an 8 KB buffer, I am notified of the generated bytes too late, because they only reach my counter when the buffer is flushed (by which time up to 8 KB of already-encoded bytes may be waiting in the OutputStreamWriter buffer).
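A minimal sketch of the timing problem described above, assuming a hypothetical `LimitedOutputStream` in place of the real counting wrapper (the class names and the 10-byte limit are illustrative only): the `write` call through `OutputStreamWriter` returns normally even though it produced more bytes than the limit allows, and the limit is only hit when `flush()` pushes the buffered bytes down.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class LateLimitDemo {

    // Hypothetical stand-in for the question's byte-counting wrapper:
    // fails as soon as more than `limit` bytes pass through it.
    static class LimitedOutputStream extends FilterOutputStream {
        private final long limit;
        private long count;

        LimitedOutputStream(OutputStream out, long limit) {
            super(out);
            this.limit = limit;
        }

        @Override
        public void write(int b) throws IOException {
            write(new byte[] {(byte) b}, 0, 1);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            count += len;
            if (count > limit) {
                throw new IOException("output limit of " + limit + " bytes exceeded");
            }
            out.write(b, off, len);
        }
    }

    public static void main(String[] args) throws IOException {
        LimitedOutputStream limited = new LimitedOutputStream(new ByteArrayOutputStream(), 10);
        Writer writer = new OutputStreamWriter(limited, StandardCharsets.UTF_8);

        // 20 ASCII chars -> 20 UTF-8 bytes, but they stay in the writer's internal buffer.
        writer.write("twenty characters!!!");
        System.out.println("write() returned normally; the limit has not been noticed yet");

        try {
            writer.flush(); // only now do the bytes reach LimitedOutputStream
        } catch (IOException e) {
            System.out.println("flush() failed: " + e.getMessage());
        }
    }
}
```

Throwing an `IOException` is just one way to "stop execution"; the real wrapper may signal the source system differently.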

So the question is: is there a Writer -> OutputStream bridge anywhere in the Java runtime that can run unbuffered?

I would really, really hate to have to write this myself :(...

NOTE: calling flush() on the OutputStreamWriter after every write is not a valid alternative. It brings a large performance penalty (there's a synchronized block involved in the StreamEncoder).

NOTE 2: I understand it might be necessary to keep a small char overflow in the bridge in order to handle surrogate pairs. I don't need to stop the execution of the source system at the very moment it generates the n-th byte (that would not be possible, given that bytes can reach me as a larger byte[] in a single write call). But I need to stop it as soon as possible, and waiting for an 8K, 2K or even 200-byte buffer to flush would simply be too late.

Daniel Fernández
  • Arguably it *can't* be fully unbuffered - if you call `write` with just the first half of a surrogate pair, for most encodings the writer would *have* to store that and wait for the second character before writing anything. – Jon Skeet Apr 23 '16 at 10:57
  • Well yes, of course, I understand that. But there's a difference between a couple of buffered chars needed to compute a surrogate pair and an 8K buffer... – Daniel Fernández Apr 23 '16 at 10:58
  • 2
    I don't think there's any "of course" there - I suspect many readers will assume you mean no buffering at all, and that you might not be aware of surrogate pairs. I suggest you edit the question to clarify that. (Would it be "too late" if the first call to `write` didn't stop execution, for example?) – Jon Skeet Apr 23 '16 at 11:00
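The encoder-level behaviour behind Jon Skeet's point can be sketched with `CharsetEncoder` directly (the demo class name is hypothetical): given only the high surrogate of U+1F600 with `endOfInput` set to `false`, a UTF-8 encoder should report underflow, emit no bytes and leave the char unconsumed, so any bridge has to hold on to it. Once the low surrogate arrives, the pair encodes to four bytes.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class SurrogateDemo {

    public static void main(String[] args) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        ByteBuffer out = ByteBuffer.allocate(16);

        // Only the high surrogate of U+1F600; more input may follow (endOfInput = false).
        CharBuffer firstHalf = CharBuffer.wrap(new char[] {'\uD83D'});
        CoderResult r1 = encoder.encode(firstHalf, out, false);
        // Nothing can be emitted yet: the char is left in the input for the caller to keep.
        System.out.println("underflow=" + r1.isUnderflow()
                + " consumed=" + firstHalf.position() + " bytes=" + out.position());

        // With the low surrogate available, the whole pair encodes to 4 UTF-8 bytes.
        CharBuffer pair = CharBuffer.wrap(new char[] {'\uD83D', '\uDE00'});
        CoderResult r2 = encoder.encode(pair, out, true);
        encoder.flush(out);
        System.out.println("underflow=" + r2.isUnderflow()
                + " consumed=" + pair.position() + " bytes=" + out.position());
    }
}
```

This single char of carry-over is the kind of minimal buffering the question accepts, as opposed to a multi-kilobyte byte buffer.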

1 Answer


As you have already noticed, the StreamEncoder used by OutputStreamWriter has an 8 KB buffer, and there is no API to change that size.

But the following snippet gives you a way to obtain a Writer for an OutputStream which internally also uses a StreamEncoder, but with a user-defined buffer size:

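```java
import java.io.OutputStream;
import java.io.Writer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

String charSetName = ...  // the configured charset name
CharsetEncoder encoder = Charset.forName(charSetName).newEncoder();

OutputStream out = ...    // e.g. the byte-counting stream
int bufferSize = ...      // minimum capacity of the internal byte buffer (-1 = implementation default)

WritableByteChannel channel = Channels.newChannel(out);
Writer writer = Channels.newWriter(channel, encoder, bufferSize);
```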
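A rough way to check the effect of the snippet above, assuming a hypothetical byte-counting `FilterOutputStream` and an illustrative `bufferSize` of 16: after writing 100 ASCII chars without flushing, the counter should already have seen all but the last few bytes, because the encoder hands its bytes to the channel each time its small buffer fills up.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SmallBufferWriterDemo {

    // Hypothetical byte counter standing in for the question's counting wrapper.
    static class CountingOutputStream extends FilterOutputStream {
        long count;

        CountingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            count++;
            out.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            count += len;
            out.write(b, off, len);
        }
    }

    public static void main(String[] args) throws IOException {
        CountingOutputStream counter = new CountingOutputStream(new ByteArrayOutputStream());
        int bufferSize = 16; // illustrative value; the javadoc calls this a *minimum* capacity
        Writer writer = Channels.newWriter(
                Channels.newChannel(counter), StandardCharsets.UTF_8.newEncoder(), bufferSize);

        char[] text = new char[100];
        Arrays.fill(text, 'x'); // 100 ASCII chars -> 100 UTF-8 bytes
        writer.write(text);

        // No flush yet: only the tail that still fits in the small buffer should be pending.
        System.out.println("before flush: " + counter.count);

        writer.flush();
        System.out.println("after flush: " + counter.count);
    }
}
```

Since `Channels.newWriter` only documents `bufferSize` as a *minimum* capacity, the exact number printed before the flush is an implementation detail; what matters is that the bytes are no longer held back by an 8 KB buffer.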
wero
  • I cannot give you enough points for this. It works like a charm, thanks so much. The only thing that worries me a bit is that the spec defines `bufferSize` as the *minimum* buffer size. But I see the standard implementation in `sun.nio.cs.StreamEncoder` simply uses it as a fixed size, so that should do. Thanks again. – Daniel Fernández Apr 23 '16 at 11:34