Yes, I know what a buffer is. But look at this:

BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter("file.txt"));

How does buffering actually work here? The way I see it, we are buffering data in the FileWriter's buffer and not in the BufferedWriter's buffer: when the BufferedWriter's buffer gets full, it sends its contents to the FileWriter's buffer, and the FileWriter is then responsible for writing the data. Is that right?

Am I missing something? To me it looks as if we are pouring water from a bigger container into a smaller one, so in the end we are still pouring from the smaller one.

A similar example:

Scanner scanner = new Scanner(new BufferedReader(new FileReader("file.txt")));
scanner.nextLine();

I have seen this everywhere. We end up reading from the Scanner line by line, not from the buffer with its 8k capacity. So what is the point of the buffer here? We read from the file line by line and not an entire buffer at once. Is the BufferedReader redundant here?

Could someone please explain this clearly? I have been struggling with it for a long time.

Ana Maria

1 Answer

Low-level system calls that read and write data are optimized to transfer large blocks at once. Buffering lets you take advantage of this. When you write single characters or short strings, they are accumulated in a buffer and written out as one large block when the buffer is full. When you read data, the read functions ask the source to fill a large buffer and then return data from that buffer.
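
As a rough illustration of the reading case, the sketch below wraps a source in a hypothetical CountingReader (a name made up for this example, not part of the JDK) and reads it line by line through a BufferedReader. A StringReader stands in for the file, so the numbers only show how many requests reach the source, not real disk I/O.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadCallCounter {

    /** Hypothetical wrapper that counts how often the underlying source is asked for data. */
    static class CountingReader extends Reader {
        private final Reader source;
        int calls = 0;

        CountingReader(Reader source) {
            this.source = source;
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            calls++; // one request to the (pretend) slow source
            return source.read(cbuf, off, len);
        }

        @Override
        public void close() throws IOException {
            source.close();
        }
    }

    /** Builds sample text with the given number of short lines. */
    static String sampleText(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("some line\n");
        }
        return sb.toString();
    }

    /** Reads every line through a BufferedReader; returns {lines read, requests to the source}. */
    static int[] readAllLines(String text) {
        CountingReader source = new CountingReader(new StringReader(text));
        int lines = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            while (in.readLine() != null) {
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] { lines, source.calls };
    }

    public static void main(String[] args) {
        int[] result = readAllLines(sampleText(1000));
        System.out.println("lines read: " + result[0]);
        System.out.println("requests to the source: " + result[1]);
    }
}
```

Even though the caller asks for one line at a time, the source should only see a handful of large requests, because each readLine() is served from the BufferedReader's buffer.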

You're right that wrapping buffered streams within other buffered streams is pointless: at best it achieves nothing, at worst it adds overhead as the data is needlessly copied from one buffer to the next. The buffer closest to the data source matters most.

On the other hand, nothing in the API specification says FileWriter and FileReader have buffers. In fact, it recommends you wrap FileWriter within a BufferedWriter and FileReader within a BufferedReader:

For top efficiency, consider wrapping an OutputStreamWriter within a BufferedWriter so as to avoid frequent converter invocations. For example:

Writer out
  = new BufferedWriter(new OutputStreamWriter(System.out));

(FileWriter is a subclass of OutputStreamWriter)

How does this work internally?

If you look at how FileWriter is implemented though, the story gets complicated because FileWriter does involve a buffer. Some of the details may depend on which version of Java you're using. In OpenJDK, when you create a BufferedWriter that decorates a FileWriter like:

BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter("file.txt"));

you are creating a stack of objects like the following, where one object wraps the next:

BufferedWriter -> FileWriter -> StreamEncoder -> FileOutputStream

where StreamEncoder is an internal class, part of how OutputStreamWriter is implemented.

Now, when you write characters to the BufferedWriter instance, it first accumulates them in the BufferedWriter's own buffer. The inner FileWriter does not see any of the data until you have written enough data to fill this buffer (or called flush()).

When the BufferedWriter buffer becomes full, it writes the contents of the buffer to the FileWriter with a single call to write(char[],int,int). This transfer of a large data block is where the efficiency comes from: now FileWriter has a large block of data it can write to the file, and not individual characters.
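
To see this hand-off in isolation, here is a minimal sketch that puts a hypothetical CountingWriter (made up for this example) in the place of FileWriter and counts how often its write(char[],int,int) is invoked. The buffer size of 8192 chars is passed explicitly so the result does not depend on the JDK's default.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

public class WriteCallCounter {

    /** Hypothetical sink that discards its input and only counts write calls. */
    static class CountingWriter extends Writer {
        int calls = 0;

        @Override
        public void write(char[] cbuf, int off, int len) {
            calls++; // every call here stands for one block handed to the inner writer
        }

        @Override
        public void flush() {
        }

        @Override
        public void close() {
        }
    }

    /** Writes single chars straight to the sink; returns how many block writes it saw. */
    static int callsWithoutBuffer(int chars) {
        CountingWriter sink = new CountingWriter();
        try {
            for (int i = 0; i < chars; i++) {
                sink.write('x'); // Writer.write(int) forwards to write(char[], 0, 1)
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    /** Writes single chars through a BufferedWriter; returns how many block writes the sink saw. */
    static int callsWithBuffer(int chars) {
        CountingWriter sink = new CountingWriter();
        try (BufferedWriter out = new BufferedWriter(sink, 8192)) {
            for (int i = 0; i < chars; i++) {
                out.write('x'); // stays in the BufferedWriter's char buffer until it is full
            }
            out.flush(); // push out whatever is left
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        System.out.println("without buffer: " + callsWithoutBuffer(20000));
        System.out.println("with buffer:    " + callsWithBuffer(20000));
    }
}
```

For 20000 single-character writes, the unbuffered sink is called 20000 times, while the buffered one is called 3 times: two full 8192-char blocks plus the remainder on flush().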

Then it gets a little complicated: the characters have to be converted to bytes so that they can be written into a file. This is where FileWriter passes these data on to StreamEncoder.

The StreamEncoder class uses a CharsetEncoder to convert the block of characters to bytes all at once, and accumulates the bytes in a buffer of its own. When it's done, it writes the bytes to the innermost FileOutputStream, as one block. FileOutputStream then invokes operating system functions to write to an actual file.
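
StreamEncoder itself is internal, but the encoding step can be sketched with the public CharsetEncoder API. The example below assumes UTF-8 (FileWriter may use a different charset depending on your Java version and settings) and encodes a whole block of characters in one call.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class BulkEncodeDemo {

    /** Encodes the whole text with a single CharsetEncoder call; returns the number of bytes produced. */
    static int encodedLength(String text) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder(); // UTF-8 is an assumption of this sketch
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text));
            return bytes.remaining(); // the returned buffer is ready for reading
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("text cannot be encoded as UTF-8", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("hello"));      // 5 chars, all ASCII
        System.out.println(encodedLength("h\u00e9llo")); // 5 chars, one of them non-ASCII
    }
}
```

The first call yields 5 bytes and the second 6, which is why a block of chars does not always fit into a byte buffer of the same nominal size.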

What if you didn't use BufferedWriter?

If you write characters to the FileWriter directly, they get passed on to the StreamEncoder object, which converts them into bytes and stores them in its private buffer rather than writing them directly to the FileOutputStream. This way, the internal implementation of FileWriter gives you some of the benefits of buffering. But this is not part of the API specification, so you shouldn't depend on it.

Also, every call to FileWriter.write results in an invocation of the CharsetEncoder to encode characters into bytes. It's more efficient to encode large blocks of characters at once; writing single characters or short strings has a higher overhead.
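
If you want to observe this implementation behavior yourself, a small experiment along these lines may help. It uses an OutputStreamWriter (the superclass of FileWriter) over a ByteArrayOutputStream instead of a real file, and UTF-8 is again an assumption of the sketch. What you see before flush() depends on the JDK implementation and is not guaranteed by the API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class UnflushedWriterDemo {

    /** Writes text without a BufferedWriter; returns {bytes in the sink before flush, bytes after flush}. */
    static int[] bytesBeforeAndAfterFlush(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            out.write(text);
            int before = sink.size(); // typically 0 on OpenJDK for short text: bytes sit in the encoder's buffer
            out.flush();
            int after = sink.size();  // after flush, all encoded bytes have reached the stream
            return new int[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = bytesBeforeAndAfterFlush("hello");
        System.out.println("before flush: " + result[0]);
        System.out.println("after flush:  " + result[1]);
    }
}
```

Only the value after flush() is something you can rely on; the value before it merely shows what your particular JDK happens to do.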

Joni
  • Thank you, this is a great answer. One thing I don't agree with/don't understand is that FileWriter is not buffered. If you write something and don't flush it, it is never written (unless its buffer is full, 1kb I think). And one more thing: let's say FileWriter indeed has a 1kb buffer and BufferedWriter has 8kb. If I were to write 10kb of data, would Java make 2 I/O requests (based on the bigger buffer) or 10 I/O requests (based on the underlying buffer)? – Ana Maria Jul 18 '20 at 18:41
  • FileWriter does not have a buffer of its own - I've added links to the source code so you can see for yourself. FileWriter writes data to a StreamEncoder, which has a buffer of 8192 bytes. If the StreamEncoder buffer were 1k and you wrote 10k to the FileWriter, you would see 10 write operations of 1k. – Joni Jul 18 '20 at 19:50
  • Thanks a lot, man! Just one thing, please correct me if I read this wrong: there is a buffer that buffers before converting and another buffer that buffers before writing, am I right? If I use FileWriter, of course. – Ana Maria Jul 18 '20 at 22:24
  • There's only one buffer when you write directly to a FileWriter: the buffer of bytes that accumulates the result of encoding characters as bytes. When that buffer becomes full, it's written to the output file – Joni Jul 18 '20 at 23:04
  • Nice answer, Joni! Can I ask one small thing? If both BufferedWriter and StreamEncoder had, let's say, the same buffer size of 8kb, would there be any performance advantage? I know StreamEncoder sends data to be written from its buffer, but adding a BufferedWriter will buffer the characters that are to be translated into bytes. Is it actually faster to translate characters into bytes in bulk (from a buffer) or 1 by 1? Thanks, Stefan – Stefan Jankovic Jul 22 '20 at 08:19
  • Every call to encode characters into bytes has some overhead, and bulk encoding has lower total overhead than 1-by-1. The default BufferedWriter buffer is 8k **chars**, while the StreamEncoder buffer is 8k **bytes**. In important special cases like ASCII text, chars map to bytes 1 to 1: encoding 8k chars results in 8k bytes, so StreamEncoder needs to make only one call to CharsetEncoder. In the general case, one char may be encoded as several bytes, and the result of encoding 8k chars does not fit in the StreamEncoder buffer all at once. In this case StreamEncoder has to call CharsetEncoder at least twice. – Joni Jul 22 '20 at 11:58