
I am observing the following behavior with the C++ standard library member function std::ostream::write().

To buffer the data, I am using the following C++ API:

std::ofstream::rdbuf()->pubsetbuf(char* s, streamsize n)

This works fine (verified using the strace utility) as long as the amount of data we write to the file stream in a single call to

std::ofstream::write(const char* s, std::streamsize n)

is less than 1024 bytes (below that size the writes are accumulated until the buffer is full). But as soon as a single write is 1024 bytes or larger, the buffer is ignored and the data is flushed straight to the file.

For example, if I set the buffer size to 10 KB and write 512 bytes at a time, strace shows that multiple writes have been combined into a single system call:

writev(3, [{"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 9728}, {"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 512}], 2) = 10240 ( 10 KB )
writev(3, [{"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 9728}, {"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 512}], 2) = 10240
...

But when I write 1024 bytes at a time (keeping the buffer fixed at 10 KB), strace shows that the buffer is not used and each ofstream::write call is translated into its own system call:

writev(3, [{NULL, 0}, {"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 1024}], 2) = 1024 ( 1KB )
writev(3, [{NULL, 0}, {"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 1024}], 2) = 1024
...
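
A minimal sketch of the setup described above (the file name and the exact sizes are illustrative):

#include <fstream>
#include <vector>

int main()
{
    std::vector<char> buf(10 * 1024);        // 10 KB stream buffer
    std::ofstream out;

    // Note: with libstdc++, pubsetbuf() only takes effect if it is
    // called before the file is opened.
    out.rdbuf()->pubsetbuf(buf.data(), buf.size());
    out.open("test.out", std::ios::binary);

    std::vector<char> chunk(1024, 'A');      // 512 here gets buffered, 1024 does not
    for (int i = 0; i < 100; ++i)
        out.write(chunk.data(), chunk.size());
    return 0;
}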

Is there any C++ API call or Linux tuning parameter that I am missing?

Abhishek
  • Why do you care? Can't you use `std::fflush` manipulator? – Basile Starynkevitch Mar 18 '14 at 11:15
  • There's no such thing. There's the `std::flush` manipulator, but it forces a flush, which is the opposite of what the OP wants. – Sebastian Redl Mar 18 '14 at 11:23
  • @SebastianRedl To be honest OP didn't even state what he wants. – Bartek Banachewicz Mar 18 '14 at 11:26
  • Writing 1024 bytes at a time is efficient. In a modern architecture there is very little to gain by making larger writes. – Klas Lindbäck Mar 18 '14 at 11:32
  • 4
    @KlasLindbäck I disagree: With 10 GB/s memory bus bandwidth, copying 1 kiB takes roughly 100 ns. A syscall generally takes several microseconds. I'd say, it takes roughly 100 kiB for the syscall overhead to become neglegible. – cmaster - reinstate monica Mar 18 '14 at 11:38
  • @cmaster +1 You seem to be correct. On my RHEL6 server write performance pretty much levels out at 8 KB writes (which is equal to my disk page size, but that may be a coincidence). Doing 1 KB writes took almost 3 times as long, and doing 100 KB writes took as long as doing 8 KB writes. The reason for the performance peaking may be the limit of physical writes to the disk system (100 Mb in 0.35 s). – Klas Lindbäck Mar 18 '14 at 12:20
  • Actually, I am trying to reduce the number of write system calls made by my application. The default buffer size of the C++ library is around 8 KB (a write syscall is issued once 8 KB of data has accumulated), but that only works when the data I am writing per call is less than 1 KB. – Abhishek Mar 18 '14 at 14:39

2 Answers


This is an implementation detail of libstdc++, implemented around line 650 of bits/fstream.tcc. Basically, if a single write is 2^10 bytes (1024) or larger, it skips the buffer.
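
To illustrate the effect, here is a simplified approximation of that check (not the actual libstdc++ source; the function name is made up for illustration):

#include <algorithm>
#include <cstddef>

// Rough approximation of the decision made in basic_filebuf<>::xsputn():
// a write of at least min(1024, remaining buffer space) bytes bypasses the
// stream buffer and goes straight to the OS (the writev() seen in strace).
bool write_bypasses_buffer(std::size_t write_size, std::size_t buffer_space_left)
{
    const std::size_t chunk = std::size_t(1) << 10;              // hard-coded 1024-byte threshold
    const std::size_t limit = std::min(chunk, buffer_space_left);
    return write_size >= limit;
}

This is why 512-byte writes into a 10 KB buffer are accumulated, while 1024-byte writes trigger a system call each time.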

If you want the rationale behind this decision, I suggest you send a mail to the libstdc++ development list.

http://gcc.gnu.org/ml/libstdc++/

Sebastian Redl
  • Posted a mail to the libstdc++ mailing list, and yes, it is kind of hard-coded behavior. To add further, I ran the same test case on SunOS (Sun Studio CC compiler); the behavior is different there: it respects the given buffer (no 1024-byte boundary). – Abhishek Mar 30 '14 at 09:38
  • This seems to be still valid today (found the hardcoded constant here https://github.com/gcc-mirror/gcc/blob/master/libstdc++-v3/include/bits/fstream.tcc#L768). – iMineLink Jun 14 '21 at 15:56

Looks like someone writing the stdlib implementation made an "optimization" without giving enough thought to it. So, the only workaround for you would be to avoid the C++ API and use the standard C library.
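
A minimal sketch of that workaround, assuming glibc's stdio (which, as far as I know, only bypasses its buffer when a single write is at least as large as the whole buffer); file name, buffer size, and chunk size are illustrative:

#include <cstdio>
#include <vector>

int main()
{
    std::FILE* f = std::fopen("test.out", "wb");
    if (!f)
        return 1;

    // Full buffering with a user-supplied 10 KB buffer; setvbuf() must be
    // called before any I/O is done on the stream.
    std::vector<char> buf(10 * 1024);
    std::setvbuf(f, buf.data(), _IOFBF, buf.size());

    // 1024-byte chunks now accumulate in the buffer until it fills up,
    // instead of causing one system call per fwrite().
    std::vector<char> chunk(1024, 'A');
    for (int i = 0; i < 100; ++i)
        std::fwrite(chunk.data(), 1, chunk.size(), f);

    std::fclose(f);   // flushes the remaining data and releases the stream
    return 0;
}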

This is not the only suboptimality in the GNU/Linux implementation of the standard C++ library: on my machine, malloc() is 100 cycles faster than the standard void* operator new (size_t size)...

cmaster - reinstate monica
  • `new()` calls the constructor whereas `malloc()` doesn't. And it does a few more nice things. Comparing the speed of both is just nonsense. – scai Mar 18 '14 at 12:20
  • 1
    @scai Sorry to contradict, but that is precisely the reason why I gave the entire signature: I'm not talking about the `new` keyword, and I'm not even talking about class specific `new` operators, I'm talking about the *one global function that does the allocation itself, and nothing else*. This is the function, that you can easily implement by just calling through to `malloc()`, which is the basis for my measurement. – cmaster - reinstate monica Mar 18 '14 at 12:29