I'm using read()/write() to read from and write to regular files on a local disk.

I have to read/write small amounts of data, for example:

read(fd, buf, 15);
write(fd, buf, 39);

When someone reviewed my pull request, I was told that I should avoid reading/writing small amounts of data.

They said, for example, that I should allocate a large buffer, such as 4 KB, copy the small pieces of data into it, and write that buffer once instead of issuing many small writes. Likewise, I should read about 4 KB at once into a big buffer and then consume the data from there.
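For concreteness, the scheme the reviewer described would look roughly like the minimal sketch below. The names (buffered_write, flush_writes, WBUFSZ) are mine, not from the review, and error handling is trimmed to the essentials:

/* A sketch of the reviewer's suggestion: collect small writes in a
 * 4 KiB buffer and flush it with one write() call. Remember to call
 * flush_writes() before closing the descriptor. */
#include <string.h>
#include <unistd.h>

#define WBUFSZ 4096

static char   wbuf[WBUFSZ];
static size_t wfill;                     /* bytes currently buffered */

static int flush_writes(int fd)
{
    size_t off = 0;
    while (off < wfill) {                /* write() may be partial */
        ssize_t n = write(fd, wbuf + off, wfill - off);
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    wfill = 0;
    return 0;
}

static int buffered_write(int fd, const void *data, size_t len)
{
    if (len > WBUFSZ - wfill && flush_writes(fd) < 0)
        return -1;                       /* make room first */
    if (len >= WBUFSZ)                   /* too big to buffer */
        return write(fd, data, len) == (ssize_t)len ? 0 : -1;
    memcpy(wbuf + wfill, data, len);     /* just stage it in memory */
    wfill += len;
    return 0;
}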

But my understanding is that when I write a small amount of data, it goes to the page cache, so the kernel takes care of the buffering. Once enough data has accumulated, the kernel writes it out to disk.

Is my understanding correct? Should I avoid small reads/writes?

garen96
    "For example, I should allocate a large memory like 4k" --> No, the underlying I/O system is already doing that. Code for clarity. Focus on overall code structure and performance, not such small issues. – chux - Reinstate Monica Jun 08 '19 at 05:46
  • @chux That's true of C standard library functions like `fread()` and `fwrite()`, not for POSIX syscall-based I/O like `read()` and `write()`. – EOF Jun 08 '19 at 06:39
  • Is there any reason you're using `read` instead of `fread`? The latter will do the buffering for you (see the sketch after these comments). – William Pursell Jun 08 '19 at 06:57
    If the project you're working on has requirements concerning IO performance, you should test against those. Only if your code doesn't meet the requirements think about optimization. If the projects has no requirements use the most simple code to express your intentions. – the busybee Jun 08 '19 at 08:43
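To illustrate William Pursell's suggestion, here is a minimal sketch of the same small-read pattern written with stdio, which buffers internally; read_records is a made-up name, and the 15-byte record size is just the figure from the question:

/* A sketch of the stdio alternative from the comments: fopen()/fread()
 * let the C library do the buffering, so small reads stay cheap. */
#include <stdio.h>

int read_records(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1;

    char buf[15];
    while (fread(buf, sizeof buf, 1, fp) == 1) {
        /* ... use the 15 bytes in buf ... */
    }

    int err = ferror(fp);        /* distinguish EOF from a read error */
    fclose(fp);
    return err ? -1 : 0;
}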

2 Answers


Is my understanding correct? Should I avoid small reads/writes?

Your thinking is right, and no, there is no need to avoid reads/writes of small amounts of data.

When you use the C standard I/O library, it already provides a read buffer of BUFSIZ bytes (8192 bytes on Linux with glibc, 512 bytes on Windows). (The #define has shifted around in glibc over the years, originally derived from _G_BUFSIZ, then _IO_BUFSIZ, and now just BUFSIZ in the glibc source.) Raw read()/write() system calls are not buffered in user space, but, as the comments note, the kernel's page cache plays a similar role for them: for a regular file that is already cached, the data is copied to or from memory rather than the disk.

Here is the glibc commit that went from _IO_BUFSIZ to plain BUFSIZ: "Mechanically remove IO name aliases for types and constants" (Wed, 7 Feb 2018).

So with buffered stdio it doesn't matter whether you read one byte or 8192 bytes (on Linux) or 512 bytes (on Windows); there is effectively no performance penalty. The I/O buffer is filled with BUFSIZ bytes (or with however many bytes remain before EOF, if the file holds less than BUFSIZ) on your first request for data from the file, so small reads are served directly from the read buffer in memory.

Writes are handled in a similar manner: they accumulate in the write buffer and are not written to disk until the buffer fills (or the stream is flushed, or fsync or syncfs is called on the underlying file), at which point the buffered file data is written out to the underlying filesystem.
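As a minimal sketch of those two buffering layers, assuming a POSIX system (the file name out.dat and the loop count are arbitrary):

/* fwrite() fills the stdio buffer, fflush() pushes it to the kernel
 * page cache, and fsync() on the underlying descriptor asks the
 * kernel to put it on disk. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    FILE *fp = fopen("out.dat", "wb");
    if (!fp)
        return 1;

    const char msg[] = "small write\n";
    for (int i = 0; i < 1000; i++)        /* many small writes...     */
        fwrite(msg, 1, strlen(msg), fp);  /* ...land in the BUFSIZ buffer */

    fflush(fp);                 /* stdio buffer -> kernel page cache */
    fsync(fileno(fp));          /* page cache   -> disk              */
    fclose(fp);
    return 0;
}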

David C. Rankin

Is reading/writing small amounts of data a performance problem?

If you only need to read/write a small amount of data, reading/writing more is a waste of time.

If you need to read/write a large amount of data, reading/writing it as many small pieces means you pay the cost of switching between user space and the kernel many times (regardless of whether the switches come from system calls or from things like page faults). Whether or not this is problematic depends on the scenario: for a rough prototype that will only ever be executed three times, it's irrelevant; for high-performance production software that spends a lot of its time on I/O, it can be undesirable (especially now that Spectre and Meltdown mitigations have increased the switching costs, and especially if there's no other reason, like code maintainability, that justifies the extra overhead).
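A rough way to see that overhead for yourself is to time reading the same file byte-by-byte and then in large blocks. This is only a sketch; the absolute numbers will vary by machine, kernel, and mitigation settings:

/* A rough benchmark of syscall overhead: read the same file
 * byte-by-byte, then in 64 KiB blocks. Run it on a file that is
 * already in the page cache so disk speed doesn't dominate. */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static double time_reads(const char *path, size_t chunk)
{
    static char buf[65536];              /* big enough for both runs */
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1.0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (read(fd, buf, chunk) > 0)
        ;                                /* discard the data */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "testfile";
    printf("1-byte reads: %.3f s\n", time_reads(path, 1));
    printf("64 KiB reads: %.3f s\n", time_reads(path, 65536));
    return 0;
}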

Brendan