
I am running an application that opens a file on an NFS mount with the O_DSYNC flag. The application then writes 6500 bytes of data to the file 1000 times in a loop.
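For reference, the write pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual application: the function name `write_records` and the path passed to it are hypothetical, and it assumes a Linux system where `os.O_DSYNC` is available.

```python
import os


def write_records(path, record_size=6500, count=1000):
    """Write `count` records of `record_size` bytes with O_DSYNC.

    Returns the total number of bytes handed to write().
    The name and defaults are illustrative assumptions.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DSYNC
    fd = os.open(path, flags, 0o644)
    total = 0
    try:
        record = b"x" * record_size
        for _ in range(count):
            view = memoryview(record)
            # os.write() may write fewer bytes than requested, so loop
            # until the whole record has been accepted.
            while view:
                written = os.write(fd, view)
                view = view[written:]
                total += written
    finally:
        os.close(fd)
    return total
```

Calling `write_records("/mnt/nfs/testfile")` (a hypothetical mount point) would issue the 1000 writes of 6500 bytes each; client-side traffic can then be observed while it runs.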

I monitored the client behavior and noticed that it was sending the writes to the underlying filesystem in batches of 4096 and 8192 bytes.

According to `man open`, write operations on a file opened with O_DSYNC will complete according to the requirements of synchronized I/O data integrity completion. It further says:

O_DSYNC provides synchronized I/O data integrity completion, meaning write operations will flush data to the underlying hardware, but will only flush metadata updates that are required to allow a subsequent read operation to complete successfully. 

I assumed that with O_DSYNC, a write() call would not return until the underlying filesystem had successfully written the data. That is not what is happening here: the NFS client is caching writes and flushing them in multiples of 4k. Why is this so?

Note that I am using an Amazon EC2 instance running Linux version 4.9, with a page size of 4096 bytes.

user1071840
  • What behavior do you see on the older versions? – Andrew Henle Aug 23 '17 at 21:09
  • @AndrewHenle, I run a writer and a reader simultaneously on 2 different machines, with the writer sleeping for 4 min after 500+ iterations of the loop. The application logs that it has written `6500 x number_of_iterations` bytes, but the reader comes up a few bytes short. This is seen only on 4.9 and not on 4.1. – user1071840 Aug 23 '17 at 21:31
  • O_DSYNC only refers to file buffers in your process. NFS has its own SYNC option for doing immediate writes vs. caching. You also would have to set sync in the NFS export. – stark Aug 23 '17 at 21:31
  • @stark, NFS mount is configured to write through, not use cache – user1071840 Aug 23 '17 at 21:32

1 Answer


Device writes can only be multiples of the storage block size: 512 bytes for old disks or 4096 for many new disks. Since file writes aren't necessarily aligned with disk blocks, a write that straddles a block boundary could cause a read-modify-write of two disk blocks, resulting in an 8k write to the device even though the file write is much smaller.
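The alignment arithmetic can be sketched as below. This only illustrates how an unaligned byte range maps onto whole blocks; it is not a model of what the NFS client or the disk actually does, and the helper name `aligned_span` is a hypothetical one chosen for this example.

```python
def aligned_span(offset, length, block_size=4096):
    """Return (start, end) of the block-aligned byte range covering a write.

    `start` is rounded down and `end` is rounded up to a multiple of
    `block_size`, so end - start is the number of bytes in whole blocks.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    start = (offset // block_size) * block_size
    # Ceiling division without floats: -(-a // b) rounds a / b upward.
    end = -(-(offset + length) // block_size) * block_size
    return start, end


if __name__ == "__main__":
    for offset, length in [(100, 200), (4000, 200), (0, 6500)]:
        start, end = aligned_span(offset, length)
        print(f"write of {length} bytes at {offset}: {end - start} bytes of blocks")
```

For example, a 200-byte write at offset 4000 crosses the boundary at 4096, so it covers two 4096-byte blocks (8192 bytes), while the same 200 bytes written at offset 100 stays within a single block.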

stark