After a Compute Engine worker node writes files into a gcsfuse-mounted local directory and closes them, I want it to synchronously flush the data through to GCS before it notifies other worker nodes that all the files are ready. This is to keep the workers synchronized.

Q. How can I ask gcsfuse to write through to GCS and then wait for that to complete?

Ideas:

  • Run the Linux sync command?
  • Unmount the directory then wait for that fusermount command to return? (Besides the write-through time, would it take more than a second or two to unmount then remount for the next worker task?)
  • Make all the programs in this task call fsync() on all their output files? That'd be challenging.
  • Write an additional file, then flush() and fsync() that one?
Jerry101

1 Answer

Have a look at gcsfuse semantics:

Inodes may be opened for writing. Modifications are reflected immediately in reads of the same inode by processes local to the machine using the same file system. After a successful fsync or a successful close, the contents of the inode are guaranteed to have been written to the GCS object with the matching name if the object's generation and meta-generation numbers still matched the source generation of the inode. (They may not have if there had been modifications from another actor in the meantime.) There are no guarantees about whether local modifications are reflected in GCS after writing but before syncing or closing.

So if your worker closes the files after writing them, and each close succeeds, subsequent dependent tasks should see them consistently.
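As a rough illustration of that close-then-notify ordering, here is a minimal Python sketch. The names `write_and_sync`, `publish_outputs`, and `notify` are hypothetical and not part of gcsfuse, and the explicit `os.fsync` call is optional belt-and-braces, since the quoted semantics already cover a successful close. It assumes the output directory is the gcsfuse mount; the demo at the bottom uses a temporary directory only so that it runs anywhere.

```python
import os
import tempfile


def write_and_sync(path, data):
    """Write bytes to path and return only after fsync and close succeed."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the file system to persist the contents
    # Leaving the with-block closes the file. If fsync or close fails,
    # an OSError propagates and the caller never reaches the notify step.
    return path


def publish_outputs(out_dir, outputs, notify):
    """Write every output file, then call notify (a hypothetical callback)."""
    written = [
        write_and_sync(os.path.join(out_dir, name), data)
        for name, data in outputs.items()
    ]
    # Only signal other workers once every file has been synced and closed.
    notify(written)
    return written


if __name__ == "__main__":
    # Stand-in for the gcsfuse mount point (assumption for the demo).
    with tempfile.TemporaryDirectory() as mount_dir:
        publish_outputs(mount_dir, {"part-0.txt": b"done"}, print)
```

The point of the design is that the notification happens strictly after the last successful close, so an upload error surfaces as an exception in the writer instead of a missing object on the reader's side.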

mensi