In my program, I hold two files open for writing: a content-file, containing chunks of data, and an index-file, containing a map of which chunks of data have been written so far.

I would like to flush them both to disk as efficiently as possible, with the only constraint being that the blocks in the content-file must be written before the corresponding blocks in the index-file (naturally).

The catch is that I would like to avoid blocking, i.e. calling fsync, for both latency and throughput reasons.

Any ideas?

Rawler

1 Answer

I don't think you can do this easily in a single execution path. You need fsync to guarantee that a write has reached the disk, and fsync has to wait for that write to complete.

I suspect it is possible (but not easy) to do this by delegating the writing task to a separate thread or process. Generate the data in your existing program and 'write' it to the second thread/process using any method that looks sensible. This can be non-blocking. The second thread would then write any new data to your content-file, then fsync, then write the index-file, then check for new data again. The key design decisions are how you separate the two execution paths, how you communicate between them, and whether you need to report completed writes back to the main program. This could still have latency and throughput issues, but that's part of the cost of choosing to keep the index-file and content-file in sync. At least there would be a chance of getting work done while waiting on the disk.
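To make that loop concrete, here is a minimal sketch in Python, not a definitive implementation. It assumes a `queue.Queue` as the channel between the two execution paths, `None` as a stop sentinel, and a made-up text index format of `chunk_id offset length` per line; the names `writer_loop` and `write_all` are hypothetical.

```python
import os
import queue
import threading


def write_all(fd, data):
    """Write every byte of data to fd, retrying on short writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def writer_loop(jobs, content_path, index_path):
    """Drain (chunk_id, data) jobs; content is fsynced before it is indexed.

    Assumption: putting None on the queue asks the writer to stop.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    content_fd = os.open(content_path, flags, 0o644)
    index_fd = os.open(index_path, flags, 0o644)
    try:
        stop = False
        while not stop:
            batch = [jobs.get()]            # block until there is work
            while True:                     # then take whatever else is queued
                try:
                    batch.append(jobs.get_nowait())
                except queue.Empty:
                    break
            stop = None in batch
            batch = [job for job in batch if job is not None]
            if not batch:
                continue

            offset = os.lseek(content_fd, 0, os.SEEK_END)
            entries = []
            for chunk_id, data in batch:
                write_all(content_fd, data)
                entries.append(f"{chunk_id} {offset} {len(data)}\n")
                offset += len(data)

            os.fsync(content_fd)            # content must be on disk first
            write_all(index_fd, "".join(entries).encode())
            os.fsync(index_fd)              # only now is the batch indexed
    finally:
        os.close(content_fd)
        os.close(index_fd)
```

The main program would start this with `threading.Thread(target=writer_loop, args=(jobs, content_path, index_path))` and hand over work with `jobs.put((chunk_id, data))`, which does not wait for the disk. Draining the queue into a batch means one fsync per file covers many chunks, which is where the throughput comes back. The second fsync is not needed for the ordering constraint itself; it is there so a batch counts as done only once its index entries are durable.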

It could be worth looking at the source of one of the transactional databases to see whether this is encapsulated well enough to be useful to you. You could also investigate the sync option when you mount the file system that holds the content-file.

Andrew Walker
  • Sorry, for some reason SO didn't tell me I had an answer, and I didn't recall the question until now. It's an interesting solution, and I think I'll try it. It fits pretty well, since I have a pool of assets that should occasionally be synced to disk, and designing a writer thread to round-robin the pool, lock and clone the index, release the lock, fsync the asset, and then rewrite the index sounds simple enough. Thanks! I don't know of any existing database matching my use case. "Objects" are blobs from ~1 MB to 15 GB, requiring pread()-style access. – Rawler Jun 22 '10 at 18:54