I have multiple threads, each accepting requests, doing some processing, storing the result in a commit log, and returning the results. To guarantee that at most x seconds' worth of data is lost, this commit log needs to be fsync'd every x seconds.

I would like to avoid synchronization between threads, which means they each need to have their own commit log rather than a shared log - is it possible to fsync all these different commit logs regularly in a performant way?

This is on Linux, with ext4 (or ext3).

(Note: due to the nature of the code, even during normal processing the threads need to re-read some of their own recent data from the commit log, but never other threads' commit log data. I therefore believe a shared log would be impractical, since many threads would need to read from and write to it.)

John Smith

1 Answer

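If you only need flushing to happen every few seconds, do you need to call fsync() at all? The OS should write dirty data back to disk on its own fairly regularly (unless the system is under heavy load and disk I/O is in short supply). Note that this is best-effort behavior rather than a hard guarantee on how much data can be lost.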
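
To see how often the kernel flushes on its own, you can inspect the writeback timers under /proc/sys/vm. The following is a minimal sketch, assuming a Linux system that exposes those files; the helper names read_sysctl_long and print_writeback_settings are made up for illustration. Both values are in centiseconds and commonly default to 5 and 30 seconds respectively, but check your own system rather than relying on that.

```c
#include <stdio.h>

/* Read a single integer from a /proc/sys file; returns -1 if it cannot be read. */
static long read_sysctl_long(const char *path) {
    FILE *f = fopen(path, "r");
    long value = -1;
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Print the two writeback timers, in centiseconds (hundredths of a second). */
static void print_writeback_settings(void) {
    /* How often the kernel flusher threads wake up. */
    printf("dirty_writeback_centisecs = %ld\n",
           read_sysctl_long("/proc/sys/vm/dirty_writeback_centisecs"));
    /* How old dirty data must be before it is eligible for writeback. */
    printf("dirty_expire_centisecs    = %ld\n",
           read_sysctl_long("/proc/sys/vm/dirty_expire_centisecs"));
}
```

If the values reported by print_writeback_settings() are larger than your x, relying on the OS alone will not meet the requirement.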

Otherwise, have your threads do something like:

if (high_resolution_time() % n == 0) {
  fsync(fd);  // fd is the thread's own commit log descriptor
}

Here n would be e.g. 3 if high_resolution_time() returned Unix epoch time (which is expressed in seconds). That would make the thread flush its file every 3 seconds.

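The problem, of course, is that a thread passing this code section several times within a matching second will flush its file several times in quick succession, while a thread that never passes it during that second will not flush at all. A higher clock resolution alone does not fix this, because an exact match against a microsecond counter almost never occurs. It is more robust to remember when the last fsync() happened and to flush once at least n seconds have passed. I don't know what programming language you use, but in C on Linux you could use gettimeofday:

#include <sys/time.h>
#include <unistd.h>

static __thread time_t last_sync;  // per-thread time of the last fsync, in seconds

struct timeval tv;
gettimeofday(&tv, NULL);
if (tv.tv_sec - last_sync >= 3) {  // fsync at most once every 3 seconds
  fsync(fd);  // fd: this thread's commit log file descriptor
  last_sync = tv.tv_sec;
}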
Ragnar