When using ExcerptAppender over Chronicle Queue (an append-only log), is it guaranteed that only the end of the file may be truncated in case of power loss, i.e. that all intermediate records are left uncorrupted? If so, what implementation/filesystem/OS behaviour does this rely on?

I'm interested in Linux/x64. Since this is over an mmap, my understanding is that the order in which pages are flushed from the page cache isn't defined, and that the disk can also reorder writes. Is this only guaranteed for SSDs or for a particular filesystem?

hawk

1 Answer

Queue relies on the OS flushing the data to disk asynchronously. By default the OS usually ensures data is pushed to disk within 30 seconds; however, the pages could be written in any order, so while 99% of the last 30 seconds might be written, there is a chance that all of the last 30 seconds is unreadable. This time boundary isn't dependent on the choice of disk, but rather on the configuration of the OS.
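For contrast with asynchronous flushing, here is a minimal sketch of a memory-mapped write in plain `java.nio` (this is not Chronicle Queue's API; the class and method names are made up for illustration). Without `force()`, the bytes sit in the page cache until the OS decides to write them back; with `force()`, the call blocks until the mapped region has been written to the storage device, which is the per-write cost that asynchronous flushing avoids.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapFlushSketch {

    /** Writes data through a memory mapping; optionally forces it to the storage device. */
    static void writeMapped(Path file, byte[] data, boolean sync) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping in READ_WRITE mode extends the file to data.length if needed.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            map.put(data); // at this point the bytes are only in the page cache
            if (sync) {
                map.force(); // blocks until the dirty pages have been written out
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-flush", ".dat");
        writeMapped(file, "hello".getBytes(StandardCharsets.UTF_8), true);
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

Even with `sync` set to `true`, this only covers a single machine; it does nothing for the loss of the machine itself.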

The choice of disk affects the burst throughput you can sustain, as well as how much data you can write before needing to archive or delete it.

If you want reliable disk writes we recommend using replication to a 2nd or 3rd machine so that if the machine dies or the whole data centre is unavailable, you can continue operation. This uses Chronicle Queue Enterprise.

Peter Lawrey
  • thanks Peter. so if a few records are corrupted how does chronicle queue detect which ones got corrupted? and also does it discard all records after the first corrupt one? – hawk May 03 '19 at 11:29
  • @hawk After detecting a corrupt message, it marks the area as bad and skips forward in the file so it can continue writing. – Peter Lawrey May 03 '19 at 14:51
  • I still don't understand how exactly the corruption is detected? Is it through checksum? or marking page boundaries with known value? or something else? Even better would be pointing me to where in code is this handled. Thanks! – hawk May 03 '19 at 15:24
  • @hawk Simpler than that: it detects whether the header was completed, which is the last operation performed. If the message is missing part of its data, it will be partially full of zeros. Zeros should be treated as zero length or invalid data, and this is what Queue does: if there is missing data, it assumes there is nothing more once it hits a missing page. You might get a truncated message with zeros at the end, however. We don't use a checksum due to the overhead this would have; Queue would be an order of magnitude slower, when the real solution is replication. – Peter Lawrey May 05 '19 at 08:32
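The header-last idea from the last comment can be sketched as follows. This is a simplified illustration, not Chronicle Queue's actual wire format or code: the 4-byte length header, the class name `HeaderLastSketch`, and both method names are assumptions made for the example. The writer stores the payload first and the header last; the reader treats a zero header as "nothing more here".

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class HeaderLastSketch {
    static final int HEADER_BYTES = 4; // hypothetical header: payload length as an int

    /** Appends one record at pos: payload first, header last. Returns the next position. */
    static int append(ByteBuffer log, int pos, byte[] payload) {
        if (payload.length == 0) {
            throw new IllegalArgumentException("a zero length would read as end of data");
        }
        for (int i = 0; i < payload.length; i++) {
            log.put(pos + HEADER_BYTES + i, payload[i]);
        }
        log.putInt(pos, payload.length); // completing the header is the final step
        return pos + HEADER_BYTES + payload.length;
    }

    /** Reads records from the start until a zero (or implausible) header is found. */
    static List<byte[]> readAll(ByteBuffer log) {
        List<byte[]> records = new ArrayList<>();
        int pos = 0;
        while (pos + HEADER_BYTES <= log.capacity()) {
            int len = log.getInt(pos);
            if (len <= 0 || pos + HEADER_BYTES + len > log.capacity()) {
                break; // zeros mean the header was never completed: stop here
            }
            byte[] payload = new byte[len];
            for (int i = 0; i < len; i++) {
                payload[i] = log.get(pos + HEADER_BYTES + i);
            }
            records.add(payload);
            pos += HEADER_BYTES + len;
        }
        return records;
    }

    public static void main(String[] args) {
        ByteBuffer log = ByteBuffer.allocate(64); // zero-filled, like a fresh mapped file
        int second = append(log, 0, new byte[] {1, 2, 3});
        append(log, second, new byte[] {4, 5});
        System.out.println("records before loss: " + readAll(log).size());

        log.putInt(second, 0); // simulate the page holding the second header being lost
        System.out.println("records after loss: " + readAll(log).size());
    }
}
```

Note that a length header alone cannot tell a complete payload from one whose tail was lost: if the header reached disk but part of the payload did not, the reader returns the record with zeros at the end, which matches the truncated-message caveat in the comment.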