6

I know that ClickHouse does not have ACID, therefore I don't expect it to have Durability in terms of ACIDness. But, the question is, is there a possibility to lose inserts if the server crashes?

ramazan polat
  • 7,111
  • 1
  • 48
  • 76
  • 1
    That is exactly what "no durability" means. Without some kind of durability such as write-ahead-logging, you can lose commits if the server crashes. (I have no idea what you "expect" not to have if you are asking if that can happen!) – BadZen Feb 11 '20 at 06:28
  • @BadZen Durability in ACID references transactions. A database can be durable even if it does not have transactions like Cassandra, which has write-ahead-logging. I'm asking if clickhouse has such mechanism. – ramazan polat Feb 11 '20 at 12:41
  • 1
    Durability means exactly what I say: the already written data is durable in the case of system shutdown or failure. It doesn't imply anything "sly" about user-visible transactions. (If you prefer to be pedanctic, DBMS 'without transactions' have transaction-per-operation semantics.) CH does not even checkpoint or fsync, you /will/ lose data if it crashes. The idea is to absolutely maximize performance in OLAP / data-mart access patterns. In the documentation, CH recommends you use a durable storage layer (does a storage device have 'transactions', btw?) if you require data durability. – BadZen Feb 11 '20 at 15:35

1 Answers1

7

CH is not Durable.

You can lost a data successfully inserted for last 8-10 minutes on hardware spontaneous reboot, but not on CH crash.

By performance reason CH does not use fsync (this improves insert performance dramatically). So last parts level0 (inserts) will be in the Linux page cache only. You can reduce 10minutes by tuning Linux kernel parameters. (You can configure direct_io for merges (starting with 1 byte) so level1 parts will be written to disk)

And can use replicated tables and quorum insert. But still in this case you can lost data if both replicas are in the same rack on a rack power outage.


In the beginning 2021 CH will start to support WAL+fsync.

It will be controlled by parameters

  • min_rows_to_fsync_after_merge
  • min_compressed_bytes_to_fsync_after_merge
  • min_compressed_bytes_to_fsync_after_fetch
  • fsync_after_insert
  • fsync_part_directory
  • write_ahead_log_bytes_to_fsync
  • write_ahead_log_interval_ms_to_fsync
  • in_memory_parts_insert_sync
Denny Crane
  • 11,574
  • 2
  • 19
  • 30