I know that ClickHouse does not have ACID
, therefore I don't expect it to have D
urability in terms of ACID
ness. But, the question is, is there a possibility to lose inserts if the server crashes?
Asked
Active
Viewed 1,962 times
6

ramazan polat
- 7,111
- 1
- 48
- 76
-
1That is exactly what "no durability" means. Without some kind of durability such as write-ahead-logging, you can lose commits if the server crashes. (I have no idea what you "expect" not to have if you are asking if that can happen!) – BadZen Feb 11 '20 at 06:28
-
@BadZen Durability in ACID references transactions. A database can be durable even if it does not have transactions like Cassandra, which has write-ahead-logging. I'm asking if clickhouse has such mechanism. – ramazan polat Feb 11 '20 at 12:41
-
1Durability means exactly what I say: the already written data is durable in the case of system shutdown or failure. It doesn't imply anything "sly" about user-visible transactions. (If you prefer to be pedanctic, DBMS 'without transactions' have transaction-per-operation semantics.) CH does not even checkpoint or fsync, you /will/ lose data if it crashes. The idea is to absolutely maximize performance in OLAP / data-mart access patterns. In the documentation, CH recommends you use a durable storage layer (does a storage device have 'transactions', btw?) if you require data durability. – BadZen Feb 11 '20 at 15:35
1 Answers
7
CH is not Durable.
You can lost a data successfully inserted for last 8-10 minutes on hardware spontaneous reboot, but not on CH crash.
By performance reason CH does not use fsync (this improves insert performance dramatically). So last parts level0 (inserts) will be in the Linux page cache only. You can reduce 10minutes by tuning Linux kernel parameters. (You can configure direct_io for merges (starting with 1 byte) so level1 parts will be written to disk)
And can use replicated tables and quorum insert. But still in this case you can lost data if both replicas are in the same rack on a rack power outage.
In the beginning 2021 CH will start to support WAL+fsync.
It will be controlled by parameters
- min_rows_to_fsync_after_merge
- min_compressed_bytes_to_fsync_after_merge
- min_compressed_bytes_to_fsync_after_fetch
- fsync_after_insert
- fsync_part_directory
- write_ahead_log_bytes_to_fsync
- write_ahead_log_interval_ms_to_fsync
- in_memory_parts_insert_sync

Denny Crane
- 11,574
- 2
- 19
- 30
-
So if I use replication and only one server is crashed then I won't lose inserts? – ramazan polat Feb 11 '20 at 16:48
-
@RamazanPolat usually replication delay is about 1-10 seconds. So by default (without quorum_insert) you will loose these 1-10 seconds. – Denny Crane Feb 11 '20 at 16:56
-
-
1@RamazanPolat replication + quorum_insert =? not losing insert . Yes. But quorum_insert has restrictions (check docs) and for me it does not worth it. – Denny Crane Feb 11 '20 at 17:53
-
2