I understand that ClickHouse is eventually consistent, so when an insert call returns, there is no guarantee that the data will already appear in a select query.

  1. Does that apply to stand-alone ClickHouse (no distribution, no replication)?
  2. I understand the concept of eventual consistency for data replication, but does it also apply with distribution but no replication?
  3. With a distributed and replicated ClickHouse, what is the recommended way to know that some insert(s) can be safely looked up?

Basically I didn't find much information on this topic, so maybe I am not asking the best questions. Feel free to enlighten me.

Juh_

1 Answer

  1. No, but a single-node setup shouldn't be considered reliable either.
  2. By default, yes: you insert into the node the client is connected to (probably via some load balancer), and the Distributed table asynchronously forwards each piece of data to the node where it belongs. The insert_distributed_sync=1 setting makes the client wait synchronously until the data has been forwarded.
  3. On insert, write to the ***MergeTree shard tables directly (not the Distributed table) with the insert_quorum=2 setting (if there are 3 replicas), and retry indefinitely with exactly the same batch if there are any errors (you can use a different replica on each retry, since there is deduplication based on the batch hash). Then, on reads, use the select_sequential_consistency=1 setting.
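
The approach in (3) can be sketched roughly as below. This is a minimal illustration, not a definitive implementation. It assumes you supply an `execute(replica, sql, rows, settings)` callable that wraps whichever ClickHouse client you use. That callable, along with the names `insert_with_quorum` and `consistent_select`, is hypothetical. Only the setting names `insert_quorum` and `select_sequential_consistency` come from ClickHouse itself.

```python
import itertools
import time
from typing import Callable, Optional, Sequence

# Hypothetical adapter signature: (replica, sql, rows, settings) -> result.
# Wrap your real ClickHouse client in a function of this shape.
Execute = Callable[[str, str, Sequence[tuple], dict], object]


def insert_with_quorum(
    execute: Execute,
    replicas: Sequence[str],
    sql: str,
    rows: Sequence[tuple],
    quorum: int = 2,
    delay_s: float = 1.0,
    max_attempts: Optional[int] = None,
) -> int:
    """Retry the exact same batch, rotating replicas, until one insert succeeds.

    Returns the number of attempts used. With max_attempts=None it retries
    forever, as the answer suggests.
    """
    if not replicas:
        raise ValueError("at least one replica is required")
    # Freeze the batch: deduplication relies on every retry sending
    # byte-for-byte the same data.
    batch = list(rows)
    for attempt, replica in enumerate(itertools.cycle(replicas), start=1):
        try:
            execute(replica, sql, batch, {"insert_quorum": quorum})
            return attempt
        except Exception:
            # A real client would catch its own error types here.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            time.sleep(delay_s)
    raise AssertionError("unreachable")


def consistent_select(execute: Execute, replica: str, sql: str) -> object:
    """Run a read that only sees data written by completed quorum inserts."""
    return execute(replica, sql, [], {"select_sequential_consistency": 1})
```

Both functions target the replicated shard table on a specific replica, not the Distributed table. Passing the client in as a callable keeps the retry logic independent of any particular driver.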
Ivan Blinkov
  • Thanks. I'll need to work a little to fully understand, especially the (3), but I know what to do now. – Juh_ Aug 02 '19 at 08:44
  • Concerning (1), obviously a single node will not respond upon failure, and will lose data if the node/disk is lost. But apart from that, if (2) is possible then it should also work on a single node, no? – Juh_ Aug 02 '19 at 08:44