
I'm hosting ClickHouse (v20.4.3.16) with 2 replicas on Kubernetes, and it uses a ZooKeeper (v3.5.5) ensemble with 3 replicas, also hosted on the same Kubernetes cluster.

I need to migrate the ZooKeeper used by ClickHouse to another installation, still 3 replicas but v3.6.2.

What I tried to do was the following:

  • I stopped all instances of ClickHouse in order to freeze the ZooKeeper znodes. Using zk-shell, I mirrored all znodes under /clickhouse from the old ZK cluster to the new one (it took some time, but it completed without problems; a sketch of the copy is shown below).
  • I restarted all instances of ClickHouse, one at a time, now pointed at the new ZooKeeper installation.
  • Both ClickHouse instances started correctly, without any errors, but every time I (or someone else) try to add rows to a table with an INSERT, ClickHouse logs something like the following:
2021.01.13 13:03:36.454415 [ 135 ] {885576c1-832e-4ac6-82d8-45fbf33b7790} <Warning> default.check_in_availability: Tried to add obsolete part 202101_0_0_0 covered by 202101_0_1159_290 (state Committed)

and the new data is never inserted.
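
For reference, the copy was done roughly like the following zk-shell session (hostnames are illustrative, and the exact cp/mirror syntax and the positional recursive/overwrite flags may vary between zk-shell versions, so check help cp):

    $ zk-shell old-zk:2181
    (CONNECTED) /> cp /clickhouse zk://new-zk:2181/clickhouse true true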

I've read all the info about Data Replication and Deduplication, and I am sure I'm inserting new data; in addition, all tables make use of temporal fields (event_time, update_timestamp and so on), but it simply doesn't work.

When I attach ClickHouse back to the old ZooKeeper, the problem does not occur with the same data being inserted.

Is there something that needs to be done before changing the ZooKeeper endpoints? Am I missing something obvious?

AndD

1 Answer

Using zk-shell, I

You cannot use this method, because it does not copy the autoincrement values that are used for part block numbers.
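
In ZooKeeper, sequential znodes take their number from the parent znode's cversion (the children-change counter), and ReplicatedMergeTree allocates block numbers from sequential nodes under block_numbers/<partition>. A data-only copy leaves that counter at zero on the new cluster, which is why the first INSERT produces a part like 202101_0_0_0 that is already covered by an existing part. A rough way to see the difference (the znode path depends on your table's zookeeper_path, and the values are illustrative):

    # old ZK: the counter reflects every block number allocated so far
    $ zkCli.sh -server old-zk:2181 stat /clickhouse/tables/01/check_in_availability/block_numbers/202101 | grep cversion
    cversion = 1160

    # new ZK after the data-only copy: the counter starts again from zero
    $ zkCli.sh -server new-zk:2181 stat /clickhouse/tables/01/check_in_availability/block_numbers/202101 | grep cversion
    cversion = 0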

There is a much simpler way: you can migrate the ZK cluster by adding the new ZK nodes as followers.

Here is a plan for ZK 3.4.9 (no dynamic reconfiguration):
1. Configure the 3 new ZK nodes as a cluster of 6 nodes (3 old + 3 new) and start them; no changes are needed on the 3 old ZK nodes at this point (a zoo.cfg sketch follows this list).
    The new servers would not connect and download a snapshot, so I had to start one of them in a cluster of 4 nodes first.
2. Make sure the 3 new ZK nodes have connected to the old ZK cluster as followers (run echo stat | nc localhost 2181 on the 3 new ZK nodes).
3. Confirm that the leader has 5 synced followers (run echo mntr | nc localhost 2181 on the leader and look for zk_synced_followers).
4. Stop data loading in CH (this is to minimize errors when CH loses ZK).
5. Change the zookeeper section in the configs on the CH nodes: remove the 3 old ZK servers and add the 3 new ZK servers (an example section follows this list).
6. Restart all CH nodes (CH must restart to connect to different ZK servers).
7. Make sure there are no connections from CH to the 3 old ZK nodes (run echo stat | nc localhost 2181 on the 3 old nodes and check their Clients section).
8. Remove the 3 old ZK nodes from zoo.cfg on the 3 new ZK nodes.
9. Restart the 3 new ZK nodes. They should form a cluster of 3 nodes.
10. When CH reconnects to ZK, start data loading.
11. Turn off the 3 old ZK nodes.
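
A rough sketch of the config changes involved (server ids, hostnames and ports are illustrative). For steps 1 and 8, zoo.cfg on the 3 new ZK nodes first lists all six servers and is later trimmed down to the three new ones:

    # zoo.cfg on the 3 new ZK nodes during step 1 (joined to the old ensemble)
    server.1=old-zk-1:2888:3888
    server.2=old-zk-2:2888:3888
    server.3=old-zk-3:2888:3888
    server.4=new-zk-1:2888:3888
    server.5=new-zk-2:2888:3888
    server.6=new-zk-3:2888:3888

    # zoo.cfg after step 8 (only the new nodes remain)
    server.4=new-zk-1:2888:3888
    server.5=new-zk-2:2888:3888
    server.6=new-zk-3:2888:3888

And for step 5, the zookeeper section of the ClickHouse server config is switched to the new ensemble only:

    <yandex>
        <zookeeper>
            <node>
                <host>new-zk-1</host>
                <port>2181</port>
            </node>
            <node>
                <host>new-zk-2</host>
                <port>2181</port>
            </node>
            <node>
                <host>new-zk-3</host>
                <port>2181</port>
            </node>
        </zookeeper>
    </yandex>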

Altinity KB

Denny Crane
  • Thanks for the answer! I can't exactly increase the number of ZK nodes and then split them from the cluster that way, because I'm running ZK on Kubernetes. I was thinking of increasing the number of nodes, but then, instead of splitting them, doing a backup and restore on the real new nodes (changing the myid) – AndD Jan 14 '21 at 09:36
  • "Because I'm running ZK on Kubernetes" – you are probably wrong. Ask this question with k8s tags – Denny Crane Jan 14 '21 at 13:33
  • You can simply copy snapshots and logs from the old ZK server to the new one if you are able to stop CH for 5 minutes (see the sketch below). – Denny Crane Jan 14 '21 at 13:35
  • Yeah, I just did that. I was unsure how to copy them since I was also upgrading the ZK version, but I made it work – AndD Jan 14 '21 at 13:36
  • And yeah, I was wrong: I most probably could have used dynamic reconfiguration to link the two clusters together, but I found that out only after I was done with the other method. – AndD Jan 14 '21 at 13:37
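
For completeness, a minimal sketch of the snapshot-and-log copy mentioned in the comments, assuming ClickHouse is stopped and the ZK processes are not running while the files are copied, and assuming the default dataDir layout (paths, hostnames and ids are illustrative):

    # on each new ZK node: copy the on-disk state (snapshots + transaction logs) from an old node
    rsync -a old-zk-1:/var/lib/zookeeper/version-2/ /var/lib/zookeeper/version-2/

    # give every new node its own id before starting it
    echo 4 > /var/lib/zookeeper/myid

Per the comments above, this also worked across the v3.5.5 to v3.6.2 upgrade.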