Error restoring Rancher: This cluster is currently Unavailable; areas that interact directly with it will not be available until the API is ready

Question

I am trying to back up and restore a Rancher server (single-node install), as described here.

After taking the backup, I shut down the original Rancher server node and started a new Rancher container on a new node (same network, but a different IP address), then restored from the backup file.

After restoring, I logged in to the Rancher UI and it showed the error quoted in the title.

So I checked the Rancher server logs, which showed the following:

2019-10-05 16:41:32.197641 I | http: TLS handshake error from 127.0.0.1:38388: EOF
2019-10-05 16:41:32.202442 I | http: TLS handshake error from 127.0.0.1:38380: EOF
2019-10-05 16:41:32.210378 I | http: TLS handshake error from 127.0.0.1:38376: EOF
2019-10-05 16:41:32.211106 I | http: TLS handshake error from 127.0.0.1:38386: EOF
2019/10/05 16:42:26 [ERROR] ClusterController c-4pgjl [user-controllers-controller] failed with : failed to start user controllers for cluster c-4pgjl: failed to contact server: Get https://192.168.94.154:6443/api/v1/namespaces/kube-system?timeout=30s: waiting for cluster agent to connect
2019/10/05 16:44:34 [ERROR] ClusterController c-4pgjl [user-controllers-controller] failed with : failed to start user controllers for cluster c-4pgjl: failed to contact server: Get https://192.168.94.154:6443/api/v1/namespaces/kube-system?timeout=30s: waiting for cluster agent to connect
2019/10/05 16:48:50 [ERROR] ClusterController c-4pgjl [user-controllers-controller] failed with : failed to start user controllers for cluster c-4pgjl: failed to contact server: Get https://192.168.94.154:6443/api/v1/namespaces/kube-system?timeout=30s: waiting for cluster agent to connect
2019-10-05 16:50:19.114475 I | mvcc: store.index: compact 75951
2019-10-05 16:50:19.137825 I | mvcc: finished scheduled compaction at 75951 (took 22.527694ms)
2019-10-05 16:55:19.120803 I | mvcc: store.index: compact 76282
2019-10-05 16:55:19.124813 I | mvcc: finished scheduled compaction at 76282 (took 2.746382ms)
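The etcd compaction and TLS handshake lines are noise here; the entries that matter are the repeated `ClusterController` errors. A minimal sketch for pulling those out of a saved log, assuming the log text is available as a string (the function name `summarize_cluster_errors` is hypothetical, not part of Rancher):

```python
import re
from collections import Counter

# Matches the "[ERROR] ClusterController <cluster-id> ... Get https://<host:port>/..."
# entries seen in the log above and captures the cluster ID and API endpoint.
ERROR_RE = re.compile(r"\[ERROR\] ClusterController (\S+) .*?Get https://([^/\s]+)")


def summarize_cluster_errors(log_text):
    """Count ClusterController errors per (cluster ID, API endpoint) pair."""
    return Counter(ERROR_RE.findall(log_text))


# Two entries copied from the log in the question.
sample = (
    "2019/10/05 16:42:26 [ERROR] ClusterController c-4pgjl "
    "[user-controllers-controller] failed with : failed to start user "
    "controllers for cluster c-4pgjl: failed to contact server: Get "
    "https://192.168.94.154:6443/api/v1/namespaces/kube-system?timeout=30s: "
    "waiting for cluster agent to connect\n"
    "2019-10-05 16:50:19.114475 I | mvcc: store.index: compact 75951\n"
)

for (cluster, endpoint), count in summarize_cluster_errors(sample).items():
    print(f"{cluster} -> {endpoint}: {count} error(s)")
```

Every error ends with "waiting for cluster agent to connect", which suggests the server itself is running and is waiting on the agents in cluster c-4pgjl.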

After that, I checked the logs on the master nodes and found that the Rancher agent is still trying to connect to the old Rancher server (the old IP address) rather than the new one, which is why the cluster is unavailable.

How can I fix this?

score 0 · Answer 1 · answered Aug 09 '20 at 07:34

You need to re-register the node in Rancher using the following steps.

1. Update the server-url in Rancher by going to Global -> Settings -> server-url. This should be the full URL, including https://.
2. Then use this script to re-register the node in Rancher: https://github.com/mattmattox/cluster-agent-tool
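If you would rather script the first step than click through the UI, the sketch below checks that the value is a full https:// URL and builds (but does not send) the update request. It rests on assumptions not stated above: that the setting is exposed at `/v3/settings/server-url` in the Rancher v3 API, that it accepts a PUT with a bearer API token, and that `rancher.example.com` and the token are placeholders. The function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlparse


def validate_server_url(url):
    """Return the URL without a trailing slash, or raise if it is not a full https:// URL."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError("server-url must start with https://")
    if not parsed.hostname:
        raise ValueError("server-url must include a host name or IP address")
    return url.rstrip("/")


def build_server_url_request(rancher_url, token):
    """Build a PUT request that sets server-url to rancher_url.

    Assumption: the setting lives at /v3/settings/server-url and the API
    accepts a bearer token. Verify both against your Rancher version.
    """
    base = validate_server_url(rancher_url)
    body = json.dumps({"name": "server-url", "value": base}).encode()
    return urllib.request.Request(
        f"{base}/v3/settings/server-url",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder host and token; replace with the new server's address and a real API key.
req = build_server_url_request("https://rancher.example.com/", "token-xxxxx:secret")
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` is left out on purpose; a single-node install with a self-signed certificate would also need an SSL context that trusts that certificate.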
