
I am attempting to deploy CockroachDB v2.1.6 to a new AWS EKS cluster. Everything deploys successfully: the statefulset, services, PVs and PVCs are created, and so are the AWS EBS volumes.

The issue is that the pods never reach the READY state:

pod/cockroachdb-0   0/1     Running   0          14m
pod/cockroachdb-1   0/1     Running   0          14m
pod/cockroachdb-2   0/1     Running   0          14m

If I 'describe' the pods I get the following:

  Normal   Pulled                  46s                kubelet, ip-10-5-109-70.eu-central-1.compute.internal  Container image "cockroachdb/cockroach:v2.1.6" already present on machine
  Normal   Created                 46s                kubelet, ip-10-5-109-70.eu-central-1.compute.internal  Created container cockroachdb
  Normal   Started                 46s                kubelet, ip-10-5-109-70.eu-central-1.compute.internal  Started container cockroachdb
  Warning  Unhealthy               1s (x8 over 36s)   kubelet, ip-10-5-109-70.eu-central-1.compute.internal  Readiness probe failed: HTTP probe failed with statuscode: 503

If I examine the logs of a pod I see this:

I200409 11:45:18.073666 14 server/server.go:1403  [n?] no stores bootstrapped and --join flag specified, awaiting init command.
W200409 11:45:18.076826 87 vendor/google.golang.org/grpc/clientconn.go:1293  grpc: addrConn.createTransport failed to connect to {cockroachdb-0.cockroachdb:26257 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup cockroachdb-0.cockroachdb on 172.20.0.10:53: no such host". Reconnecting...
W200409 11:45:18.076942 21 gossip/client.go:123  [n?] failed to start gossip client to cockroachdb-0.cockroachdb:26257: initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup cockroachdb-0.cockroachdb on 172.20.0.10:53: no such host"

I came across this comment on the CockroachDB forum (https://forum.cockroachlabs.com/t/http-probe-failed-with-statuscode-503/2043/6):

Both the cockroach_out.log and cockroach_output1.log files you sent me (corresponding to mycockroach-cockroachdb-0 and mycockroach-cockroachdb-2) print out no stores bootstrapped during startup and prefix all their log lines with n?, indicating that they haven’t been allocated a node ID. I’d say that they may have never been properly initialized as part of the cluster.
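To check whether a pod is in this state, its log can be searched for the "awaiting init command" message shown above. A minimal sketch: the helper name awaiting_init is made up for illustration, and the usage loop assumes the pod names from my statefulset and a kubectl context that already points at the cluster.

```shell
#!/bin/sh
# Hypothetical helper: reads CockroachDB log text on stdin and exits 0
# if the node reports that it is still waiting for `cockroach init`.
awaiting_init() {
  grep -q 'awaiting init command'
}

# Usage against a live cluster (pod names taken from the output above):
#   for p in cockroachdb-0 cockroachdb-1 cockroachdb-2; do
#     kubectl logs "$p" | awaiting_init && echo "$p: not initialized"
#   done
```

The helper only matches the log text, so it tells you a node is waiting for initialization, not why.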

I have deleted everything, including the PVs, PVCs and AWS EBS volumes, with kubectl delete and reapplied, with the same result.

Any thoughts would be very much appreciated. Thank you

Koman

1 Answer


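I was not aware that a CockroachDB cluster has to be initialized after it is created. Until the init command is run, the nodes wait for it (the "awaiting init command" log line) and the readiness probe keeps failing. To resolve my issue I opened a shell in one of the pods (replace <namespace> with the namespace the statefulset is deployed in):

kubectl exec -it cockroachdb-0 -n <namespace> -- /bin/sh

and then ran the init command inside it:

/cockroach/cockroach init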

See here for more details - https://www.cockroachlabs.com/docs/v19.2/cockroach-init.html

After this the pods started running correctly.
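The same step can also be run without opening an interactive shell in the pod. This is only a sketch under some assumptions: NAMESPACE, INIT_FLAGS and DRY_RUN are placeholder variables I made up, and --insecure is only appropriate if the cluster was started in insecure mode (a secure cluster would need --certs-dir pointing at its certificates instead).

```shell
#!/bin/sh
# Placeholders (assumptions, adjust to your deployment):
NAMESPACE="${NAMESPACE:-default}"       # namespace of the statefulset
INIT_FLAGS="${INIT_FLAGS:---insecure}"  # or --certs-dir=... for a secure cluster

# Run `cockroach init` once, inside the given pod.
# With DRY_RUN=1 the command is printed instead of executed.
init_cluster() {
  pod="$1"
  # INIT_FLAGS is left unquoted on purpose so several flags can be passed.
  set -- kubectl exec "$pod" -n "$NAMESPACE" -- /cockroach/cockroach init $INIT_FLAGS
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Usage: init_cluster cockroachdb-0
```

The init command only needs to be run once, against a single node; the other nodes join through the --join addresses they were started with.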

Koman