
I set up a Kubernetes cluster using this tutorial 2 days ago: https://www.linuxtechi.com/install-kubernetes-on-ubuntu-22-04.

The setup went fine, and I could run kubectl commands, create deployments, etc. However, when I log in now, 2 days later, and try to execute any kubectl command, I get:

# k get nodes
E0227 09:45:08.352822  125806 memcache.go:238] couldn't get current server API group list: Get "https://foo.bar.com:6443/api?timeout=32s": dial tcp w.x.y.z:6443: connect: connection refused
E0227 09:45:08.353636  125806 memcache.go:238] couldn't get current server API group list: Get "https://foo.bar.com:6443/api?timeout=32s": dial tcp w.x.y.z:6443: connect: connection refused
E0227 09:45:08.355251  125806 memcache.go:238] couldn't get current server API group list: Get "https://foo.bar.com:6443/api?timeout=32s": dial tcp w.x.y.z:6443: connect: connection refused
E0227 09:45:08.356948  125806 memcache.go:238] couldn't get current server API group list: Get "https://foo.bar.com:6443/api?timeout=32s": dial tcp w.x.y.z:6443: connect: connection refused
E0227 09:45:08.358446  125806 memcache.go:238] couldn't get current server API group list: Get "https://foo.bar.com:6443/api?timeout=32s": dial tcp w.x.y.z:6443: connect: connection refused
The connection to the server foo.bar.com:6443 was refused - did you specify the right host or port?

Did the kubeconfig expire, or did the api-server crash? How can I check and debug this?

Ufder
  • Did you try exporting the `KUBECONFIG` again? – Sibtain Feb 27 '23 at 18:30
  • I did, did not help! – Ufder Feb 27 '23 at 19:20
  • does `crictl ps -a | grep api` give you anything? If you see the container, check the apiserver logs using `crictl logs ` – Sibtain Feb 27 '23 at 19:33
  • Tried that too, any crictl command gives me: `WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead. FATA[0000] listing containers: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory"` – Ufder Feb 27 '23 at 20:17

1 Answer


Your error does not point to an authentication issue. The message says that the Kubernetes API server is refusing your connection.

The first thing I would check: does foo.bar.com (from your error: foo.bar.com:6443) still resolve to your control plane node? Any DNS issues? Did you add a load balancer in front of it? If so, check that one first: is its service started, are the backends healthy, ...?
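
A minimal sketch of that first check, assuming bash, getent and coreutils timeout are available on the machine you run kubectl from. The helper name check_endpoint is made up for illustration, and foo.bar.com is only the placeholder from the question:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a host resolves and that a TCP port accepts connections.
check_endpoint() {
  local host="$1" port="$2"
  # getent follows the system resolver order (/etc/hosts, then DNS).
  if ! getent hosts "$host" >/dev/null 2>&1; then
    echo "dns: $host does not resolve"
    return 1
  fi
  # bash's /dev/tcp pseudo-device opens a TCP connection; timeout avoids hanging.
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "tcp: $host:$port accepts connections"
  else
    echo "tcp: $host:$port refused or unreachable"
    return 2
  fi
}

# Example with the placeholder name from the question:
# check_endpoint foo.bar.com 6443
```

If the name resolves to the expected address but the port is refused, the problem is most likely on the control plane node itself rather than in DNS.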

Then, as suggested by @Sibtain in the comments, the next step would be to check the container that runs that service. There should be a kube-apiserver container listed by crictl ps -a. Locate the last-started (first-listed) container matching that name and check its logs. This might help you figure out why it's currently down.
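
The crictl warning in the comments suggests that no runtime endpoint is configured, so crictl tries dockershim first and fails. A rough sketch that passes the endpoint explicitly, assuming containerd is the runtime and listens on /run/containerd/containerd.sock (one of the defaults listed in that warning). The helper name show_container_logs is hypothetical:

```shell
#!/usr/bin/env bash
# Assumption: containerd is the runtime and uses this socket; override if yours differs.
RUNTIME_ENDPOINT="${RUNTIME_ENDPOINT:-unix:///run/containerd/containerd.sock}"

# Hypothetical helper: print the last log lines of the first-listed container matching $1.
show_container_logs() {
  local name="$1" id
  if ! command -v crictl >/dev/null 2>&1; then
    echo "crictl not found"
    return 1
  fi
  # -a includes exited containers, --name filters by name, -q prints only IDs.
  id=$(crictl --runtime-endpoint "$RUNTIME_ENDPOINT" ps -a --name "$name" -q | head -n 1)
  if [ -z "$id" ]; then
    echo "no container matching $name"
    return 2
  fi
  crictl --runtime-endpoint "$RUNTIME_ENDPOINT" logs --tail 50 "$id"
}

# Example (run as root on the control plane node):
# show_container_logs kube-apiserver
```

The same call with another container name, such as etcd, works for the other control plane containers.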

You could also check the kubelet service (systemctl status kubelet, journalctl -fu kubelet), as it is the one tasked with starting and restarting the kube-apiserver, kube-controller-manager and kube-scheduler services, should anything happen.
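
As a non-interactive variant of those two commands, here is a small sketch assuming a systemd host. The function name kubelet_report is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize the kubelet unit state and its most recent log lines.
kubelet_report() {
  if ! command -v systemctl >/dev/null 2>&1; then
    echo "systemctl not found"
    return 1
  fi
  # is-active prints a one-word state such as "active", "activating" or "failed".
  echo "kubelet state: $(systemctl is-active kubelet)"
  # Last 30 log lines for the unit; --no-pager keeps the output non-interactive.
  journalctl -u kubelet --no-pager -n 30
}

# Example (run on the control plane node):
# kubelet_report
```

When the kubelet is not in the active state, its last log lines usually state why it exited.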

You may check the etcd service as well: from the kube-apiserver's perspective, its only hard dependency is the etcd database. According to one of linuxtechi's screenshots, your etcd should run as a container as well. As with kube-apiserver, use crictl ps -a and crictl logs to make sure the container is still running.


And answering the title of your post, to generate a new kubeconfig, you may use:

kubeadm kubeconfig user --client-name kubernetes-admin \
   --config=/etc/kubernetes/kubeadm-config.yaml \
   --org system:masters >/path/to/.kube/config

Still, you probably don't need this: so far, nothing in your post suggests that anything is wrong with your kubeconfig.

Feel free to edit your post to include more logs and errors, depending on what you find when checking the above.

SYN
  • I found the issue, somehow the swap got turned back on. When I ran swapoff, it started working again :(. No clue what happened. – Ufder Mar 01 '23 at 02:27
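
Following up on the swap finding: by default the kubelet refuses to start while swap is enabled, and swapoff only lasts until the next reboot if a swap entry is still active in /etc/fstab. A small sketch to check for this, assuming a Linux host where /proc/swaps is available. The helper name swap_active is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given file (default /proc/swaps) lists a swap area.
swap_active() {
  local file="${1:-/proc/swaps}"
  # /proc/swaps has one header line followed by one line per active swap area.
  [ "$(tail -n +2 "$file" 2>/dev/null | grep -c .)" -gt 0 ]
}

if swap_active; then
  echo "swap is on: the kubelet refuses to start by default"
  # sudo swapoff -a    # turns swap off until the next reboot
  # To keep it off, also comment out the swap entry in /etc/fstab.
else
  echo "swap is off"
fi
```

If swap comes back after a reboot, an uncommented swap line in /etc/fstab is a likely cause, though that is only a guess for this particular machine.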