I am trying to install a Kubernetes v1.26 cluster (3 nodes, Rocky 9) using kubeadm, and I have a problem with kubelet. I followed this tutorial alongside the official Kubernetes cluster installation guide.

After installing kubelet, its status is:

systemctl status kubelet

● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: activating (auto-restart) (Result: exit-code) since Mon 2023-03-27 19:45:05 EEST; 8s ago
       Docs: https://kubernetes.io/docs/
    Process: 17757 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/>
   Main PID: 17757 (code=exited, status=1/FAILURE)
        CPU: 379ms
Mar 27 19:45:05 node01 systemd[1]: kubelet.service: Failed with result 'exit-code'.

In journalctl I have this message:

> Mar 27 19:34:40 node01 kubelet[13832]: E0327 19:34:40.638950   13832 run.go:74] "command failed" err="failed to validate kubelet flags: the container runtime endpoint address was not specified or empty, use --container-runtime-endpoint>
> Mar 27 19:34:40 node01 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE

I also added this to the kubelet service:

> Environment="KUBELET_EXTRA_ARGS=--container-runtime-endpoint=unix:///run/containerd/containerd.sock"

Running kubelet manually with the same flag gives:

kubelet --container-runtime-endpoint=unix:///run/containerd/containerd.sock

> I0327 19:52:34.967346   20609 server.go:412] "Kubelet version" kubeletVersion="v1.26.3"
> I0327 19:52:34.967701   20609 server.go:414] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
> I0327 19:52:34.968911   20609 server.go:575] "Standalone mode, no API client"
> I0327 19:52:34.987049   20609 server.go:463] "No api server defined - no events will be sent to API server"
> I0327 19:52:34.987281   20609 server.go:659] "--cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /"
> I0327 19:52:34.989143   20609 container_manager_linux.go:267] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
> I0327 19:52:34.989474   20609 container_manager_linux.go:272] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: KubeletOOMScoreAdj:-999 ContainerRuntime: CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[]} QOSReserved:map[] CPUManagerPolicy:none CPUManagerPolicyOptions:map[] ExperimentalTopologyManagerScope:container CPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none ExperimentalTopologyManagerPolicyOptions:map[]}
> I0327 19:52:34.989545   20609 topology_manager.go:134] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
> I0327 19:52:34.989583   20609 container_manager_linux.go:308] "Creating device plugin manager"
> I0327 19:52:34.990346   20609 state_mem.go:36] "Initialized new in-memory state store"
> I0327 19:52:35.028186   20609 kubelet.go:404] "Kubelet is running in standalone mode, will skip API server sync"
> I0327 19:52:35.034468   20609 kuberuntime_manager.go:244] "Container runtime initialized" containerRuntime="containerd" version="1.6.19" apiVersion="v1"
> I0327 19:52:35.036405   20609 volume_host.go:75] "KubeClient is nil. Skip initialization of CSIDriverLister"
> W0327 19:52:35.037048   20609 csi_plugin.go:189] kubernetes.io/csi: kubeclient not set, assuming standalone kubelet
> W0327 19:52:35.037060   20609 csi_plugin.go:266] Skipping CSINode initialization, kubelet running in standalone mode
> I0327 19:52:35.039128   20609 server.go:1186] "Started kubelet"
> I0327 19:52:35.040285   20609 kubelet.go:1502] "No API server defined - no node status update will be sent"
> I0327 19:52:35.040699   20609 server.go:161] "Starting to listen" address="0.0.0.0" port=10250
> I0327 19:52:35.049705   20609 server.go:451] "Adding debug handlers to kubelet server"
> I0327 19:52:35.042056   20609 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer"
> I0327 19:52:35.043346   20609 server.go:193] "Starting to listen read-only" address="0.0.0.0" port=10255
> I0327 19:52:35.058042   20609 volume_manager.go:293] "Starting Kubelet Volume Manager"
> I0327 19:52:35.058088   20609 desired_state_of_world_populator.go:151] "Desired state populator starts to run"
> E0327 19:52:35.066359   20609 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
> E0327 19:52:35.066385   20609 kubelet.go:1386] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
> I0327 19:52:35.157984   20609 cpu_manager.go:214] "Starting CPU manager" policy="none"
> I0327 19:52:35.158169   20609 cpu_manager.go:215] "Reconciling" reconcilePeriod="10s"
> I0327 19:52:35.158206   20609 state_mem.go:36] "Initialized new in-memory state store"
> I0327 19:52:35.160788   20609 desired_state_of_world_populator.go:159] "Finished populating initial desired state of world"
> I0327 19:52:35.169791   20609 state_mem.go:88] "Updated default CPUSet" cpuSet=""
> I0327 19:52:35.170010   20609 state_mem.go:96] "Updated CPUSet assignments" assignments=map[]
> I0327 19:52:35.170034   20609 policy_none.go:49] "None policy: Start"
> I0327 19:52:35.176881   20609 memory_manager.go:169] "Starting memorymanager" policy="None"
> I0327 19:52:35.177054   20609 state_mem.go:35] "Initializing new in-memory state store"
> I0327 19:52:35.180146   20609 state_mem.go:75] "Updated machine memory state"
> I0327 19:52:35.201730   20609 manager.go:455] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
> I0327 19:52:35.203049   20609 plugin_manager.go:118] "Starting Kubelet Plugin Manager"
> I0327 19:52:35.261014   20609 reconciler.go:41] "Reconciler: start to sync state"
> I0327 19:52:35.788856   20609 kubelet_network_linux.go:63] "Initialized iptables rules." protocol=IPv4
> I0327 19:52:36.104318   20609 kubelet_network_linux.go:63] "Initialized iptables rules." protocol=IPv6
> I0327 19:52:36.104388   20609 status_manager.go:172] "Kubernetes client is nil, not starting status manager"
> I0327 19:52:36.104417   20609 kubelet.go:2113] "Starting kubelet main sync loop"
> E0327 19:52:36.105012   20609 kubelet.go:2137] "Skipping pod synchronization" err="PLEG is not healthy: pleg has yet to be successful"

Any ideas? Thank you!

Lucian

2 Answers

I managed to bring my cluster up. The first thing I did was add --container-runtime-endpoint=unix:///run/containerd/containerd.sock to the /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf file, directly in the ExecStart line:

ExecStart=/usr/bin/kubelet --container-runtime-endpoint=unix:///run/containerd/containerd.sock $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
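
A possible alternative to editing ExecStart, as a sketch only: files under /usr/lib/systemd belong to the package and can be overwritten on upgrade, so the flag could instead go into the environment file that the kubeadm drop-in reads for KUBELET_EXTRA_ARGS. This assumes the drop-in contains EnvironmentFile=-/etc/sysconfig/kubelet (check with systemctl cat kubelet). According to systemd.exec, values from an EnvironmentFile override Environment= lines, which may be why the Environment= attempt in the question had no effect. The helper name write_kubelet_extra_args is made up for illustration.

```shell
#!/bin/sh
# Sketch: put the runtime endpoint into the env file read by the kubeadm
# drop-in instead of hard-coding it in ExecStart.
# Assumption: the drop-in has "EnvironmentFile=-/etc/sysconfig/kubelet".

# write_kubelet_extra_args (hypothetical helper)
#   $1: path of the environment file to write
#   $2: CRI socket URL (defaults to the containerd socket from the question)
write_kubelet_extra_args() {
    env_file=$1
    endpoint=${2:-unix:///run/containerd/containerd.sock}
    # Overwrites the file with a single KUBELET_EXTRA_ARGS assignment.
    printf 'KUBELET_EXTRA_ARGS=--container-runtime-endpoint=%s\n' "$endpoint" > "$env_file"
}
```

Run as root it would look like write_kubelet_extra_args /etc/sysconfig/kubelet, followed by systemctl restart kubelet.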

This made the error disappear, but another one appeared:

E0409 18:11:13.071952 113674 run.go:74] "command failed" err="failed to load kubelet config file, error: failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory, path: /var/lib/kubelet/config.yaml"

I ignored this one and went ahead with kubeadm init. kubeadm init finished successfully, but in the kubelet logs I saw a lot of "connection refused" errors. I guess that was because kube-apiserver was restarting. So I decided to regenerate the containerd configuration with default values (containerd config default > /etc/containerd/config.toml) and to change only SystemdCgroup = true. I restarted containerd and kubelet, and things started to look OK.
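
The containerd step above can be sketched as a small script. It assumes the generated default config contains a line of the form SystemdCgroup = false; the function name enable_systemd_cgroup is hypothetical.

```shell
#!/bin/sh
# Sketch: flip SystemdCgroup from false to true in a containerd config file.
# Assumption: the file contains "SystemdCgroup = false" (as generated by
# "containerd config default").

# enable_systemd_cgroup (hypothetical helper)
#   $1: path to config.toml
enable_systemd_cgroup() {
    config=$1
    # Keep indentation and spacing, replace only the value.
    sed 's/^\([[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*\)false/\1true/' \
        "$config" > "$config.tmp" && mv "$config.tmp" "$config"
}

# Intended use, as root (commented out so the sketch has no side effects):
#   containerd config default > /etc/containerd/config.toml
#   enable_systemd_cgroup /etc/containerd/config.toml
#   systemctl restart containerd kubelet
```

Editing the value with sed rather than by hand keeps the rest of the generated defaults untouched, which is what the answer describes.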

After that I applied the CNI and saw another error (in the weave-init container of the weave pod):

iptables v1.8.3 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?) Perhaps iptables or your kernel needs to be upgraded.

I loaded the ip_tables kernel module by running the following command:

modprobe ip_tables
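
Note that modprobe only loads the module until the next reboot. A minimal sketch for making it persistent, assuming the node uses systemd-modules-load (which reads *.conf files from /etc/modules-load.d); persist_module is a made-up helper name:

```shell
#!/bin/sh
# Sketch: register a kernel module to be loaded at boot.
# Assumption: systemd-modules-load is in use and reads /etc/modules-load.d.

# persist_module (hypothetical helper)
#   $1: module name
#   $2: target directory (defaults to /etc/modules-load.d)
persist_module() {
    module=$1
    dir=${2:-/etc/modules-load.d}
    # One module name per line; the file is named after the module.
    printf '%s\n' "$module" > "$dir/$module.conf"
}
```

For the case above that would be persist_module ip_tables, run as root.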

Now my cluster is OK. I hope I did not miss any step.

Lucian

The problem is the containerd config.

The default config has the cri plugin completely disabled, but just enabling it is not enough.

https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd-systemd

This annoying setting is what needs to be fixed:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  ...
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
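
Both conditions can be checked quickly before restarting containerd. This is only a rough sketch: it assumes the disabled state appears as a disabled_plugins line listing "cri" and that SystemdCgroup is written on its own line; check_containerd_cri is a hypothetical name.

```shell
#!/bin/sh
# Sketch: sanity-check a containerd config for the two issues named above.
# Assumptions: a disabled CRI plugin shows up as disabled_plugins = [..."cri"...],
# and the cgroup driver setting is a line "SystemdCgroup = true".

# check_containerd_cri (hypothetical helper)
#   $1: path to config.toml
check_containerd_cri() {
    config=$1
    if grep -Eq '^[[:space:]]*disabled_plugins[[:space:]]*=.*"cri"' "$config"; then
        echo "cri plugin is disabled"
        return 1
    fi
    if ! grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$config"; then
        echo "SystemdCgroup is not true"
        return 1
    fi
    echo "ok"
}
```

Typical use would be check_containerd_cri /etc/containerd/config.toml; a non-zero exit status means one of the two settings still needs fixing.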

Cine