I have an EKS cluster (v1.16) with 2 ASGs: one for compute ("c5.9xlarge") and one for GPU ("p3.2xlarge"). Both are configured as Spot and set with desiredCapacity 0.
The K8S Cluster Autoscaler works as expected and scales out each ASG when necessary. The issue is that the newly created GPU instance is not recognized by the master and never shows up in kubectl get nodes.
I can see that the EC2 instance is in the Running state, and I can SSH into it.
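On the instance itself, checks along these lines (assuming the standard Amazon Linux 2 EKS AMI, where the kubelet runs as a systemd unit) show whether the kubelet is up:

    # check whether the kubelet service is running and what it last logged
    sudo systemctl status kubelet
    sudo journalctl -u kubelet --no-pager -n 50
    # system log, where bootstrap and kubelet errors also land on this AMI
    sudo tail -n 100 /var/log/messages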
I double-checked the labels and tags and compared them to the "compute" nodegroup. Both are configured almost identically; the only difference is that the GPU nodegroup has a few additional tags.
Since I'm using the eksctl tool (v0.35.0) and the GPU nodeGroup is basically a copy&paste of the compute nodeGroup, I can't figure out what the problem could be.
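For context, the relevant part of my eksctl config looks roughly like the sketch below; cluster name, region, sizes and the extra GPU tag are placeholders rather than my exact values:

    # structure-only sketch of the two spot nodegroups (placeholder names/sizes/region)
    cat > cluster.yaml <<'EOF'
    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: my-cluster          # placeholder
      region: us-east-1         # placeholder
    nodeGroups:
      - name: compute-spot
        desiredCapacity: 0
        minSize: 0
        maxSize: 10
        instancesDistribution:  # spot-only nodegroup
          instanceTypes: ["c5.9xlarge"]
          onDemandBaseCapacity: 0
          onDemandPercentageAboveBaseCapacity: 0
      - name: gpu-spot
        desiredCapacity: 0
        minSize: 0
        maxSize: 4
        instancesDistribution:  # spot-only nodegroup
          instanceTypes: ["p3.2xlarge"]
          onDemandBaseCapacity: 0
          onDemandPercentageAboveBaseCapacity: 0
        tags:                   # placeholder for the extra tags on the GPU group
          k8s.io/cluster-autoscaler/node-template/resources/nvidia.com/gpu: "1"
    EOF
    eksctl create nodegroup --config-file=cluster.yaml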
UPDATE: After SSHing into the instance, I found the following error in /var/log/messages:
    failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
The kubelet service crashes as a result.
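If the driver mismatch itself is the blocker, one node-level workaround I'm considering (assuming Docker is the runtime, as on the stock EKS AMIs) is to switch Docker to the systemd cgroup driver the kubelet expects and restart both services:

    # workaround sketch: make Docker use the "systemd" cgroup driver the kubelet expects
    # note: tee overwrites /etc/docker/daemon.json; on the EKS AMI, merge the key into the existing file instead
    cat <<'EOF' | sudo tee /etc/docker/daemon.json
    {
      "exec-opts": ["native.cgroupdriver=systemd"]
    }
    EOF
    sudo systemctl restart docker
    sudo systemctl restart kubelet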
Could it be that my GPU nodegroup is using the wrong AMI (amazon-eks-gpu-node-1.18-v20201211, i.e. a 1.18 AMI on a 1.16 cluster)?
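This is roughly how I'd verify which AMI the GPU instance actually launched from and what AWS recommends for this cluster version (the instance and image IDs below are placeholders):

    # AMI the running GPU instance was launched from (placeholder instance id)
    aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
      --query 'Reservations[].Instances[].ImageId' --output text
    # resolve the image id to its name, which encodes the Kubernetes version (placeholder image id)
    aws ec2 describe-images --image-ids ami-0123456789abcdef0 \
      --query 'Images[].Name' --output text
    # GPU-optimized AMI AWS recommends for a 1.16 cluster, via the documented SSM parameter
    aws ssm get-parameter \
      --name /aws/service/eks/optimized-ami/1.16/amazon-linux-2-gpu/recommended/image_id \
      --query 'Parameter.Value' --output text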