
When a pod gets stuck in a Waiting state, what can I do to find out why it's Waiting?

For instance, I have a deployment on AKS that uses ACI.

When I deploy the YAML file, a number of the pods get stuck in a Waiting state. Running `kubectl describe pod selenium121157nodechrome-7bf598579f-kqfqs` returns:

```
State:          Waiting
  Reason:       Waiting
Ready:          False
Restart Count:  0
```

`kubectl logs selenium121157nodechrome-7bf598579f-kqfqs` returns nothing.

How can I find out what the pod is waiting for?

Here's the deployment YAML:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aci-helloworld2
spec:
  replicas: 20
  selector:
    matchLabels:
      app: aci-helloworld2
  template:
    metadata:
      labels:
        app: aci-helloworld2
    spec:
      containers:
      - name: aci-helloworld
        image: microsoft/aci-helloworld
        ports:
        - containerPort: 80
      nodeSelector:
        kubernetes.io/role: agent
        beta.kubernetes.io/os: linux
        type: virtual-kubelet
      tolerations:
      - key: virtual-kubelet.io/provider
        operator: Exists
      - key: azure.com/aci
        effect: NoSchedule
```

Here's the `kubectl describe pod` output for a pod that has been Waiting for 5 minutes:

```
matt@Azure:~/2020$ kubectl describe pod aci-helloworld2-86b8d7866d-b9hgc
Name:           aci-helloworld2-86b8d7866d-b9hgc
Namespace:      default
Priority:       0
Node:           virtual-node-aci-linux/
Labels:         app=aci-helloworld2
                pod-template-hash=86b8d7866d
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/aci-helloworld2-86b8d7866d
Containers:
  aci-helloworld:
    Container ID:   aci://95919def19c28c2a51a806928030d84df4bc6b60656d026d19d0fd5e26e3cd86
    Image:          microsoft/aci-helloworld
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       Waiting
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hqrj8 (ro)
Volumes:
  default-token-hqrj8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hqrj8
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  beta.kubernetes.io/os=linux
                 kubernetes.io/role=agent
                 type=virtual-kubelet
Tolerations:     azure.com/aci:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
                 virtual-kubelet.io/provider
Events:
  Type    Reason     Age        From               Message
  ----    ------     ----       ----               -------
  Normal  Scheduled  <unknown>  default-scheduler  Successfully assigned default/aci-helloworld2-86b8d7866d-b9hgc to virtual-node-aci-linux
```
Matt
  • try describing the deployment by using `kubectl describe deployment -n ` – Kamol Hasan Mar 21 '20 at 10:26
  • there should be events when you run describe; they tell you what it's waiting for (at the very bottom of the output) – 4c74356b41 Mar 21 '20 at 11:23
  • Could you please add the events here `kubectl get events --sort-by='.lastTimestamp'` – Tummala Dhanvi Mar 21 '20 at 11:41
  • Please share the pod/deployment YAML. – Arghya Sadhu Mar 21 '20 at 17:37
  • The deployment YAML is this hello world example from Microsoft. – Matt Mar 21 '20 at 20:25
  • Hi Matt, welcome to SO. Please don't put code snippets in comments -- use the edit link under your post to edit your question and include the code block there – mdaniel Mar 21 '20 at 23:19
  • Can you get the status of the deployment? I see you set replicas to 20; I want to know whether no pods are running or only some of them. – Charles Xu Mar 26 '20 at 07:59
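The deployment status asked for in the last comment can be summarized by counting the Deployment's pods per phase, which shows at a glance whether none or only some of the 20 replicas are stuck. A minimal sketch, assuming the `app=aci-helloworld2` label from the question's YAML and the `default` namespace; `count_phases` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count a Deployment's pods by phase (Pending, Running, ...)
# to see whether no replicas or only some of them are stuck.
# Usage: count_phases <label-selector> [namespace]
count_phases() {
  local selector="$1" ns="${2:-default}"
  # Print one phase per pod, then count identical lines.
  kubectl get pods -n "$ns" -l "$selector" \
    -o jsonpath='{range .items[*]}{.status.phase}{"\n"}{end}' \
    | sort | uniq -c
}

# Overall rollout state, as suggested in the comments:
#   kubectl describe deployment aci-helloworld2 -n default
# Per-phase counts for the question's Deployment:
#   count_phases app=aci-helloworld2
```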

1 Answer


According to the official documentation, a pod in the Waiting state has been scheduled to a node but cannot run on that machine, and the most common cause given is a failure to pull the image. You can try running your image manually with `docker pull` and `docker run` to rule out problems with the image.
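A minimal sketch of that manual check, assuming Docker is installed on your own machine. `check_image` is a hypothetical helper name and host port 8080 is an arbitrary choice; port 80 is the `containerPort` from the question's Deployment:

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull an image and start it once on your own machine,
# so name/tag, registry, or credential problems show up outside AKS/ACI.
# For an image in ACR, log in first, e.g.: az acr login --name <registry-name>
# Usage: check_image <image>
check_image() {
  local image="$1"
  # Fails if the image does not exist or the registry refuses the pull.
  docker pull "$image" || return 1
  # Start detached; host port 8080 (arbitrary) maps to containerPort 80.
  # --rm deletes the container once you stop it with: docker stop <id>
  docker run --rm -d -p 8080:80 "$image"
}

# Example with the image from the question:
#   check_image microsoft/aci-helloworld
```

If both steps succeed locally, the image itself is probably fine and the problem is more likely on the ACI side (credentials, provider registration, or the virtual node).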

The output of `kubectl describe pod <pod-name>` should be informative, especially the Events section at the bottom. Here's an example of what it can look like:

```
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  <unknown>            default-scheduler  Successfully assigned default/testpod to cafe
  Normal   BackOff    50s (x6 over 2m16s)  kubelet, cafe      Back-off pulling image "busybox"
  Normal   Pulling    37s (x4 over 2m17s)  kubelet, cafe      Pulling image "busybox"
```
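When the Events section is as short as in the question (only `Scheduled`), it can help to read the container's waiting state directly and to list only the events that involve that pod. A minimal sketch; `why_waiting` is a hypothetical helper name and `default` is assumed as the namespace:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each container's waiting reason/message, then
# the events whose involvedObject is this pod, oldest first.
# Usage: why_waiting <pod-name> [namespace]
why_waiting() {
  local pod="$1" ns="${2:-default}"
  # One line per container: "<name>: <reason> <message>".
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state.waiting.reason}{" "}{.state.waiting.message}{"\n"}{end}'
  # Only this pod's events, sorted by time.
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=${pod}" \
    --sort-by='.lastTimestamp'
}

# Example with the pod from the question:
#   why_waiting aci-helloworld2-86b8d7866d-b9hgc
```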

It could also be an issue with your nodeSelector and tolerations, but again that would show up in the events when you describe the pod.
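To check that side, a sketch that lists the nodes carrying every label in the Deployment's `nodeSelector` and prints each node's taints, so they can be compared with the tolerations. The label values are copied from the question's YAML; `check_node_fit` is a hypothetical helper name. In the question, the `Scheduled` event already shows the pod was assigned to `virtual-node-aci-linux`, so this mostly serves to rule the selector out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show which nodes match a label selector and what taints
# they carry, to compare against the pod's tolerations.
# Usage: check_node_fit <label-selector>
check_node_fit() {
  local selector="$1"
  # Comma-separated labels must all match (logical AND).
  # Output: one "<node-name><TAB><taints as JSON>" line per matching node.
  kubectl get nodes -l "$selector" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
}

# Labels from the question's nodeSelector:
#   check_node_fit 'kubernetes.io/role=agent,beta.kubernetes.io/os=linux,type=virtual-kubelet'
```

An empty result would mean no node carries all of those labels.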

Let me know if this helps, and what your `describe pod` output shows.

acid_fuji
  • Thanks. Updated original question above. This happens with a number of different images, also images that are hosted in ACR. – Matt Mar 23 '20 at 09:31
  • This still looks to me like an image pulling problem. Does the `Microsoft.ContainerInstance` provider report as registered? It's listed as something that has to be done before running ACI. There is also a note in this [document](https://learn.microsoft.com/en-us/azure/aks/virtual-nodes-portal) worth checking out: it says that if you want to use ACR, you have to configure it with a Kubernetes `Secret`. Lastly, do your ACI nodes report as Ready (`kubectl get nodes`)? – acid_fuji Mar 23 '20 at 12:40
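The provider check from the last comment can be scripted with the Azure CLI. A minimal sketch, assuming `az` is installed and logged in to the subscription that hosts the AKS cluster; `ensure_aci_provider` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the registration state of the
# Microsoft.ContainerInstance resource provider and register it if needed.
ensure_aci_provider() {
  local state
  state="$(az provider show --namespace Microsoft.ContainerInstance \
    --query registrationState --output tsv)" || return 1
  echo "Microsoft.ContainerInstance: ${state}"
  if [[ "$state" != "Registered" ]]; then
    # Registration is asynchronous; re-run the check after a few minutes.
    az provider register --namespace Microsoft.ContainerInstance
  fi
}
```

After that, `kubectl get nodes` is a quick way to confirm whether the virtual node (`virtual-node-aci-linux` in the question) reports `Ready`.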