
I have a problem with one of my pods: it is stuck in the Pending state.

If I describe the pod, this is what I can see:

Events:
  Type     Reason             Age                From                Message
  ----     ------             ----               ----                -------
  Normal   NotTriggerScaleUp  1m (x58 over 11m)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 node(s) didn't match node selector
  Warning  FailedScheduling   1m (x34 over 11m)  default-scheduler   0/6 nodes are available: 6 node(s) didn't match node selector. 

If I check the logs, there is nothing there (the output is just empty).
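
For reference, these are roughly the commands I am running (pod name and namespace as in the yaml below):

    kubectl describe pod grafana-654667db5b-tnrlq -n monitoring
    kubectl logs grafana-654667db5b-tnrlq -n monitoring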

--- Update ---

This is my pod yaml file:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    checksum/config: XXXXXXXXXXX
    checksum/dashboards-config: XXXXXXXXXXX
  creationTimestamp: 2020-02-11T10:15:15Z
  generateName: grafana-654667db5b-
  labels:
    app: grafana-grafana
    component: grafana
    pod-template-hash: "2102238616"
    release: grafana
  name: grafana-654667db5b-tnrlq
  namespace: monitoring
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: grafana-654667db5b
    uid: xxxx-xxxxx-xxxxxxxx-xxxxxxxx
  resourceVersion: "98843547"
  selfLink: /api/v1/namespaces/monitoring/pods/grafana-654667db5b-tnrlq
  uid: xxxx-xxxxx-xxxxxxxx-xxxxxxxx
spec:
  containers:
  - env:
    - name: GF_SECURITY_ADMIN_USER
      valueFrom:
        secretKeyRef:
          key: xxxx
          name: grafana
    - name: GF_SECURITY_ADMIN_PASSWORD
      valueFrom:
        secretKeyRef:
          key: xxxx
          name: grafana
    - name: GF_INSTALL_PLUGINS
      valueFrom:
        configMapKeyRef:
          key: grafana-install-plugins
          name: grafana-config
    image: grafana/grafana:5.0.4
    imagePullPolicy: Always
    name: grafana
    ports:
    - containerPort: 3000
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /api/health
        port: 3000
        scheme: HTTP
      initialDelaySeconds: 30
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 30
    resources:
      requests:
        cpu: 200m
        memory: 100Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/grafana
      name: config-volume
    - mountPath: /var/lib/grafana/dashboards
      name: dashboard-volume
    - mountPath: /var/lib/grafana
      name: storage-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-tqb6j
      readOnly: true
  dnsPolicy: ClusterFirst
  initContainers:
  - command:
    - sh
    - -c
    - cp /tmp/config-volume-configmap/* /tmp/config-volume 2>/dev/null || true; cp
      /tmp/dashboard-volume-configmap/* /tmp/dashboard-volume 2>/dev/null || true
    image: busybox
    imagePullPolicy: Always
    name: copy-configs
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp/config-volume-configmap
      name: config-volume-configmap
    - mountPath: /tmp/dashboard-volume-configmap
      name: dashboard-volume-configmap
    - mountPath: /tmp/config-volume
      name: config-volume
    - mountPath: /tmp/dashboard-volume
      name: dashboard-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-tqb6j
      readOnly: true
  nodeSelector:
    nodePool: cluster
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 300
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: config-volume
  - emptyDir: {}
    name: dashboard-volume
  - configMap:
      defaultMode: 420
      name: grafana-config
    name: config-volume-configmap
  - configMap:
      defaultMode: 420
      name: grafana-dashs
    name: dashboard-volume-configmap
  - name: storage-volume
    persistentVolumeClaim:
      claimName: grafana
  - name: default-token-tqb6j
    secret:
      defaultMode: 420
      secretName: default-token-tqb6j
status:
  conditions:
  - lastProbeTime: 2020-02-11T10:45:37Z
    lastTransitionTime: 2020-02-11T10:15:15Z
    message: '0/6 nodes are available: 6 node(s) didn''t match node selector.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Burstable

Do you know how I should debug this further?

  • @DT. I have updated my question with pod yaml file – Bob Feb 11 '20 at 10:48
  • Can you remove these lines `nodeSelector: nodePool: cluster` and start your pod creation again from scratch, or ensure you add the label `nodePool: cluster` to all your nodes so the pod, which is still in Pending, can be scheduled? – DT. Feb 11 '20 at 10:51
  • You can use this command to label nodes: `kubectl label nodes <node-name> nodePool=cluster`. Run it for each node (or only the nodes you want to select with this label), replacing the node name from your cluster. – DT. Feb 11 '20 at 10:53
  • This helped! Thanks! I am now experiencing some other configuration issue, but that is unrelated to this. Thanks a lot! – Bob Feb 11 '20 at 11:00
  • @DT., please post an answer with your solution so it can help others. – Mark Watney Feb 11 '20 at 11:12
  • @mWatney updated my comments as answer below – DT. Feb 11 '20 at 11:17

3 Answers


Solution: You can do one of two things to allow the scheduler to fulfill your pod creation request.

  1. Remove these lines from your pod yaml and recreate the pod (if you need the selector for a reason, use the approach in step 2 instead):

    nodeSelector: 
        nodePool: cluster 
    

or

  2. Add the label `nodePool: cluster` to your nodes so the pod can be scheduled using the existing selector.

You can use this command to label a node:

kubectl label nodes <your node name> nodePool=cluster

Run the above command for each node in your cluster (or only for the nodes you want this selector to match), replacing the node name accordingly.
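
As a quick check (assuming the label above and the `monitoring` namespace from the question), you can list the nodes that now match the selector and watch the pending pod get scheduled:

    # nodes carrying the label required by the pod's nodeSelector
    kubectl get nodes -l nodePool=cluster
    # the pod should leave Pending shortly after the label is applied
    kubectl get pods -n monitoring -w

If you go with option 1 instead, keep in mind that this pod is owned by a ReplicaSet (see the ownerReferences in the question), so the nodeSelector has to be removed from the owning Deployment/ReplicaSet template rather than from the already-created pod.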

– DT.

Your pod probably uses a node selector which cannot be fulfilled by the scheduler. Check the pod description for something like this:

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      labels:
        env: test
    spec:
      ...
      nodeSelector:
        disktype: ssd

And check whether your nodes are labeled accordingly.
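
For example (the `disktype: ssd` selector above is only an illustration), you can list the labels currently set on your nodes and compare them against the pod's nodeSelector:

    kubectl get nodes --show-labels
    # or inspect a single node in detail
    kubectl describe node <node-name>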

– abinet

The simplest option would be to use "nodeName" in the Pod yaml.

First, get the node where you want to run the Pod:

kubectl get nodes

Then set the attribute below inside the Pod definition (yaml) so that the Pod is forced to run on that node only.

nodeName: seliiuvd05714
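
As a minimal sketch (the pod name is hypothetical and `<your-node-name>` is a placeholder for a name returned by `kubectl get nodes`), `nodeName` sits directly under `spec` and bypasses the scheduler entirely, so no nodeSelector matching takes place:

    apiVersion: v1
    kind: Pod
    metadata:
      name: grafana-test
      namespace: monitoring
    spec:
      nodeName: <your-node-name>  # pin the pod to this node, bypassing the scheduler
      containers:
      - name: grafana
        image: grafana/grafana:5.0.4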
– Deb