
I had to force-delete a pod because it had been stuck terminating for several days. The pod now won't come back.

-> % k -n kube-system rollout status deployment coredns
Waiting for deployment "coredns" rollout to finish: 0 out of 1 new replicas have been updated...

The status command hangs here until I cancel it.

-> % k get pods -n kube-system
NAME                                              READY   STATUS    RESTARTS   AGE
dashboard-metrics-scraper-db65b9c6f-xx75z         1/1     Running   7          93d
heapster-v1.5.2-58fdbb6f4d-h528c                  4/4     Running   26         60d
hostpath-provisioner-75fdc8fccd-2k966             1/1     Running   7          82d
kubernetes-dashboard-67765b55f5-9g85m             1/1     Running   130        93d
monitoring-influxdb-grafana-v4-6dc675bf8c-xlzlj   2/2     Running   22         60d

There is no coredns pod being started.

How do I roll out a new pod to fix this issue?

EDIT: Here is the description of the deployment:

-> % k describe -n kube-system deployments.apps coredns
Name:                   coredns
Namespace:              kube-system
CreationTimestamp:      Sun, 26 Apr 2020 12:26:40 +0100
Labels:                 addonmanager.kubernetes.io/mode=Reconcile
                        k8s-app=kube-dns
                        kubernetes.io/cluster-service=true
                        kubernetes.io/name=CoreDNS
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               k8s-app=kube-dns
Replicas:               1 desired | 0 updated | 0 total | 0 available | 2 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  0 max unavailable, 10% max surge
Pod Template:
  Labels:           k8s-app=kube-dns
  Annotations:      kubectl.kubernetes.io/restartedAt: 2020-07-29T10:27:32+01:00
                    scheduler.alpha.kubernetes.io/critical-pod:
  Service Account:  coredns
  Containers:
   coredns:
    Image:       coredns/coredns:1.6.6
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
  Volumes:
   config-volume:
    Type:               ConfigMap (a volume populated by a ConfigMap)
    Name:               coredns
    Optional:           false
  Priority Class Name:  system-cluster-critical
Conditions:
  Type             Status  Reason
  ----             ------  ------
  Available        False   MinimumReplicasUnavailable
  ReplicaFailure   True    FailedCreate
  Progressing      False   ProgressDeadlineExceeded
OldReplicaSets:    coredns-588fd544bf (0/1 replicas created)
NewReplicaSet:     coredns-785764658b (0/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  36m   deployment-controller  Scaled up replica set coredns-785764658b to 1

I also have two replica sets for coredns. The first one fails with:

 Error creating: pods "coredns-785764658b-" is forbidden: unable to validate against any pod security policy: [spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added spec.volumes[0]: Invalid value: "configMap": configMap volumes are not allowed to be used spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added]

Second one:

Warning  FailedCreate  2m52s (x11 over 19h)  replicaset-controller  Error creating: pods "coredns-588fd544bf-" is forbidden: unable to validate against any pod security policy: [spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added spec.volumes[0]: Invalid value: "configMap": configMap volumes are not allowed to be used spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added]
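
Both messages repeat the same entries several times, presumably once per policy that was tried. To make them easier to read, here is a minimal sketch that reduces such a message to its distinct violations. It assumes a POSIX shell and a grep that supports `-o` and `-E` (GNU and BSD grep both do). The message is an abbreviated copy of the one above, and `distinct_violations` is a hypothetical helper name, not part of kubectl.

```shell
#!/bin/sh
# Abbreviated copy of the FailedCreate message above (some repeats trimmed).
msg='Error creating: pods "coredns-785764658b-" is forbidden: unable to validate against any pod security policy: [spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added spec.volumes[0]: Invalid value: "configMap": configMap volumes are not allowed to be used spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added]'

# Hypothetical helper: pull out every `<field>: Invalid value: "<value>"`
# pair from the message and drop the duplicates.
distinct_violations() {
  printf '%s\n' "$1" \
    | grep -oE 'spec\.[^:]+: Invalid value: "[^"]+"' \
    | sort -u
}

distinct_violations "$msg"
```

Run against the messages above, this should leave only two distinct entries: the `NET_BIND_SERVICE` capability and the `configMap` volume type. That suggests none of the policies that were tried permits either. On clusters that still serve the PodSecurityPolicy API, `kubectl get psp` lists the policies in question.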

More information:

-> % kubectl get rs -n kube-system
NAME                                        DESIRED   CURRENT   READY   AGE
coredns-588fd544bf                          1         0         0       94d
coredns-785764658b                          1         0         0       23h
dashboard-metrics-scraper-db65b9c6f         1         1         1       94d
heapster-v1.5.2-58fdbb6f4d                  1         1         1       94d
hostpath-provisioner-75fdc8fccd             1         1         1       83d
kubernetes-dashboard-67765b55f5             1         1         1       94d
monitoring-influxdb-grafana-v4-6dc675bf8c   1         1         1       94d
-> % kubectl rollout history deployment coredns -n kube-system
deployment.apps/coredns
REVISION  CHANGE-CAUSE
1         <none>
2         <none>
-> % kubectl get deploy coredns -n kube-system -o yaml | grep progressDeadlineSeconds
        f:progressDeadlineSeconds: {}
  progressDeadlineSeconds: 600
digital
  • Try to find the Deployment+ReplicaSet or DaemonSet responsible for CoreDNS and `kubectl describe` it, please. – Serge Jul 29 '20 at 09:42
  • Have you specified the replica count in the deployment YAML? – Pradeep Saini Jul 29 '20 at 09:45
  • What is the k8s version? Please provide the [output of](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#checking-rollout-history-of-a-deployment) `kubectl get rs -n kube-system` and `kubectl rollout history deployment coredns -n kube-system`. It would also be nice to verify `.spec.progressDeadlineSeconds` in `kubectl get deploy coredns -n kube-system -o yaml`. Did you enable any authorization using [PodSecurityPolicy](https://kubernetes.io/docs/concepts/policy/pod-security-policy/)? – Mark Jul 29 '20 at 13:05
  • @Hanx I've added the extra information. – digital Jul 30 '20 at 09:01
  • Did you try [Rolling Back](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-back-to-a-previous-revision)? **1**. `kubectl rollout undo ...`, `kubectl rollout undo ... --to-revision=2`. **2**. Enabling and disabling the coredns addon? **3**. Has anything changed in the meantime? **4**. What about the remaining question on PodSecurityPolicy? **5**. Is this a default Microk8s installation/configuration? – Mark Jul 30 '20 at 11:18
  • I've rolled back but still no luck. Enabling and disabling doesn't work. I've not done anything with the cluster since I posted this question. There are no additional PodSecurityPolicies. Yeah, it's microk8s, 1 master and 3 nodes on PVE with metallb. – digital Jul 30 '20 at 11:27
  • Did you do anything before that happened (not after posting the question)? In this case my advice is to go through the [Troubleshooting section](https://microk8s.io/docs/troubleshooting), e.g. `sudo microk8s inspect` and, under Common issues, "My dns and dashboard pods are CrashLooping". You wrote `Enabling and disabling doesn't work`; please try to re-enable dns instead (disable, then enable). Failing that, please report a [bug](https://github.com/ubuntu/microk8s/issues). – Mark Jul 31 '20 at 11:56

0 Answers