Kubernetes liveness probe failed but manual probe succeeded

Question

I have set up a liveness probe for a long-running application in a pod. It failed a few times within a day, causing the pod to be restarted several times. There is no readiness probe.

livenessProbe:
  httpGet:
    path: /
    port: http
    scheme: HTTP
  initialDelaySeconds: 30
  timeoutSeconds: 20
  periodSeconds: 20
  successThreshold: 1
  failureThreshold: 3

Further inspection of the application code and Docker image revealed nothing unusual, so I disabled the liveness probe and manually probed the NodePort service every 10 seconds using a Python script running on a PC on the same network. The manual probe, though more frequent and more stringent than the liveness probe, succeeded every time. Each request took about 200 to 400 ms.
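The original script is not shown. A minimal standard-library sketch of such a manual probe might look like the following. The URL is a placeholder for the node address and NodePort, and `probe_once` / `probe_loop` are hypothetical names. It counts a status code from 200 to 399 as success, which is the rule the kubelet applies to `httpGet` probes.

```python
from __future__ import annotations

import time
import urllib.error
import urllib.request


def probe_once(url: str, timeout: float) -> tuple[bool, float]:
    """Send one GET request and return (success, elapsed seconds).

    Success means a status code in [200, 400). Note that urllib's timeout
    applies to each blocking socket operation, not to the request as a whole.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # include the body transfer in the measured time
            ok = 200 <= resp.status < 400
    except urllib.error.HTTPError as err:  # 4xx/5xx (and unfollowed 3xx) land here
        ok = 200 <= err.code < 400
    except OSError:  # connection refused, reset, DNS failure or timeout
        ok = False
    return ok, time.monotonic() - start


def probe_loop(url: str, interval: float = 10.0, timeout: float = 0.5,
               count: int | None = None) -> None:
    """Probe every `interval` seconds and print one line per attempt."""
    done = 0
    while count is None or done < count:
        ok, elapsed = probe_once(url, timeout)
        print(f"{time.strftime('%H:%M:%S')} ok={ok} {elapsed * 1000:.0f} ms", flush=True)
        done += 1
        time.sleep(interval)


# Usage (placeholder address): probe_loop("http://NODE_IP:NODE_PORT/")
```

The defaults (`interval=10.0`, `timeout=0.5`) mirror the settings the manual probe is compared to in the question.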

The manual probe is roughly equivalent to a liveness probe with these settings:

timeoutSeconds: 500ms
periodSeconds: 10
successThreshold: 1
failureThreshold: 1
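The two configurations differ mainly in how many failures they tolerate. The following is a simplified model of the documented semantics, not kubelet code, and `restart_probe_index` is a hypothetical helper. Each probe either passes or fails (a timeout counts as a failure), any success resets the counter, and the container is restarted after `failureThreshold` consecutive failures. It ignores `initialDelaySeconds` and everything that happens after the restart.

```python
from __future__ import annotations


def restart_probe_index(results: list[bool], failure_threshold: int) -> int | None:
    """Return the index of the probe that would trigger a restart, or None.

    results[i] is True if probe i passed; a timed-out probe counts as False.
    A single success resets the consecutive-failure counter.
    """
    consecutive_failures = 0
    for i, passed in enumerate(results):
        consecutive_failures = 0 if passed else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return i
    return None


# Hypothetical run in which three probes fail, but never three in a row.
results = [True, False, True, False, False, True]
print(restart_probe_index(results, failure_threshold=3))  # None
print(restart_probe_index(results, failure_threshold=1))  # 1
```

Under `failureThreshold: 3` this run never restarts the container, while the `failureThreshold: 1` equivalent of the manual probe would flag the very first slow or failed response. That the manual probe never failed therefore suggests the application answered promptly at each sampled moment.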

Why did the manual probe succeed while the liveness probe failed? Does this indicate a Kubernetes networking issue?

pod manifest:

kind: Pod
apiVersion: v1
metadata:
  name: pypi-pypiserver-74b689df7-rh9bm
  namespace: default
  labels:
    app.kubernetes.io/instance: pypi
    app.kubernetes.io/name: pypiserver
spec:
  volumes:
    - name: secrets
      secret:
        secretName: pypi-pypiserver
        defaultMode: 420
    - name: packages
      persistentVolumeClaim:
        claimName: pypi-pypiserver
    - name: default-token-cx7m7
      secret:
        secretName: default-token-cx7m7
        defaultMode: 420
  containers:
    - name: pypiserver
      image: 'registry.digitalocean.com/evergreen/pypiserver:latest'
      args:
        - run
        - '--passwords=.'
        - '--authenticate=.'
        - '--port=8080'
        - '--welcome=/dev/null'
        - '--server=wsgiref'
        - /data/packages
      ports:
        - name: http
          containerPort: 8080
          protocol: TCP
      resources:
        limits:
          cpu: 1600m
          memory: 1Gi
        requests:
          cpu: 400m
          memory: 256Mi
      volumeMounts:
        - name: packages
          mountPath: /data/packages
          mountPropagation: None
        - name: secrets
          readOnly: true
          mountPath: /config
        - name: default-token-cx7m7
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      livenessProbe:
        httpGet:
          path: /
          port: http
          scheme: HTTP
        initialDelaySeconds: 30
        timeoutSeconds: 10
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
  restartPolicy: Always
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  nodeSelector:
    doks.digitalocean.com/node-pool: k8s-node-pool-hive-dev-2
  serviceAccountName: default
  serviceAccount: default
  nodeName: k8s-node-pool-hive-dev-2-8adyc
  securityContext:
    runAsUser: 9898
    runAsGroup: 9898
    fsGroup: 9898
  imagePullSecrets:
    - name: evergreen
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  priority: 0
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority
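In this manifest, the probe's `port: http` is a port name, which is resolved against the container's `ports` list; here that is `containerPort: 8080`, matching `--port=8080`. Unless `httpGet.host` is set, the kubelet sends the probe to the pod's IP on that port rather than through the NodePort service. A sketch of the name lookup over a plain dict mirroring part of the manifest (not the Kubernetes client API; `resolve_probe_port` is a hypothetical helper):

```python
from __future__ import annotations

# Hand-copied excerpt of the container spec above, as plain Python data.
container = {
    "name": "pypiserver",
    "ports": [{"name": "http", "containerPort": 8080, "protocol": "TCP"}],
    "livenessProbe": {"httpGet": {"path": "/", "port": "http", "scheme": "HTTP"}},
}


def resolve_probe_port(container: dict) -> int:
    """Resolve livenessProbe.httpGet.port, which may be a number or a port name."""
    port = container["livenessProbe"]["httpGet"]["port"]
    if isinstance(port, int):
        return port
    for entry in container.get("ports", []):
        if entry.get("name") == port:
            return entry["containerPort"]
    raise ValueError(f"no container port named {port!r}")


print(resolve_probe_port(container))  # 8080
```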

Try running the same script inside a pod, preferably the same pod whose liveness probe is failing, and check whether you get the same outcome. Networking shouldn't be a problem, since the pod is basically pinging itself. — , Jul 05 '21 at 08:35
@PawełGrondal, what does it mean if the self-ping inside the pod fails but the NodePort ping succeeds? — kakarukeys, Jul 05 '21 at 15:29
What is the exact probe failure message? Does the pod have a port named "http"? Could you paste the pod YAML here? — Addo Zhang, Jul 05 '21 at 22:55
The usual one: liveness probe failed, "context deadline exceeded". You must have seen it a thousand times. I've edited the question and added the pod YAML. — kakarukeys, Jul 05 '21 at 23:40
`.spec.containers.ports.protocol` is TCP, but `.spec.containers.livenessProbe.httpGet.scheme` is HTTP. Are you sure this is correct? — , Jul 07 '21 at 05:59
@PawełGrondal, it is an HTTP API server, and HTTP runs over TCP. — kakarukeys, Jul 07 '21 at 08:53
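Following the first comment's suggestion, a sketch of a probe loop to run inside the container itself (for example through `kubectl exec`) could look like this. It assumes the image provides a Python 3 interpreter, which is plausible because pypiserver is a Python application but is still an assumption, and it targets `127.0.0.1:8080`, the `--port` from the manifest. `timed_get` and `watch` are hypothetical names. A request that gets no answer within the timeout is reported as `TIMEOUT`, which roughly corresponds to the kubelet's "context deadline exceeded".

```python
from __future__ import annotations

import http.client
import socket
import time


def timed_get(host: str, port: int, path: str, timeout: float) -> tuple[str, float]:
    """Send one GET and return (outcome, elapsed seconds).

    outcome is "OK", "HTTP <status>", "TIMEOUT" or "ERROR <exception>".
    The timeout applies to each socket operation, so it only approximates
    the kubelet's overall per-probe deadline.
    """
    start = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        outcome = "OK" if 200 <= resp.status < 400 else f"HTTP {resp.status}"
    except socket.timeout:
        outcome = "TIMEOUT"  # roughly what the kubelet reports as a deadline exceeded
    except (OSError, http.client.HTTPException) as exc:
        outcome = f"ERROR {exc!r}"
    finally:
        conn.close()
    return outcome, time.monotonic() - start


def watch(host: str = "127.0.0.1", port: int = 8080, path: str = "/",
          timeout: float = 10.0, interval: float = 10.0, count: int = 360) -> None:
    """Probe `count` times, one line per attempt; defaults mirror the manifest."""
    for _ in range(count):
        outcome, elapsed = timed_get(host, port, path, timeout)
        print(f"{time.strftime('%H:%M:%S')} {outcome} {elapsed * 1000:.0f} ms", flush=True)
        time.sleep(interval)
```

Comparing this log with the kubelet's probe failures over the same period shows whether the slow responses are visible from inside the pod as well.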

score 0 · Answer 1 · answered Jul 04 '21 at 09:39


A NodePort probe only confirms that the Service is reachable on that port; it does not check whether the pod itself is live. Use the liveness probe to check the container's availability.

More details: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
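The answer's point, that reaching a port is not the same as the container being healthy, can be illustrated with two generic checks. A bare TCP connect, similar in spirit to a `tcpSocket` probe, succeeds as soon as something accepts the connection; an HTTP check, similar to an `httpGet` probe, also requires a timely status from 200 to 399. This is a sketch with hypothetical helper names and does not model how traffic is routed through a NodePort.

```python
from __future__ import annotations

import socket
import urllib.error
import urllib.request


def tcp_check(host: str, port: int, timeout: float) -> bool:
    """True if a TCP connection can be opened to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(url: str, timeout: float) -> bool:
    """True only if an HTTP status in [200, 400) arrives before the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as err:
        return 200 <= err.code < 400
    except OSError:
        return False
```

Against a server that accepts connections but answers with an error status, `tcp_check` passes while `http_check` fails.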


subudear


I don't think so. My script calls the application's HTTP API; if the pod is dead, the API becomes unavailable. – kakarukeys Jul 04 '21 at 12:00
