
While using Anthos Config Management (ACM) on GCP, I repeatedly run into admission-webhook pods being terminated with OOMKilled status. I tried raising the memory request in the pod spec, and that seemed to work for a while. But because all fields of the objects in the config-management-system namespace are managed by a controller (see managedFields), I couldn't change the admission-webhook spec permanently: the Deployment spec keeps being reconciled back to the original.

Could anyone help me?

  • Can I update part of the managed fields by force? Is that acceptable in practice? I would rather not hack into Google-provided pods just to change their resource spec.
  • Even when I update a managed field by force (kubectl apply -f .. --force-conflicts --server-side), the original spec is restored by another field manager. Is there any way to deal with this?
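
For illustration, a forced server-side apply of this kind could use a partial manifest like the sketch below. The Deployment name is inferred from the ReplicaSet name shown in the pod description, the file name is hypothetical, and the memory values are made-up placeholders rather than recommended settings.

```yaml
# webhook-resources.yaml (hypothetical file name)
# Partial manifest for server-side apply: only the fields to be overridden are listed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: admission-webhook              # inferred from ReplicaSet/admission-webhook-76b67c9f8c
  namespace: config-management-system
spec:
  template:
    spec:
      containers:
        - name: admission-webhook      # containers are merged by name
          resources:
            requests:
              memory: 128Mi            # placeholder value (currently 20Mi)
            limits:
              memory: 256Mi            # placeholder value (currently 100Mi)
```

It would be applied with kubectl apply --server-side --force-conflicts -f webhook-resources.yaml. This takes ownership of the listed fields, but it does not stop another manager from writing the original values back on its next reconcile, which is the behavior described above.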

Pod status:

$ kubectl get pods -n config-management-system -o wide
NAME                                 READY   STATUS             RESTARTS   AGE     IP           NODE                                     NOMINATED NODE   READINESS GATES
admission-webhook-76b67c9f8c-vt8n6   0/1     CrashLoopBackOff   6          10m     10.40.0.76   gke-seoul-a-default-pool-e2dcfaa0-8tlw   <none>           <none>
admission-webhook-76b67c9f8c-wdlzg   0/1     CrashLoopBackOff   6          10m     10.40.1.95   gke-seoul-a-default-pool-e2dcfaa0-z2l0   <none>           <none>
reconciler-manager-7f95dbf7-ss5z2    2/2     Running            0          6h57m   10.40.1.88   gke-seoul-a-default-pool-e2dcfaa0-z2l0   <none>           <none>
root-reconciler-5ddb78479c-lmkmk     3/3     Running            0          6h53m   10.40.1.89   gke-seoul-a-default-pool-e2dcfaa0-z2l0   <none>           <none>

Logs of a dead pod:

$ kubectl logs -f admission-webhook-76b67c9f8c-vt8n6 -n config-management-system --previous
I0824 08:32:16.821366       1 setup.go:15] Build Version:
I0824 08:32:16.821436       1 deleg.go:130] setup "level"=0 "msg"="starting manager"
I0824 08:32:17.922013       1 request.go:655] Throttling request took 1.007744493s, request: GET:https://10.44.0.1:443/apis/coordination.k8s.io/v1beta1?timeout=32s
I0824 08:32:21.132165       1 deleg.go:130] controller-runtime/metrics "level"=0 "msg"="metrics server is starting to listen"  "addr"=":8080"
I0824 08:32:21.132367       1 deleg.go:130] setup "level"=0 "msg"="creating certificate rotator for webhook"
I0824 08:32:21.132498       1 deleg.go:130] setup "level"=0 "msg"="starting manager"
I0824 08:32:21.132660       1 deleg.go:130] setup "level"=0 "msg"="waiting for certificate rotator"
I0824 08:32:21.132845       1 internal.go:385] controller-runtime/manager "level"=0 "msg"="starting metrics server"  "path"="/metrics"
I0824 08:32:21.132937       1 controller.go:165] controller-runtime/manager/controller/cert-rotator "level"=0 "msg"="Starting EventSource"  "source"={}
I0824 08:32:21.133137       1 controller.go:165] controller-runtime/manager/controller/cert-rotator "level"=0 "msg"="Starting EventSource"  "source"={}
I0824 08:32:21.133295       1 deleg.go:130] cert-rotation "level"=0 "msg"="starting cert rotator controller"
I0824 08:32:21.233606       1 controller.go:173] controller-runtime/manager/controller/cert-rotator "level"=0 "msg"="Starting Controller"
I0824 08:32:21.233650       1 controller.go:211] controller-runtime/manager/controller/cert-rotator "level"=0 "msg"="Starting workers"  "worker count"=1
I0824 08:32:21.234167       1 rotator.go:665] cert-rotation "level"=0 "msg"="Ensuring CA cert" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"ValidatingWebhookConfiguration"} "name"="admission-webhook.configsync.gke.io" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"ValidatingWebhookConfiguration"} "name"="admission-webhook.configsync.gke.io"
I0824 08:32:21.234564       1 deleg.go:130] cert-rotation "level"=0 "msg"="no cert refresh needed"
I0824 08:32:21.234611       1 deleg.go:130] cert-rotation "level"=0 "msg"="certs are ready in /certs"
I0824 08:32:21.239218       1 rotator.go:665] cert-rotation "level"=0 "msg"="Ensuring CA cert" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"ValidatingWebhookConfiguration"} "name"="admission-webhook.configsync.gke.io" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"ValidatingWebhookConfiguration"} "name"="admission-webhook.configsync.gke.io"
I0824 08:32:22.659789       1 deleg.go:130] cert-rotation "level"=0 "msg"="CA certs are injected to webhooks"
I0824 08:32:22.660065       1 deleg.go:130] setup "level"=0 "msg"="registering validating webhook"

Description of the pod:

Name:         admission-webhook-76b67c9f8c-vt8n6
Namespace:    config-management-system
Priority:     0
Node:         gke-seoul-a-default-pool-e2dcfaa0-8tlw/10.178.0.6
Start Time:   Tue, 24 Aug 2021 17:25:44 +0900
Labels:       app=admission-webhook
              pod-template-hash=76b67c9f8c
Annotations:  <none>
Status:       Running
IP:           10.40.0.76
IPs:
  IP:           10.40.0.76
Controlled By:  ReplicaSet/admission-webhook-76b67c9f8c
Containers:
  admission-webhook:
    Container ID:  containerd://95084e16bedb0f75ed18c24b2b16a815dda0fef87bbaea00df606b9093cca197
    Image:         gcr.io/config-management-release/admission-webhook:v1.8.1-rc.2
    Image ID:      gcr.io/config-management-release/admission-webhook@sha256:89783f083940d75cc4b7c51428966751a29e7edd606872b23be090a0a1655ecc
    Port:          10250/TCP
    Host Port:     0/TCP
    Command:
      /admission-webhook
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Fri, 27 Aug 2021 20:09:53 +0900
      Finished:     Fri, 27 Aug 2021 20:10:04 +0900
    Ready:          False
    Restart Count:  854
    Limits:
      cpu:     200m
      memory:  100Mi
    Requests:
      cpu:        100m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /certs from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from admission-webhook-token-gt7qz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  admission-webhook-cert
    Optional:    false
  admission-webhook-token-gt7qz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  admission-webhook-token-gt7qz
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                       From     Message
  ----     ------   ----                      ----     -------
  Warning  BackOff  3m25s (x19992 over 3d2h)  kubelet  Back-off restarting failed container

Info.

  • ACM version 1.8.1
  • Unstructured repository, synced from GitHub with a token.
  • The GitHub repository is for testing only and contains a tiny ConfigMap. It syncs fine, but the admission webhook does not stop me from deleting the ConfigMap with kubectl, although the ConfigMap is restored right after the deletion.
  • Reinstalled several times from scratch.
dehypnosis