
Problem: For some reason, the Helm release of kube-prometheus-stack is stuck in the pending-install status. What is the correct way to install a Helm release for this chart using the Helm CLI?

Details:

Because the k8s.gcr.io image registry was frozen, I had to switch the kube-state-metrics image to registry.k8s.io by updating values.yaml as follows:

kube-state-metrics:
  prometheusScrape: true
  image:
    repository: registry.k8s.io/kube-state-metrics/kube-state-metrics
    tag: v1.9.8
    pullPolicy: Always
  namespaceOverride: ""
  rbac:
    create: true
  podSecurityPolicy:
    enabled: true

After that, when I tried to update the Helm release of kube-prometheus-stack using the same chart version, 14.9.0, the release ended up in the Failed status. Upon retrying, the previous Helm release was deleted and a new one was created. All components of the new release were created successfully, but the release got stuck in the pending-install status.

I waited for almost 30 minutes without success. I also tried deleting the Helm release, rolling it back, and deleting the Helm release secret, but none of that helped.

What could be the issue? How can I solve it?

Abdullah Khawer

1 Answer


Solution: After some investigation, I found that a job named kube-prometheus-stack-admission-patch was failing with a BackoffLimitExceeded error. It appears to be some kind of initialization job. Deleting the job (not just its pod) fixed the issue, and the Helm release changed its status to Deployed.
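
The steps above can be sketched roughly as the following shell script. It is only an illustration, not the exact commands I ran: the namespace `monitoring`, the release name, and the `DRY_RUN` switch with its `run` helper are assumptions added for this example, so adjust them to your setup. By default it only prints the commands; set `DRY_RUN=0` to actually execute them.

```shell
#!/usr/bin/env bash
# Sketch: inspect and delete the failed admission-patch Job so that the
# Helm release can leave the pending-install status.
# Assumptions (not from the original post): namespace and release name.
set -euo pipefail

NAMESPACE="${NAMESPACE:-monitoring}"          # assumed namespace
RELEASE="${RELEASE:-kube-prometheus-stack}"   # assumed release name
JOB="${RELEASE}-admission-patch"              # job name taken from the answer
DRY_RUN="${DRY_RUN:-1}"                       # 1 = print only, 0 = execute

# Hypothetical helper: echo the command, run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

run helm status "$RELEASE" -n "$NAMESPACE"        # current release status
run kubectl get jobs -n "$NAMESPACE"              # look for the failed job
run kubectl logs "job/$JOB" -n "$NAMESPACE"       # read the job's error log
run kubectl delete job "$JOB" -n "$NAMESPACE"     # delete the job, not the pod
run helm status "$RELEASE" -n "$NAMESPACE"        # status should now change
```

Running it once with the default `DRY_RUN=1` lets you review the exact commands before anything is deleted.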

Error Log in kube-prometheus-stack-admission-patch job:

W0331 10:58:03.079451       1 client_config.go:608] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
{"level":"info","msg":"patching webhook configurations 'kube-prometheus-stack-admission' mutating=true, validating=true, failurePolicy=Fail","source":"k8s/k8s.go:39","time":"2023-03-31T10:58:03Z"}
{"err":"the server could not find the requested resource","level":"fatal","msg":"failed getting validating webhook","source":"k8s/k8s.go:48","time":"2023-03-31T10:58:03Z"}
Abdullah Khawer
  • This is probably because you are using an older version of `kube-webhook-certgen` or `kube-prometheus-stack`. `kube-webhook-certgen` < 1.3.0 is not compatible with k8s clusters > 1.22. If you are using cluster version 1.23 or greater, you may need to upgrade kube-prometheus-stack to a version >1.40 or use `kube-webhook-certgen` v1.3.0 or greater – pgollangi Apr 05 '23 at 08:15
  • Hi @pgollangi, I have a similar issue with the latest version of kube-prometheus-stack (helm version: 48.3.1). I checked that the version of kube-webhook-certgen is v20221220-controller-v1.5.1-58-g787ea74b6, which I think is v1.3.0 or greater. I get the following error: `failed to do request: Head "https://us-west1-docker.pkg.dev/v2/k8s-artifacts-prod/images/ingress-nginx/kube-webhook-certgen/manifests/v20221220-controller-v1.5.1-58-g787ea74b6": dial tcp: lookup us-west1-docker.pkg.dev on 10.0.0.2:53: no such host` Do you have any ideas? – Andrew Aug 11 '23 at 09:48