
I get the following error when installing the NVIDIA GPU Operator on a cluster built with kubespray on a Jetson AGX Xavier.

helm install --wait --generate-name nvidia/gpu-operator

Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: CustomResourceDefinition "nodefeaturerules.nfd.k8s-sigs.io" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "gpu-operator-1661139963": current value is "gpu-operator-1661134243"; annotation validation error: key "meta.helm.sh/release-namespace" must equal "default": current value is "gpu-operator"
tuioku

1 Answer


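Run helm list --all --all-namespaces to see every release in every namespace, whatever its state. If it shows leftover GPU Operator releases (the error points to gpu-operator-1661134243 in the gpu-operator namespace), uninstall them with the following command: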

helm uninstall <release-name> -n <namespace> --no-hooks
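The cleanup above can be sketched as a small shell function. This is only a sketch under assumptions: the function name cleanup_stale_release is hypothetical, the release, namespace, and CRD names in the usage line are copied from the error message in the question, and it assumes the CRD may survive the uninstall, which you should check on your own cluster. Note that deleting a CRD also deletes every custom resource of that kind.

```shell
# Sketch: uninstall a stale Helm release and remove a CRD it left behind.
# The body runs in a subshell ( ... ) so its variables do not leak.
cleanup_stale_release() (
  release="$1"    # e.g. gpu-operator-1661134243
  namespace="$2"  # e.g. gpu-operator
  crd="$3"        # e.g. nodefeaturerules.nfd.k8s-sigs.io

  # Uninstall the old release without running its hooks.
  helm uninstall "$release" -n "$namespace" --no-hooks

  # Read the Helm ownership annotation still recorded on the CRD, if any.
  owner="$(kubectl get crd "$crd" \
    -o jsonpath='{.metadata.annotations.meta\.helm\.sh/release-name}' 2>/dev/null)"

  # Delete the CRD only if it is still claimed by the release just removed.
  # Warning: this also removes all custom resources of that kind.
  if [ "$owner" = "$release" ]; then
    kubectl delete crd "$crd"
  fi
)

# Usage (values taken from the error message in the question):
# cleanup_stale_release gpu-operator-1661134243 gpu-operator nodefeaturerules.nfd.k8s-sigs.io
```

The annotation check keeps the function from deleting a CRD that belongs to some other, still-installed release.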

To deploy the GPU Operator with Helm, first install Helm itself (skip this step if it is already installed):

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 \
   && chmod 700 get_helm.sh \
   && ./get_helm.sh

Now, add the NVIDIA Helm repository:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
   && helm repo update

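In older GPU Operator releases, the operator was installed in the default namespace while all operands were installed in the gpu-operator-resources namespace.

In newer releases, the operator and operands are installed in the same namespace, and that is what the command you mentioned (helm install --wait --generate-name nvidia/gpu-operator) does. Because it passes no -n flag, it targets the default namespace, whereas your earlier release lives in gpu-operator, which is why Helm reports the ownership mismatch.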

For example, to install the GPU Operator in the gpu-operator namespace:

helm install --wait --generate-name \
     -n gpu-operator --create-namespace \
     nvidia/gpu-operator

So pick a namespace that suits your case, create it if needed, and use the same one for every install.
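One way to avoid a fresh release name on every attempt is to replace --generate-name with a fixed release name and use helm upgrade --install, which installs the release if it is missing and upgrades it otherwise. This is a different technique from the command in the answer, shown only as a sketch: the function name install_gpu_operator and the default release name gpu-operator are illustrative assumptions.

```shell
# Sketch: install or upgrade the GPU Operator under a fixed release name,
# so repeated runs target the same release and namespace.
# The body runs in a subshell ( ... ) so its variables do not leak.
install_gpu_operator() (
  release="${1:-gpu-operator}"    # assumed default release name
  namespace="${2:-gpu-operator}"  # namespace used in the example above

  # --create-namespace creates the namespace if it does not exist yet;
  # --wait blocks until the deployed resources report ready.
  helm upgrade --install "$release" nvidia/gpu-operator \
    -n "$namespace" --create-namespace --wait
)

# Usage:
# install_gpu_operator                     # release and namespace "gpu-operator"
# install_gpu_operator my-release my-ns    # explicit release and namespace
```

With a fixed name, a second run upgrades the existing release instead of trying to create a new one that collides with resources owned by the old one.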

For reference, see the NVIDIA documentation page "Install NVIDIA GPU Operator".

Sai Chandra Gadde