
I have been trying to deploy a GPU service on Cloud Run for Anthos. However, the GPU nodes created by the node pool carry a taint that prevents the service pods from being scheduled on them. I also tried to add a toleration to the service's YAML config, but the Cloud console does not allow it, and when I add it via kubectl the toleration gets removed. Can anyone help me out with this?
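
For reference, a minimal sketch of the kind of toleration described above, not the actual config from this deployment. It assumes the node pool carries the GKE GPU taint with key `nvidia.com/gpu` and effect `NoSchedule`; the service name and image are hypothetical placeholders.

```yaml
# Sketch only: the name and image below are placeholders, not values from the real service.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: gpu-service                      # hypothetical service name
spec:
  template:
    spec:
      containers:
        - image: gcr.io/PROJECT_ID/IMAGE # placeholder image path
          resources:
            limits:
              nvidia.com/gpu: "1"        # request one GPU
      tolerations:
        # Assumed taint key and effect; "Exists" matches any taint value.
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
```

Open-source Knative Serving gates the `tolerations` field behind the `kubernetes.podspec-tolerations` feature flag. Whether the same flag governs Cloud Run for Anthos is an assumption I have not verified.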

Thanks

Nauman Mustafa
  • Also, add the command and YAMLs you've used, and the error message you got when you tried the `kubectl` command. – Kamol Hasan Jan 06 '22 at 04:20
  • I am using the Google Cloud console to deploy the service. Here is the YAML generated by the Cloud console: https://0bin.net/paste/QD7IcdsK#OHQGigccD29yE-uDLc69nIdAZkZ8fzKsMIjXaS0iMxV. After adding the toleration, I ran a simple `kubectl apply -f svc.yaml`, which did work; however, I get this error in the Cloud console: `4 Insufficient cpu, 4 Insufficient memory, 4 Insufficient nvidia.com/gpu..` – Nauman Mustafa Jan 06 '22 at 04:51
  • Need more details to provide a solution. Are you able to get the Pod's logs in Logs Explorer? Can you check the pod requesting `nvidia.com/gpu` and share its events? Also, check the quota limit and let me know. – Bakul Mitra Jan 06 '22 at 10:19
  • Have you looked at this doc https://cloud.google.com/anthos/run/docs/configuring/compute-power-gpu? – boredabdel Jan 06 '22 at 14:41

0 Answers