I have a Job object that should use a node selector so that it only runs on nodes that have a GPU. I know how to set it (the manifest is converted from a string in a Python program).
job = f"""
apiVersion: batch/v1
kind: Job
....
nodeSelector:
  sma-gpu-size: {gpu_size}
"""
Our ops team will add these labels to the nodes in the next few weeks, but currently, when I set the node selector, the pod cannot be scheduled and the service does not start:
2022-09-20T07:20:24Z [Warning] 0/35 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 30 node(s) didn't match Pod's node affinity/selector.
Is it somehow possible to apply the node selector only if matching nodes are available, something like this (pseudo YAML)?
job = f"""
apiVersion: batch/v1
kind: Job
....
nodeSelector:
  if_available:
    sma-gpu-size: {gpu_size}
  else:
    Any
"""
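For reference, the closest built-in mechanism I am aware of is "preferred" node affinity (`preferredDuringSchedulingIgnoredDuringExecution`), which the scheduler treats as a soft preference rather than a hard requirement. Below is a minimal sketch of what I imagine that would look like, assuming the manifest is built as a dict instead of a string. The function name `build_job_manifest`, the job name `gpu-job`, the container name `main`, and the `busybox` image are hypothetical placeholders, not part of my real setup.

```python
import json


def build_job_manifest(gpu_size, name="gpu-job", image="busybox"):
    """Build a batch/v1 Job that prefers, but does not require, labeled nodes.

    `name` and `image` are hypothetical placeholders for illustration.
    """
    # Soft preference: nodes labeled sma-gpu-size=<gpu_size> score higher,
    # but the pod can still be scheduled if no node carries the label.
    preferred_term = {
        "weight": 100,  # allowed range is 1-100
        "preference": {
            "matchExpressions": [
                {
                    "key": "sma-gpu-size",
                    "operator": "In",
                    # Label values are strings, so convert explicitly.
                    "values": [str(gpu_size)],
                }
            ]
        },
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{"name": "main", "image": image}],
                    "affinity": {
                        "nodeAffinity": {
                            "preferredDuringSchedulingIgnoredDuringExecution": [
                                preferred_term
                            ]
                        }
                    },
                }
            }
        },
    }


# JSON is also accepted as a manifest format, so no YAML library is needed here.
manifest_json = json.dumps(build_job_manifest("large"), indent=2)
```

Note that this would not help with the tainted infra and master nodes from the warning above, since taints are handled by tolerations, and it also means the pod could land on a node without a GPU while the labels are missing.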