
My desktop has two GPUs, each of which can run TensorFlow when specified as /gpu:0 or /gpu:1. However, if I don't specify which GPU to run the code on, TensorFlow defaults to /gpu:0.

Now I would like to set up the system so that it assigns GPUs dynamically according to the free memory of each one. For example, if a script doesn't specify which GPU to use, the system first assigns /gpu:0 to it; then, when another script runs, the system checks whether /gpu:0 still has enough free memory. If so, it assigns /gpu:0 to that script as well; otherwise it assigns /gpu:1. How can I achieve this?

Follow-up: I believe the question above may be related to GPU virtualization. That is to say, if I could virtualize the multiple GPUs in a desktop into one GPU, I would get what I want. So besides any setup methods for TensorFlow, any ideas about virtualization are also welcome.

C. Wang
  • TensorFlow generally assumes it's not sharing the GPU with anyone, so I don't see a way of doing it from inside TensorFlow. However, you could do it from outside as follows: a shell script that calls nvidia-smi, parses out the GPU k with the most free memory, then sets CUDA_VISIBLE_DEVICES=k and calls the TensorFlow script. – Yaroslav Bulatov Nov 30 '16 at 19:27
  • @YaroslavBulatov - promote to answer? :) Thanks! – dga Nov 18 '17 at 16:31

2 Answers


TensorFlow generally assumes it's not sharing the GPU with anyone, so I don't see a way of doing it from inside TensorFlow. However, you could do it from outside as follows: a shell script that calls nvidia-smi, parses out the GPU k with the most free memory, then sets CUDA_VISIBLE_DEVICES=k and calls the TensorFlow script.
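
For illustration, here is a rough sketch of that wrapper idea in Python rather than shell; train.py is a placeholder for whatever TensorFlow script you actually want to launch, and nvidia-smi is assumed to be on the PATH:

import os
import subprocess
import sys

# Ask nvidia-smi for the free memory of every GPU (no header, no "MiB" suffix)
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"])
free = [int(line) for line in out.decode().splitlines() if line.strip()]

# Expose only the GPU with the most free memory, then launch the real script
os.environ["CUDA_VISIBLE_DEVICES"] = str(free.index(max(free)))
os.execvp(sys.executable, [sys.executable, "train.py"] + sys.argv[1:])

Because CUDA_VISIBLE_DEVICES is set before the child process starts, the unmodified TensorFlow script sees the chosen GPU as /gpu:0.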

Yaroslav Bulatov

Inspired by: How to set specific gpu in tensorflow?

import os
import subprocess as sp

def leave_gpu_with_most_free_ram():
  try:
    command = "nvidia-smi --query-gpu=memory.free --format=csv"
    # The first output line is the CSV header; the rest hold one value per GPU
    memory_free_info = sp.check_output(command.split()).decode('ascii').splitlines()[1:]
    memory_free_values = [int(x.split()[0]) for x in memory_free_info if x.strip()]
    least_busy_idx = memory_free_values.index(max(memory_free_values))

    # Mask out all GPUs except the one with the most free memory
    os.environ["CUDA_VISIBLE_DEVICES"] = str(least_busy_idx)
    print('Left GPU %d unmasked (free memory per GPU: %s MiB)'
          % (least_busy_idx, memory_free_values))
  except FileNotFoundError as e:
    print('"nvidia-smi" is probably not installed. GPUs are not masked')
    print(e)
  except sp.CalledProcessError as e:
    print("Error on GPU masking:\n", e.output)

Add a call to this function before importing TensorFlow.
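
For example, a minimal usage sketch (gpu_utils is a hypothetical module name for wherever the function above lives):

from gpu_utils import leave_gpu_with_most_free_ram  # hypothetical module

leave_gpu_with_most_free_ram()  # must run before TensorFlow initializes CUDA

import tensorflow as tf  # TensorFlow now sees the chosen GPU as /gpu:0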

y.selivonchyk