When using Accelerate's notebook_launcher to kick off a training job that spawns processes across multiple GPUs, is there a way to specify which GPUs to use (i.e. CUDA_VISIBLE_DEVICES="4,5,6,7") instead of starting from the default cuda:0?
from accelerate import notebook_launcher

def training_function(model):
    ......

notebook_launcher(training_function, (model,), num_processes=4)
Otherwise I get this error when cuda:0 is already occupied by another process:
RuntimeError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 15.78 GiB total capacity; 3.94 GiB already
allocated; 55.19 MiB free; 3.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try
setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and
PYTORCH_CUDA_ALLOC_CONF
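A minimal sketch of one possible workaround: set CUDA_VISIBLE_DEVICES from within the notebook itself, before any CUDA context is created. The env var name and its masking behavior are standard CUDA; whether this fully covers every notebook_launcher code path is an assumption here.

```python
import os

# Must be set BEFORE importing torch or creating any CUDA context,
# otherwise the driver has already enumerated all GPUs and the mask
# has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"

# Inside this process the four selected GPUs are renumbered 0-3,
# so num_processes=4 maps one process per visible GPU (sketch,
# assuming the same training_function/model as above):
#
#   from accelerate import notebook_launcher
#   notebook_launcher(training_function, (model,), num_processes=4)
```

If torch was already imported in an earlier cell, restart the kernel first so the mask takes effect.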