When using Accelerate's notebook_launcher to kick off a training job spanning multiple GPUs, is there a way to specify which GPUs to use (i.e. CUDA_VISIBLE_DEVICES="4,5,6,7") instead of starting from the default cuda:0?

from accelerate import notebook_launcher

def training_function(model):
   ......

notebook_launcher(training_function, (model,), num_processes=4)

Otherwise I get this error when cuda:0 is already occupied by another process:

RuntimeError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 15.78 GiB total capacity; 3.94 GiB already 
allocated; 55.19 MiB free; 3.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try 
setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and 
PYTORCH_CUDA_ALLOC_CONF
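One workaround (not from the original post, so treat it as a sketch) is to set CUDA_VISIBLE_DEVICES in the notebook kernel's environment before CUDA is initialized, i.e. before the first `import torch` or `import accelerate` runs. The masked GPUs are then renumbered from 0, so physical GPU 4 appears to the spawned processes as cuda:0:

```python
import os

# Must run before torch/accelerate initialize CUDA in this kernel;
# if torch is already imported, restart the kernel first.
# Physical GPUs 4-7 will be renumbered as cuda:0 .. cuda:3.
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"

# The rest of the notebook stays unchanged (sketch, not executed here):
#
#   from accelerate import notebook_launcher
#
#   def training_function(model):
#       ...
#
#   notebook_launcher(training_function, (model,), num_processes=4)
```

Because the child processes spawned by notebook_launcher inherit the parent's environment, each of the 4 processes only sees the four unmasked GPUs, and the busy physical GPU 0 is never touched.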
