When using SageMaker Data Parallelism (SMDP), my team sees higher utilization on GPU 0 than on the other GPUs. What is the likely cause? Does it have anything to do with the data loader workers that run on the CPU? I would expect SMDP to shard the dataset equally.

Philipp Schmid
Is this behaviour seen throughout training or only at the start? How much higher is GPU 0's utilization than that of the other GPUs? Does throughput scale well as you increase the cluster size? Also, please make sure to use the distributed DataLoader. – Arun Lokanatha Sep 15 '22 at 00:23
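For context on the last point: in a PyTorch training script, the data-parallel library does not split the dataset by itself. Each process must be given its own shard, typically by passing a `DistributedSampler` to the `DataLoader`. The following is a minimal sketch, assuming PyTorch. The helper name `build_distributed_loader` is hypothetical. In a real SMDP job, `rank` and `world_size` would presumably come from `torch.distributed.get_rank()` and `torch.distributed.get_world_size()` after the process group has been initialized (for recent SMDP versions, with the `"smddp"` backend). Here they are passed explicitly so the sharding can be inspected on a CPU-only machine.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def build_distributed_loader(dataset, batch_size, rank, world_size, shuffle=True, seed=0):
    """Hypothetical helper: return a DataLoader that yields only this rank's shard."""
    # DistributedSampler pads the index list so its length divides evenly by
    # world_size, then gives each rank every world_size-th index starting at `rank`.
    sampler = DistributedSampler(
        dataset,
        num_replicas=world_size,
        rank=rank,
        shuffle=shuffle,
        seed=seed,
    )
    # When a sampler is supplied, the DataLoader itself must not shuffle.
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler, num_workers=0)
    return loader, sampler


if __name__ == "__main__":
    data = TensorDataset(torch.arange(10))
    world_size = 4
    for rank in range(world_size):
        loader, _ = build_distributed_loader(
            data, batch_size=2, rank=rank, world_size=world_size, shuffle=False
        )
        # Flatten the batches to list the sample indices this rank would see.
        seen = [int(x) for (batch,) in loader for x in batch]
        print(rank, seen)
```

With 10 samples and 4 ranks, each rank receives 3 samples. The list is padded by repeating the first two indices, so every rank does the same amount of work per epoch. When `shuffle=True`, call `sampler.set_epoch(epoch)` at the start of each epoch so that every rank draws a new permutation.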