
I'm trying to run PyTorch code that uses a DataLoader. The DataLoader can be configured to load data with several worker processes (which speeds up data loading a lot) by setting the num_workers argument to a positive number (https://pytorch.org/docs/stable/data.html#multi-process-data-loading).
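For context, this is a minimal sketch of the kind of DataLoader setup I mean; the dataset, batch size, and num_workers value are placeholders, not my actual code:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; my real dataset is more involved.
dataset = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))

# num_workers > 0 enables multi-process data loading.
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

for inputs, targets in loader:
    pass  # the training step would go here
```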

When I adapt my code to not use the GPU, setting num_workers to a value greater than 0 works as expected: several CPU cores are used, each running its own worker process. However, when I adapt the code to use the GPU, setting num_workers > 0 leaves only one CPU core in use, bound to the main process, which sits at 100% kernel utilization while the program makes no progress. In the Slurm script I request 1 node, 1 task, and x CPUs per task, where x is the value I set for num_workers (following the "Data Loading using Multiple CPU-cores" section of https://researchcomputing.princeton.edu/support/knowledge-base/pytorch).
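For reference, my submission script roughly follows the Princeton guide; the job name, module, script name, time limit, and the exact --cpus-per-task value are placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=dataloader-test   # placeholder job name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4            # x, matching num_workers in my code
#SBATCH --gres=gpu:1                 # the GPU case is the one that hangs
#SBATCH --time=00:30:00              # placeholder time limit

module load anaconda3                # placeholder; depends on the cluster
python train.py                      # placeholder script name
```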

I've run many tests but can't solve this. The "Single-process data loading (default)" section of the PyTorch docs (https://pytorch.org/docs/stable/data.html#single-process-data-loading-default), immediately preceding the section I linked above, mentions that the resources used to share data between processes (e.g. shared memory, file descriptors) may be limited, which could prevent num_workers > 0 from working properly. Could that be the case here? If so, is there any configuration I should change in the Slurm script or in my code (a sketch of the sharing-strategy change I'm considering is below)? I'm not sure this explanation fits, though, since when I adapt the code to not use the GPU I have no problems with num_workers > 0.
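If the file-descriptor limit is really the problem, my understanding from the docs is that switching the multiprocessing sharing strategy is one option. A minimal sketch of what that would look like (the main() body is omitted; this is not my actual code):

```python
import torch.multiprocessing as mp

# Check which strategy is in use ('file_descriptor' is the Linux default)
# and switch to 'file_system' in case the descriptor limit is the issue.
print(mp.get_sharing_strategy())
mp.set_sharing_strategy('file_system')

def main():
    # DataLoader construction and the training loop go here; guarding the
    # entry point with __main__ is recommended with num_workers > 0
    # (and required with the spawn start method).
    ...

if __name__ == "__main__":
    main()
```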

Thanks in advance for your attention!

Marco
  • Did you figure this out? I'm using a GPU and have `num_workers=7`, but only 2 CPU cores are being utilized (out of 8 cores). – Richard May 08 '23 at 19:38

0 Answers