Docs here: https://pytorch.org/docs/stable/elastic/run.html#single-node-multi-worker
The PyTorch docs for torchrun list two options for single-node multi-worker training: “Single-node multi-worker” and “Stacked single-node multi-worker”.
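For reference, the two invocations the linked docs give look roughly like this (`$NUM_TRAINERS` and `YOUR_TRAINING_SCRIPT.py` are the docs' placeholders for the number of GPUs and the actual training script):

```sh
# “Single-node multi-worker”, per the docs
torchrun \
    --standalone \
    --nnodes=1 \
    --nproc-per-node=$NUM_TRAINERS \
    YOUR_TRAINING_SCRIPT.py --arg1 ...

# “Stacked single-node multi-worker”, per the docs
torchrun \
    --rdzv-backend=c10d \
    --rdzv-endpoint=localhost:0 \
    --nnodes=1 \
    --nproc-per-node=$NUM_TRAINERS \
    YOUR_TRAINING_SCRIPT.py --arg1 ...
```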
For me, “Single-node multi-worker” did not work as intended, while “Stacked single-node multi-worker” worked exactly as expected: the former only ran on one GPU, whereas the “stacked” command engaged all available GPUs. What are the intended differences and use cases of these two options?

In short, I expected single-node multi-worker to launch multiple workers on one node, but I got only one worker on one node. The “Stacked” single-node multi-worker command gave the behavior I expected.
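In case it helps with diagnosing, this is the kind of throwaway check I use to see how many workers each command actually launches; it just prints the per-worker environment variables that torchrun sets (`check_workers.py` is an arbitrary name, and the GPU count here is assumed to be 4):

```sh
# Write a tiny script that reports the per-worker env vars set by torchrun
cat > check_workers.py <<'EOF'
import os
# RANK / LOCAL_RANK / WORLD_SIZE are set by torchrun for each spawned worker
print(f"rank={os.environ.get('RANK')} "
      f"local_rank={os.environ.get('LOCAL_RANK')} "
      f"world_size={os.environ.get('WORLD_SIZE')}")
EOF

# With 4 GPUs I would expect 4 lines of output, each reporting world_size=4
torchrun --standalone --nnodes=1 --nproc-per-node=4 check_workers.py
```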