Docs here: https://pytorch.org/docs/stable/elastic/run.html#single-node-multi-worker
The PyTorch docs for torchrun list two options for single-node multi-worker training: “Single-node multi-worker” and “Stacked single-node multi-worker”.
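For reference, the two invocations the linked docs give look roughly like this (`$NUM_TRAINERS` and `YOUR_TRAINING_SCRIPT.py` are the docs' placeholders for the number of GPUs and the actual training script):

```sh
# “Single-node multi-worker”, per the docs
torchrun \
    --standalone \
    --nnodes=1 \
    --nproc-per-node=$NUM_TRAINERS \
    YOUR_TRAINING_SCRIPT.py --arg1 ...

# “Stacked single-node multi-worker”, per the docs
torchrun \
    --rdzv-backend=c10d \
    --rdzv-endpoint=localhost:0 \
    --nnodes=1 \
    --nproc-per-node=$NUM_TRAINERS \
    YOUR_TRAINING_SCRIPT.py --arg1 ...
```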
For me, “Single-node multi-worker” did not work as intended, while “Stacked single-node multi-worker” worked exactly as expected: the former only ran on one GPU, whereas the “stacked” command engaged all available GPUs. What are the intended differences and use cases of these two options?

In short, I expected single-node multi-worker to launch multiple workers on one node, but I got only one worker on one node. The “Stacked” single-node multi-worker command gave the behavior I expected.
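In case it helps with diagnosing, this is the kind of throwaway check I use to see how many workers each command actually launches; it just prints the per-worker environment variables that torchrun sets (`check_workers.py` is an arbitrary name, and the GPU count here is assumed to be 4):

```sh
# Write a tiny script that reports the per-worker env vars set by torchrun
cat > check_workers.py <<'EOF'
import os
# RANK / LOCAL_RANK / WORLD_SIZE are set by torchrun for each spawned worker
print(f"rank={os.environ.get('RANK')} "
      f"local_rank={os.environ.get('LOCAL_RANK')} "
      f"world_size={os.environ.get('WORLD_SIZE')}")
EOF

# With 4 GPUs I would expect 4 lines of output, each reporting world_size=4
torchrun --standalone --nnodes=1 --nproc-per-node=4 check_workers.py
```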