Ray provides an example of distributed ResNet training, but the gradient synchronization looks strange:
- sync up the weights
- let each worker train independently for a fixed number of steps
- go back to step 1
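
The three steps above can be sketched as follows. This is a hypothetical toy illustration, not Ray's actual code; it assumes the "sync up" step averages the workers' weights before broadcasting them back (a pattern sometimes called periodic parameter averaging). The loss, step counts, and helper names are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_steps(w, k, lr=0.1):
    """Run k SGD steps on a toy quadratic loss L(w) = ||w||^2 / 2."""
    for _ in range(k):
        grad = w + rng.normal(scale=0.01, size=w.shape)  # noisy gradient of the toy loss
        w = w - lr * grad
    return w

num_workers, steps_per_round, rounds = 4, 10, 5
global_w = np.ones(3)

for _ in range(rounds):
    # step 1: sync weights -- every worker starts from the same global copy
    local_ws = [global_w.copy() for _ in range(num_workers)]
    # step 2: each worker trains a fixed number of steps independently
    local_ws = [local_steps(w, steps_per_round) for w in local_ws]
    # step 3 (assumed): average the workers' weights, then go back to step 1
    global_w = np.mean(local_ws, axis=0)

print(global_w)
```

Between sync points the workers' weights drift apart, which is exactly why this looks like neither fully synchronous SGD (which syncs every step) nor a purely asynchronous scheme.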
Is there any justification for this workflow?
It seems to be neither a synchronous nor an asynchronous method.