
Ray provides an example of distributed ResNet training, but the gradient synchronization is strange:

  1. Sync up weights.
  2. Train each worker independently for a specific number of steps.
  3. Go back to step 1.

Is there any justification for this workflow?

I think it's neither a sync nor an async method.
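To make the loop concrete, here is a minimal toy sketch of the workflow (not the actual Ray example, which trains ResNet). I'm assuming the "sync" step means averaging the workers' weights; the real example may use a parameter server or another mechanism instead.

```python
import numpy as np

def local_grad(w, batch):
    # Gradient of the toy objective 0.5 * ||w - mean(batch)||^2
    return w - batch.mean(axis=0)

def run(num_workers=4, local_steps=200, rounds=10, lr=0.1, dim=5, seed=0):
    rng = np.random.default_rng(seed)
    global_w = np.zeros(dim)
    data = [rng.normal(size=(1000, dim)) for _ in range(num_workers)]

    for _ in range(rounds):
        # Step 1: every worker starts the round from the same weights
        local_ws = [global_w.copy() for _ in range(num_workers)]

        # Step 2: each worker trains independently for `local_steps` steps
        for i in range(num_workers):
            for _ in range(local_steps):
                batch = data[i][rng.integers(0, 1000, size=32)]
                local_ws[i] -= lr * local_grad(local_ws[i], batch)

        # Step 3: sync up again (modeled here as plain averaging), then repeat
        global_w = np.mean(local_ws, axis=0)

    return global_w

print(run())
```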

scott huang

1 Answer


It's a slight generalization of normal synchronous SGD. If you train each worker for exactly one step in step 2, then it is regular batch SGD.
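Under some simplifying assumptions (plain SGD, the same learning rate on every worker, all workers starting the round from identical weights, and syncing modeled as weight averaging), the one-step case is easy to check numerically. The snippet below only illustrates that equivalence; it is not code from the Ray example.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)                           # shared starting weights
lr = 0.1
grads = [rng.normal(size=5) for _ in range(4)]   # one gradient per worker

# Local-SGD style: each worker takes one step independently, then weights are averaged
averaged_weights = np.mean([w - lr * g for g in grads], axis=0)

# Synchronous SGD: average the gradients first, then take one step
sync_update = w - lr * np.mean(grads, axis=0)

print(np.allclose(averaged_weights, sync_update))  # True
```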

Robert Nishihara
  • But the default number of steps is 200, so this method likely has more noise than a normal async method, because normally we train NNs with data shuffling. – scott huang Mar 06 '19 at 02:07