
Ray provides an example of distributed ResNet training, but the gradient synchronization is strange:

  1. Sync up weights.
  2. Train each worker independently for a specific number of steps.
  3. Go back to step 1.

Is there any justification for this workflow?

I think it's neither a sync nor an async method.
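To make the loop concrete, here is a minimal toy sketch of the workflow (not the actual Ray example, which trains ResNet). I'm assuming the "sync" step means averaging the workers' weights; the real example may use a parameter server or another mechanism instead.

```python
import numpy as np

def local_grad(w, batch):
    # Gradient of the toy objective 0.5 * ||w - mean(batch)||^2
    return w - batch.mean(axis=0)

def run(num_workers=4, local_steps=200, rounds=10, lr=0.1, dim=5, seed=0):
    rng = np.random.default_rng(seed)
    global_w = np.zeros(dim)
    data = [rng.normal(size=(1000, dim)) for _ in range(num_workers)]

    for _ in range(rounds):
        # Step 1: every worker starts the round from the same weights
        local_ws = [global_w.copy() for _ in range(num_workers)]

        # Step 2: each worker trains independently for `local_steps` steps
        for i in range(num_workers):
            for _ in range(local_steps):
                batch = data[i][rng.integers(0, 1000, size=32)]
                local_ws[i] -= lr * local_grad(local_ws[i], batch)

        # Step 3: sync up again (modeled here as plain averaging), then repeat
        global_w = np.mean(local_ws, axis=0)

    return global_w

print(run())
```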

scott huang

1 Answer


It's a slight generalization of normal synchronous SGD. If you train each worker for exactly one step in step 2, then it is regular batch SGD.
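Under some simplifying assumptions (plain SGD, the same learning rate on every worker, all workers starting the round from identical weights, and syncing modeled as weight averaging), the one-step case is easy to check numerically. The snippet below only illustrates that equivalence; it is not code from the Ray example.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)                           # shared starting weights
lr = 0.1
grads = [rng.normal(size=5) for _ in range(4)]   # one gradient per worker

# Local-SGD style: each worker takes one step independently, then weights are averaged
averaged_weights = np.mean([w - lr * g for g in grads], axis=0)

# Synchronous SGD: average the gradients first, then take one step
sync_update = w - lr * np.mean(grads, axis=0)

print(np.allclose(averaged_weights, sync_update))  # True
```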

Robert Nishihara
  • But the default number of steps is 200, so this method likely has more noise than a normal async method, because normally we train NNs with data shuffling. – scott huang Mar 06 '19 at 02:07