
While training a DNN in a distributed manner, I would like to use Local SGD (also known as K-AVG SGD or Parallel SGD) to reduce communication overhead by lowering the number of synchronization points: each worker runs several SGD steps on its own data before the workers average their models.
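To make the scheme concrete, here is a minimal plain-Python sketch of Local SGD on a toy 1-D least-squares problem (y ≈ w * x). It is a single-process simulation, not TensorFlow code. The function name `local_sgd`, its parameters (`local_steps`, `rounds`, `lr`) and the toy `shards` data are hypothetical and chosen only for illustration. Each simulated worker starts from the shared weight, takes `local_steps` SGD steps on its own shard, and then all workers' weights are averaged. In a real distributed run, only that averaging step would require communication.

```python
import random


def local_sgd(shards, w0=0.0, lr=0.05, local_steps=4, rounds=50, seed=0):
    """Simulate Local SGD (K-AVG) for the 1-D least-squares model y ~ w * x.

    Hypothetical illustration only, not a TensorFlow API.
    shards: list of per-worker datasets, each a list of (x, y) pairs.
    """
    rng = random.Random(seed)
    w = w0
    for _ in range(rounds):
        local_weights = []
        for shard in shards:
            w_k = w  # each worker starts from the last synchronized model
            for _ in range(local_steps):
                x, y = rng.choice(shard)
                grad = 2.0 * (w_k * x - y) * x  # d/dw of (w*x - y)^2
                w_k -= lr * grad  # local update, no communication
            local_weights.append(w_k)
        # Synchronization point: average the workers' models.
        w = sum(local_weights) / len(local_weights)
    return w


# Two hypothetical workers whose data both follow y = 3x.
shards = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5)],
]
w = local_sgd(shards)
print(round(w, 3))
```

With `local_steps=1` this reduces to synchronous parallel SGD with model averaging after every step. Increasing `local_steps` trades fewer synchronization points for more divergence between the workers' models between averages.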

However, I have not been able to find an implementation of Local SGD in TensorFlow. Does anyone have experience using this communication model?

Note: This paper explains the benefits of using Local SGD.
