This is a general question about the PS + workers training paradigm in TensorFlow. Suppose the following scenario:
1 PS + 2 workers are training asynchronously (suppose they run at different speeds), and each worker's graph is something like input -> linear_1 -> linear_2 -> loss -> update_weights_linear_1 -> update_weights_linear_2.
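For concreteness, here is roughly how I set up the replicated graph (just a minimal sketch, not my exact code; the cluster addresses, layer sizes, and the flag-driven `job_name`/`task_index` are placeholders):

```python
import tensorflow as tf  # TF 1.x

# Placeholder role/config; in real code these would come from command-line flags.
job_name = "worker"
task_index = 0

cluster = tf.train.ClusterSpec({
    "ps": ["ps0:2222"],
    "worker": ["worker0:2222", "worker1:2222"],
})
server = tf.train.Server(cluster, job_name=job_name, task_index=task_index)

if job_name == "ps":
    server.join()
else:
    # replica_device_setter pins the variables to the PS and the ops to this
    # worker, so both workers read and update the same weights on the PS.
    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:%d" % task_index,
            cluster=cluster)):
        x = tf.placeholder(tf.float32, [None, 784])
        labels = tf.placeholder(tf.float32, [None, 10])
        hidden = tf.layers.dense(x, 128, activation=tf.nn.relu)  # linear_1
        logits = tf.layers.dense(hidden, 10)                     # linear_2
        loss = tf.losses.softmax_cross_entropy(labels, logits)
        global_step = tf.train.get_or_create_global_step()
        # Plain async training: each worker applies its own gradients to the
        # shared PS variables as soon as they are computed.
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
            loss, global_step=global_step)
```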
So is each worker's sess.run() atomic with respect to the whole cluster? In other words, can this sequence of events happen:
- worker_0 finishes computing linear_1,
- then worker_1 executes update_weights_linear_2 on the PS,
- worker_0 then pulls the updated weights_linear_2 from the PS and uses them to compute linear_2.
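In code, the step I am asking about is just the single run() call in each worker's loop (again only a sketch; `batch_x`/`batch_y` stand in for whatever input pipeline is used, and `server`, `train_op`, etc. are from the snippet above):

```python
# Each worker loops over single-run training steps against the shared PS vars.
with tf.train.MonitoredTrainingSession(master=server.target,
                                       is_chief=(task_index == 0)) as sess:
    while not sess.should_stop():
        # Is this one run() call atomic with respect to the other worker's
        # run() calls that touch the same variables on the PS?
        sess.run(train_op, feed_dict={x: batch_x, labels: batch_y})
```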
Thanks!