In distributed TensorFlow, I used SyncReplicasOptimizerV2 to aggregate and apply gradients, but when one of the regular workers (the chief worker, most of the time) finishes training, the other regular workers hang. How can I solve this problem?
OS: Ubuntu 14.04
TensorFlow version: 0.12.0-rc1
My code is here: https://github.com/xiaop1987/tf_distribute_lr
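For reference, the relevant part of my setup looks roughly like the sketch below. It is simplified from the repo above; names such as num_workers, is_chief, cluster, server, loss and max_steps stand in for values my real code builds from the cluster spec and flags.

```python
import tensorflow as tf

# Placeholders for values the real code reads from flags / the cluster spec.
num_workers = 2
is_chief = True          # task_index == 0
max_steps = 10000

with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    global_step = tf.Variable(0, name="global_step", trainable=False)
    # ... model graph that produces `loss` ...

    # Wrap a plain optimizer so gradients from all replicas are aggregated
    # before a single update is applied.
    opt = tf.train.SyncReplicasOptimizerV2(
        tf.train.GradientDescentOptimizer(0.01),
        replicas_to_aggregate=num_workers,
        total_num_replicas=num_workers)
    train_op = opt.minimize(loss, global_step=global_step)

# Ops that only the chief worker needs.
if is_chief:
    chief_queue_runner = opt.get_chief_queue_runner()
    init_tokens_op = opt.get_init_tokens_op()

sv = tf.train.Supervisor(is_chief=is_chief, global_step=global_step)
with sv.prepare_or_wait_for_session(server.target) as sess:
    if is_chief:
        # The chief starts the sync queue runner and seeds the token queue.
        sv.start_queue_runners(sess, [chief_queue_runner])
        sess.run(init_tokens_op)

    step = 0
    while step < max_steps:
        _, step = sess.run([train_op, global_step])
    # When one worker leaves this loop and exits, the remaining workers
    # block inside sess.run(train_op), waiting for gradient contributions
    # that never arrive.
```

Every worker runs the same training loop; only the chief starts the sync queue runner and runs the init tokens op.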
----------------------------- Update 1 (2016-12-20) -----------------------------
I applied a sync queue as Yaroslav Bulatov suggested (roughly as in the sketch below). Now I can stop the parameter server successfully, but the other worker still hangs there, with the call stack shown after the sketch:
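This is my understanding of the suggested pattern, again simplified; num_workers, job_name, server and the queue name "done_queue0" are placeholders for what my code actually uses.

```python
import tensorflow as tf

num_workers = 2

# A shared queue hosted on the parameter server; every process that uses the
# same shared_name sees the same queue.
with tf.device("/job:ps/task:0"):
    done_queue = tf.FIFOQueue(num_workers, tf.int32, shared_name="done_queue0")

# Each worker runs this op once after it has finished training.
signal_done = done_queue.enqueue(1)

# The parameter server, instead of blocking forever in server.join(),
# waits until every worker has signalled and then exits.
if job_name == "ps":
    with tf.Session(server.target) as sess:
        for i in range(num_workers):
            sess.run(done_queue.dequeue())
            print("PS received done signal %d of %d" % (i + 1, num_workers))
```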