
Is my understanding correct that model_deploy lets the user train a model using multiple devices on a single machine? The basic premise seems to be that the clones share variables, and that variables are distributed across the parameter servers in round-robin fashion (see the sketch below).
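To check my understanding, here is roughly how I would expect to set it up for two GPUs on one machine. This is only a sketch based on my reading of deployment/model_deploy.py; `my_model_fn` is a placeholder, not a real model:

```python
import tensorflow as tf
from deployment import model_deploy  # from tensorflow/models, research/slim

slim = tf.contrib.slim

# Two clones (one per GPU); with num_ps_tasks=1, variables are placed on a
# parameter-server device (round-robin when there is more than one ps task).
config = model_deploy.DeploymentConfig(num_clones=2, num_ps_tasks=1)

with tf.device(config.variables_device()):
    global_step = slim.create_global_step()

def my_model_fn(inputs):
    # Placeholder model function for illustration.
    return slim.fully_connected(inputs, 10)

with tf.device(config.inputs_device()):
    inputs = tf.placeholder(tf.float32, [None, 784])

# Each clone builds the model on its own clone device, sharing variables.
clones = model_deploy.create_clones(config, my_model_fn, args=[inputs])
```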

The distributed TensorFlow framework, on the other hand, enables the user to train a model across a cluster, i.e., using multiple devices spread over multiple servers (see the second sketch below).
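Concretely, the distributed setup I have in mind is the standard cluster pattern from the distributed TF how-to; the host names below are made up:

```python
import tensorflow as tf

# Made-up addresses; in practice these come from the cluster configuration.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# replica_device_setter pins variables to the ps tasks (round-robin) and
# keeps the ops on the local worker.
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    global_step = tf.train.get_or_create_global_step()
    # ... build the model here ...
```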

I think the Slim documentation is very slim, and this point has been raised a couple of times already: Configuration/Flags for TF-Slim across multiple GPU/Machines

Thank you.

deniz
  • You can guess by the number of TODOs in that file that it's a work in progress. It builds on top of the distributed TF API. If you want to use something that's ready for prime time, you are better off using the core distributed TensorFlow framework – Yaroslav Bulatov Sep 14 '17 at 03:45
  • Yes, looks like it is a WIP. Thank you for your response. – deniz Sep 14 '17 at 13:32

0 Answers