
Is my understanding correct that model_deploy lets the user train a model using multiple devices on a single machine? The basic premise seems to be that the clones share variables, and that variables are distributed across the parameter servers in round-robin fashion (see the sketch below).
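To check my understanding, here is roughly how I would expect to set it up for two GPUs on one machine. This is only a sketch based on my reading of deployment/model_deploy.py; `my_model_fn` is a placeholder, not a real model:

```python
import tensorflow as tf
from deployment import model_deploy  # from tensorflow/models, research/slim

slim = tf.contrib.slim

# Two clones (one per GPU); with num_ps_tasks=1, variables are placed on a
# parameter-server device (round-robin when there is more than one ps task).
config = model_deploy.DeploymentConfig(num_clones=2, num_ps_tasks=1)

with tf.device(config.variables_device()):
    global_step = slim.create_global_step()

def my_model_fn(inputs):
    # Placeholder model function for illustration.
    return slim.fully_connected(inputs, 10)

with tf.device(config.inputs_device()):
    inputs = tf.placeholder(tf.float32, [None, 784])

# Each clone builds the model on its own clone device, sharing variables.
clones = model_deploy.create_clones(config, my_model_fn, args=[inputs])
```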

The distributed TensorFlow framework, on the other hand, enables the user to train a model across a cluster, i.e., using multiple devices spread over multiple servers (see the second sketch below).
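Concretely, the distributed setup I have in mind is the standard cluster pattern from the distributed TF how-to; the host names below are made up:

```python
import tensorflow as tf

# Made-up addresses; in practice these come from the cluster configuration.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# replica_device_setter pins variables to the ps tasks (round-robin) and
# keeps the ops on the local worker.
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    global_step = tf.train.get_or_create_global_step()
    # ... build the model here ...
```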

I think the Slim documentation is very slim, and this point has been raised a couple of times already: Configuration/Flags for TF-Slim across multiple GPU/Machines

Thank you.

deniz
  • You can guess by the number of TODOs in that file that it's a work in progress. It builds on top of the distributed TF API. If you want to use something that's ready for prime time, you are better off using the core distributed TensorFlow framework – Yaroslav Bulatov Sep 14 '17 at 03:45
  • Yes, looks like it is a WIP. Thank you for your response. – deniz Sep 14 '17 at 13:32

0 Answers