Is my understanding correct that model_deploy lets the user train a model using multiple devices on a single machine? The basic premise seems to be that the clone devices share variables, and the variables get distributed to the parameter servers in a round-robin fashion.
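To make concrete what I mean by round-robin placement, here is a minimal pure-Python sketch of my mental model. It does not call TensorFlow or model_deploy at all; the function name `assign_round_robin`, the variable names, and the device strings are my own illustration and may not match what the library actually does.

```python
from itertools import cycle


def assign_round_robin(variable_names, num_ps_tasks):
    """Map each variable to a parameter-server device, cycling through tasks in order.

    Hypothetical helper for illustration only; not part of TF-Slim.
    """
    if num_ps_tasks < 1:
        raise ValueError("num_ps_tasks must be at least 1")
    # Endless iterator over ps device strings: task 0, task 1, ..., then task 0 again.
    ps_devices = cycle(["/job:ps/task:%d" % i for i in range(num_ps_tasks)])
    return {name: next(ps_devices) for name in variable_names}


# Assumed example: three variables spread over two parameter servers.
placement = assign_round_robin(
    ["conv1/weights", "conv1/biases", "fc/weights"], num_ps_tasks=2
)
for name, device in placement.items():
    print(name, "->", device)
```

With two parameter servers, the third variable wraps around to the first server again, which is the behaviour I am assuming for the clones' shared variables.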
On the other hand, the distributed TensorFlow framework lets the user train a model on a cluster, that is, using multiple devices across multiple servers.
I think the Slim documentation is very slim, and this point has been raised a couple of times already: "Configuration/Flags for TF-Slim across multiple GPU/Machines"
Thank you.