Variables and ops are defined in tf.estimator.Estimator, which in turn uses replica_device_setter (defined here). As you can see, it assigns variables to ps jobs and ops to worker jobs, which is the common way to handle data parallelism.

replica_device_setter returns a device function that assigns ops and variables to devices. Even if you're using plain data parallelism, you might have several parameter servers, and the device function ensures that each parameter server gets its own subset of the variables (determined by the ps_strategy of replica_device_setter). E.g. /job:ps/task:0 could get W1 and b1, and /job:ps/task:1 could get W2 and b2. The device function has to be deterministic in assigning variables to parameter servers, since it is called every time a worker replica is instantiated, and all workers need to agree on which ps holds which variables.
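For a concrete picture, here is a minimal sketch of that device function in use. The cluster addresses and variable shapes are made up; with the default round-robin ps_strategy, consecutive variables alternate between the two ps tasks (a different ps_strategy could instead group W1/b1 and W2/b2 as in the example above):

```python
import tensorflow as tf  # TF 1.x graph-mode API

# Hypothetical cluster: 2 parameter servers, 3 workers (addresses are made up).
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222", "ps1.example.com:2222"],
    "worker": ["worker0.example.com:2222",
               "worker1.example.com:2222",
               "worker2.example.com:2222"],
})

# The device function returned by replica_device_setter pins variables to the
# ps job (round-robin over the ps tasks by default) and everything else to
# the given worker device.
with tf.device(tf.train.replica_device_setter(
        worker_device="/job:worker/task:2", cluster=cluster)):
    W1 = tf.get_variable("W1", shape=[784, 100])  # -> /job:ps/task:0
    b1 = tf.get_variable("b1", shape=[100])       # -> /job:ps/task:1
    W2 = tf.get_variable("W2", shape=[100, 10])   # -> /job:ps/task:0
    b2 = tf.get_variable("b2", shape=[10])        # -> /job:ps/task:1

    x = tf.placeholder(tf.float32, [None, 784])   # -> /job:worker/task:2
    hidden = tf.nn.relu(tf.matmul(x, W1) + b1)    # -> /job:worker/task:2
    logits = tf.matmul(hidden, W2) + b2           # -> /job:worker/task:2
```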
tf.(contrib.)learn libraries use between-graph replication. This means that each worker replica builds a separate graph, with the non-variable ops assigned to that worker: the worker with task index 2 places its ops on /job:worker/task:2, and its variables on /job:ps (which specific ps task is determined by ps_strategy). Each worker replica therefore computes the ops (loss value & gradients) itself, and sends the resulting variable updates (gradients) to the particular parameter servers that are responsible for holding the particular variables.
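At the process level, between-graph replication looks roughly like the sketch below: every worker process runs the same code with its own task index and builds its own copy of the graph. The job name, task index, and cluster addresses here are hypothetical stand-ins for what a launcher would pass in:

```python
import tensorflow as tf  # TF 1.x graph-mode API

# Hypothetical per-process settings; a real launcher passes these as flags.
job_name = "worker"   # "ps" or "worker"
task_index = 2        # this process's index within its job

cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222", "ps1.example.com:2222"],
    "worker": ["worker0.example.com:2222",
               "worker1.example.com:2222",
               "worker2.example.com:2222"],
})
server = tf.train.Server(cluster, job_name=job_name, task_index=task_index)

if job_name == "ps":
    server.join()  # a ps process only serves the variables assigned to it
else:
    # Between-graph replication: this process builds its own copy of the graph.
    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:%d" % task_index,
            cluster=cluster)):
        x = tf.placeholder(tf.float32, [None, 784])
        y = tf.placeholder(tf.int64, [None])
        W = tf.get_variable("W", [784, 10])  # stored on a ps task
        b = tf.get_variable("b", [10])       # stored on a ps task
        logits = tf.matmul(x, W) + b         # computed on /job:worker/task:2
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                            logits=logits))
        # minimize() computes gradients on this worker and applies them as
        # updates on the ps tasks that hold W and b.
        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
            loss, global_step=tf.train.get_or_create_global_step())
```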
If you didn't have a mechanism to assign variables/ops to devices, it would not be clear which replica should hold which variables and ops. Assigning to specific devices can also be needed when a worker replica has several GPUs: even though your variables are stored on parameter servers, you would need to build the compute-intensive part of the graph once for each of your GPUs, explicitly assigning the created ops to the relevant GPU.
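One way to do that (sketched below with a hypothetical build_tower_loss helper, a single-worker cluster, and two GPUs, all of which are assumptions) is to build one tower per GPU, passing a worker_device that names the GPU, so the compute-heavy ops land on that GPU while the variables are still shared on the parameter servers:

```python
import tensorflow as tf  # TF 1.x graph-mode API

# Hypothetical single-worker cluster with one ps task (addresses made up).
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222"],
})

def build_tower_loss(batch):
    # Hypothetical helper: the compute-heavy forward pass + loss of one tower.
    W = tf.get_variable("W", [784, 10])  # shared; resolves to /job:ps/task:0
    b = tf.get_variable("b", [10])
    logits = tf.matmul(batch, W) + b
    return tf.reduce_mean(tf.square(logits))  # dummy loss for the sketch

num_gpus = 2
batch = tf.placeholder(tf.float32, [128, 784])
batch_splits = tf.split(batch, num_gpus)  # one input shard per GPU

tower_losses = []
for gpu_id in range(num_gpus):
    # One device function per tower: variables still go to /job:ps, but the
    # compute ops created here are pinned to this worker's gpu_id.
    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:0/gpu:%d" % gpu_id,
            cluster=cluster)):
        with tf.variable_scope("model", reuse=(gpu_id > 0)):
            tower_losses.append(build_tower_loss(batch_splits[gpu_id]))

loss = tf.add_n(tower_losses) / num_gpus  # average the per-GPU tower losses
```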