In data-parallel training, my guess is that a GPU instance isn't cost-effective for parameter servers, because parameter servers only store and update the parameter values and don't run heavy computation such as matrix multiplication.
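To illustrate what I mean, here is a rough TensorFlow 1.x sketch of the usual between-graph replication setup; the host names and variable shape are just placeholders, and I'm assuming the 4 parameter server / 3 worker split from the config below:

import tensorflow as tf

# Hypothetical cluster matching the config below: 4 parameter servers, 3 workers.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0:2222", "ps1:2222", "ps2:2222", "ps3:2222"],
    "worker": ["worker0:2222", "worker1:2222", "worker2:2222"],
})

# replica_device_setter assigns variables to the ps tasks (round-robin)
# and everything else to the local worker, so the ps machines mostly hold
# and apply parameter updates rather than doing heavy math.
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    w = tf.get_variable("w", shape=[784, 10])    # stored on a ps task
    x = tf.placeholder(tf.float32, [None, 784])
    logits = tf.matmul(x, w)                     # computed on the worker (GPU)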
Therefore, I think the example config for Cloud ML Engine below, which uses CPU machines for the parameter servers and GPU machines for the master and workers, should give good cost performance:
trainingInput:
  scaleTier: CUSTOM
  masterType: standard_gpu
  workerType: standard_gpu
  parameterServerType: standard  # CPU-only machine type
  workerCount: 3
  parameterServerCount: 4
Is that right?
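For reference, I'd submit the job with something like this (the job name, bucket, package path, and region are placeholders):

gcloud ml-engine jobs submit training my_training_job \
  --config config.yaml \
  --module-name trainer.task \
  --package-path trainer/ \
  --job-dir gs://my-bucket/output \
  --region us-central1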