When I run a TensorFlow model (e.g. cifar10) on a single GPU on a multi-GPU machine, TensorFlow creates a device for, and allocates memory on, every GPU it can see. Since I set num_gpus to 1, training runs on only one GPU, yet the same process shows up on all the other GPUs as well. Is this intended? What is the rationale? A quick check with other DL frameworks such as Caffe shows a different design: they only touch the devices they actually use. Of course, I can pin the device at the code level (see the sketch at the end of this post), but I'm curious. This default also seems likely to annoy other users when the machine is shared.
tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name:
tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:1) -> (device: 1, name:
tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:2) -> (device: 2, name:
tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:3) -> (device: 3, name: ...
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     67056    C   python                                       15623MiB |
|    1     67056    C   python                                       15499MiB |
|    2     67056    C   python                                       15499MiB |
|    3     67056    C   python                                       15499MiB |
|    4     67056    C   python                                       15499MiB |
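For reference, this is the kind of code-level workaround I mean. It is a minimal sketch assuming TF 1.x (matching the gpu_device.cc log above); CUDA_VISIBLE_DEVICES, visible_device_list, and allow_growth are the standard knobs, but the GPU index "0" is just an example:

import os

# Option 1: hide the other GPUs from the CUDA runtime entirely.
# Must be set before TensorFlow initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf

# Option 2: tell TensorFlow to create a device only for GPU 0 and to
# allocate memory on demand instead of reserving almost all of it up front.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = "0"
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    with tf.device("/gpu:0"):
        x = tf.random_normal([1024, 1024])
        y = tf.matmul(x, x)
    print(sess.run(tf.reduce_sum(y)))

With CUDA_VISIBLE_DEVICES set, nvidia-smi shows the python process only on the chosen GPU, and allow_growth stops TensorFlow from grabbing ~15 GiB per device up front. My question is why this isn't the default behavior.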