I just built a system with two GTX 680 GPUs. To test the system I'm running cifar10_multi_gpu_train.py, training CIFAR-10 with TensorFlow.

TensorFlow creates two TensorFlow devices based on the GPUs (last two lines of the log):

$ python tutorials/image/cifar10/cifar10_multi_gpu_train.py 
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
>> Downloading cifar-10-binary.tar.gz 100.0%
Successfully downloaded cifar-10-binary.tar.gz 170052171 bytes.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX 680
major: 3 minor: 0 memoryClockRate (GHz) 1.15
pciBusID 0000:01:00.0
Total memory: 3.94GiB
Free memory: 3.15GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x28eb270
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties: 
name: GeForce GTX 680
major: 3 minor: 0 memoryClockRate (GHz) 1.15
pciBusID 0000:03:00.0
Total memory: 3.94GiB
Free memory: 3.90GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1:   Y Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 680, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 680, pci bus id: 0000:03:00.0)
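
(Side note: to double-check where ops actually land, TensorFlow 1.x can log the device each op is placed on; a minimal sketch, using ConfigProto's log_device_placement option:)

import tensorflow as tf

# Log the device every op is placed on; allow_soft_placement falls back to
# another device if an op has no kernel for the requested one.
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)

with tf.device('/gpu:1'):
    a = tf.constant([1.0, 2.0], name='a')
    b = tf.constant([3.0, 4.0], name='b')
    c = tf.add(a, b, name='c')

with tf.Session(config=config) as sess:
    print(sess.run(c))  # the placement log should show c on /gpu:1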

However, when monitoring the GPUs during training (using watch -n 1 nvidia-smi), I noticed that the second GPU isn't getting hot at all (71 °C for GPU 0 vs 30 °C for GPU 1):

Every 1,0s: nvidia-smi                                  Mon Apr 24 01:30:40 2017

Mon Apr 24 01:30:40 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51                 Driver Version: 375.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 680     Off  | 0000:01:00.0     N/A |                  N/A |
| 43%   71C    P0    N/A /  N/A |   3947MiB /  4036MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 680     Off  | 0000:03:00.0     N/A |                  N/A |
| 30%   30C    P8    N/A /  N/A |   3737MiB /  4036MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
|    1                  Not Supported                                         |
+-----------------------------------------------------------------------------+

Also note that the memory of both GPUs is almost completely allocated.
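
(This in itself doesn't prove the second GPU is doing any work: by default TensorFlow 1.x reserves nearly all memory on every visible GPU, whether or not it runs kernels there. A minimal sketch to change that, assuming the usual gpu_options settings:)

import tensorflow as tf

config = tf.ConfigProto()
# Allocate GPU memory on demand instead of reserving almost all of it up front.
config.gpu_options.allow_growth = True
# Alternatively, cap the fraction of each GPU's memory the process may use:
# config.gpu_options.per_process_gpu_memory_fraction = 0.5
sess = tf.Session(config=config)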

Why is my second GPU not used?

Visionscaper
  • I don't think the GPUs are being used at all, the memory might be too small to allocate anything on it. Add verbose logging to see if anything gets assigned to them. – fabrizioM Apr 23 '17 at 23:43
  • @fabrizioM I'm almost sure the first GPU is used, the number of examples per second is not bad (~1250/sec), further the temperature of the first GPU goes down when I stop the Python script. Nothing happens with the second GPU. Will try adjusting the verbosity level. – Visionscaper Apr 23 '17 at 23:53
  • @fabrizioM Setting tf.logging.set_verbosity(DEBUG) did not give me more output. Does this tell anything? 'creating context when one is currently active; existing: 0x28eb270' – Visionscaper Apr 24 '17 at 00:19
  • @fabrizioM I just figured out my issue, thanks for responding though! – Visionscaper Apr 24 '17 at 00:47

1 Answer

OK, I should have taken more time reading the script:

tf.app.flags.DEFINE_integer('num_gpus', 1,
                            """How many GPUs to use.""")

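Since num_gpus is defined with tf.app.flags, it can also be overridden on the command line instead of editing the default in the script:

$ python tutorials/image/cifar10/cifar10_multi_gpu_train.py --num_gpus=2
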
I just set this to two, and everything works just fine:

Every 1,0s: nvidia-smi                                  Mon Apr 24 02:44:30 2017

Mon Apr 24 02:44:30 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51                 Driver Version: 375.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 680     Off  | 0000:01:00.0     N/A |                  N/A |
| 37%   63C    P0    N/A /  N/A |   3807MiB /  4036MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 680     Off  | 0000:03:00.0     N/A |                  N/A |
| 36%   61C    P0    N/A /  N/A |   3807MiB /  4036MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
|    1                  Not Supported                                         |
+-----------------------------------------------------------------------------+

I would have expected the script to automatically use all available GPUs.
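
If you want that behaviour, one option is to count the visible GPUs at startup and feed the result into the flag; a small sketch, assuming TF 1.x's device_lib helper:

from tensorflow.python.client import device_lib

def get_num_gpus():
    # Count the local devices TensorFlow can see whose type is 'GPU'.
    return len([d for d in device_lib.list_local_devices()
                if d.device_type == 'GPU'])

print(get_num_gpus())  # 2 on this machine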

I'm now getting around 2450 examples/sec (0.051 sec/batch) with cifar10_multi_gpu_train.py, roughly double the ~1250 examples/sec I saw with one GPU.

Visionscaper