
I am new to running YOLO object detection on a GPU.

The configuration of the lab server is:

Ubuntu 18.04, TensorFlow 2.2.0, CUDA 10.1, and 4 Tesla GPUs.

When I run tf.test.is_gpu_available() it returns True, and the output is:

pciBusID: 0000:04:00.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-04-23 08:56:49.116943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties: 
pciBusID: 0000:08:00.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-04-23 08:56:49.119336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 2 with properties: 
pciBusID: 0000:86:00.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-04-23 08:56:49.121766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 3 with properties: 
pciBusID: 0000:8a:00.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-04-23 08:56:49.121840: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-23 08:56:49.121877: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-23 08:56:49.121910: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-04-23 08:56:49.121941: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-04-23 08:56:49.121971: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-04-23 08:56:49.122001: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-04-23 08:56:49.122032: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-23 08:56:49.134068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1, 2, 3
2020-04-23 08:56:49.134119: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-23 08:56:49.140144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-23 08:56:49.140169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 1 2 3 
2020-04-23 08:56:49.140180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N Y N N 
2020-04-23 08:56:49.140187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 1:   Y N N N 
2020-04-23 08:56:49.140194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 2:   N N N Y 
2020-04-23 08:56:49.140203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 3:   N N Y N 
2020-04-23 08:56:49.146042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:0 with 141 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0)
2020-04-23 08:56:49.148097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:1 with 14758 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0)
2020-04-23 08:56:49.150111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:2 with 14758 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:86:00.0, compute capability: 6.0)
2020-04-23 08:56:49.152165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:3 with 14758 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:8a:00.0, compute capability: 6.0)
True

I think the machine has recognized all 4 GPUs. However, when I ran my program I encountered a strange problem: only one GPU is actually used and the others do nothing. This is the output of nvidia-smi while the program is running:

| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:04:00.0 Off |                    0 |
| N/A   36C    P0    31W / 250W |  15323MiB / 16280MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:08:00.0 Off |                    0 |
| N/A   41C    P0    31W / 250W |    265MiB / 16280MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   40C    P0    32W / 250W |    265MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   38C    P0    30W / 250W |    265MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     10540      C   python                                     15313MiB |
|    1     10540      C   python                                       255MiB |
|    2     10540      C   python                                       255MiB |
|    3     10540      C   python                                       255MiB |
+-----------------------------------------------------------------------------+

Someone on a website said that the TensorFlow version does not match the CUDA version, but I cannot change the CUDA version because the server does not belong to me; I can only change the TensorFlow version. Could somebody give me some suggestions for making it use all 4 GPUs? Thanks.

talonmies
  • 1
    Have you programmed it to use multiple GPUs? tensorflow or keras does not automatically make use of multiple GPUs unless you program it that way. – Bashir Kazimi Apr 23 '20 at 06:45
  • 1
    Please check the official guide on multi-GPU training: https://www.tensorflow.org/guide/distributed_training – Richard_wth Apr 23 '20 at 13:00
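
Following up on the two comments above: the nvidia-smi output looks consistent with TensorFlow opening a context on every GPU while placing the actual work on GPU 0 only, which is what happens by default when the script does not ask for multi-GPU execution. Below is a minimal sketch of the approach described in the linked guide, using tf.distribute.MirroredStrategy with Keras. The tiny Dense model and the random data are placeholders I made up for illustration, not the YOLO code from the question; a custom training loop would need the strategy's own run/reduce calls instead of model.fit.

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy keeps one copy of the model on each visible GPU and
# averages gradients across them (it uses a single replica if only one
# device is available).
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# The model and optimizer variables must be created inside the scope.
with strategy.scope():
    # Placeholder model (assumption): stands in for the real detector.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Placeholder random data (assumption). The batch size passed to fit() is
# the global batch, which is split evenly across the replicas.
global_batch = 32 * strategy.num_replicas_in_sync
x = np.random.rand(256, 8).astype("float32")
y = np.random.rand(256, 1).astype("float32")
history = model.fit(x, y, batch_size=global_batch, epochs=1, verbose=0)
print("Loss after one epoch:", history.history["loss"][0])
```

If this prints 4 replicas on the server and nvidia-smi then shows load on all four cards, the TensorFlow/CUDA versions are probably not the issue, and the original script simply needs to build its model under strategy.scope() in the same way.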
