I am new to running YOLO object detection on a GPU.
The configuration of the lab server is:
Ubuntu 18.04, TensorFlow 2.2.0, CUDA 10.1, and 4 Tesla GPUs.
When I run tf.test.is_gpu_available() it returns True, and the output is:
pciBusID: 0000:04:00.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-04-23 08:56:49.116943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:08:00.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-04-23 08:56:49.119336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 2 with properties:
pciBusID: 0000:86:00.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-04-23 08:56:49.121766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 3 with properties:
pciBusID: 0000:8a:00.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-04-23 08:56:49.121840: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-23 08:56:49.121877: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-23 08:56:49.121910: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-04-23 08:56:49.121941: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-04-23 08:56:49.121971: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-04-23 08:56:49.122001: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-04-23 08:56:49.122032: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-23 08:56:49.134068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1, 2, 3
2020-04-23 08:56:49.134119: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-23 08:56:49.140144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-23 08:56:49.140169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0 1 2 3
2020-04-23 08:56:49.140180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N Y N N
2020-04-23 08:56:49.140187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 1: Y N N N
2020-04-23 08:56:49.140194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 2: N N N Y
2020-04-23 08:56:49.140203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 3: N N Y N
2020-04-23 08:56:49.146042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:0 with 141 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0, compute capability: 6.0)
2020-04-23 08:56:49.148097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:1 with 14758 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0)
2020-04-23 08:56:49.150111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:2 with 14758 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:86:00.0, compute capability: 6.0)
2020-04-23 08:56:49.152165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:3 with 14758 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:8a:00.0, compute capability: 6.0)
True
I think the machine has recognized all 4 GPUs. However, when I ran my program I hit a strange problem: only one GPU appears to be used (GPU 0 holds about 15 GiB of memory), while the other three stay almost idle. This is the output of nvidia-smi while the program is running:
| NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:04:00.0 Off | 0 |
| N/A 36C P0 31W / 250W | 15323MiB / 16280MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... Off | 00000000:08:00.0 Off | 0 |
| N/A 41C P0 31W / 250W | 265MiB / 16280MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P100-PCIE... Off | 00000000:86:00.0 Off | 0 |
| N/A 40C P0 32W / 250W | 265MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P100-PCIE... Off | 00000000:8A:00.0 Off | 0 |
| N/A 38C P0 30W / 250W | 265MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 10540 C python 15313MiB |
| 1 10540 C python 255MiB |
| 2 10540 C python 255MiB |
| 3 10540 C python 255MiB |
+-----------------------------------------------------------------------------+
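The imbalance in the table above can also be checked from a script. The following is a minimal sketch, assuming nvidia-smi is on the PATH; the helper names (parse_gpu_csv, idle_gpus, query_gpus) and the 1024 MiB threshold are illustrative choices, not part of any tool. SAMPLE reproduces the memory and utilization figures from the table.

```python
import subprocess

# Standard nvidia-smi query flags: one CSV row per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,utilization.gpu",
    "--format=csv,noheader,nounits",
]

# Figures copied from the nvidia-smi table above (index, MiB used, GPU-Util %).
SAMPLE = """0, 15323, 2
1, 265, 2
2, 265, 0
3, 265, 0"""


def parse_gpu_csv(text):
    """Turn 'index, memory_used, utilization' rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        index, mem, util = (field.strip() for field in line.split(","))
        rows.append({"gpu": int(index), "mem_mib": int(mem), "util_pct": int(util)})
    return rows


def idle_gpus(rows, mem_threshold_mib=1024):
    """Return indices of GPUs using less memory than the (assumed) threshold."""
    return [row["gpu"] for row in rows if row["mem_mib"] < mem_threshold_mib]


def query_gpus():
    """Run nvidia-smi and parse its output; needs an NVIDIA driver installed."""
    output = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_gpu_csv(output)
```

Calling idle_gpus(parse_gpu_csv(SAMPLE)) returns [1, 2, 3], which matches the observation that only GPU 0 holds a significant amount of memory. On the server itself, idle_gpus(query_gpus()) would give the live picture.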
Someone online said that the TensorFlow version does not match the CUDA version, but I cannot change the CUDA version because the server does not belong to me; I can only change the TensorFlow version. Could somebody give me some suggestions for making the program use all 4 GPUs? Thanks.
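For context, the log above shows every CUDA library opening successfully, so a version mismatch does not look like the cause. TensorFlow does not spread a model over several GPUs by itself: unless the program places operations explicitly or uses a distribution strategy, computation lands on GPU 0 and the other devices only hold a small context, which would be consistent with the nvidia-smi output. One commonly used option is tf.distribute.MirroredStrategy. The following is a minimal sketch, assuming a Keras model trained with model.fit; the tiny Dense model and the random data are placeholders standing in for the actual YOLO model and dataset, and global_batch_size and train_toy_model are hypothetical helper names.

```python
def global_batch_size(per_replica_batch, num_replicas):
    """Batch size for the dataset so each GPU sees per_replica_batch samples."""
    return per_replica_batch * num_replicas


def train_toy_model(per_replica_batch=8):
    # Imported lazily so the helper above works without TensorFlow installed.
    import tensorflow as tf

    # By default the strategy uses every GPU visible to TensorFlow.
    strategy = tf.distribute.MirroredStrategy()
    batch = global_batch_size(per_replica_batch, strategy.num_replicas_in_sync)

    # Variables must be created inside the scope so they are mirrored per GPU.
    with strategy.scope():
        model = tf.keras.Sequential(
            [tf.keras.layers.Dense(1, input_shape=(4,))]  # placeholder model
        )
        model.compile(optimizer="sgd", loss="mse")

    # Placeholder data; a real run would feed the detection dataset instead.
    x = tf.random.normal((256, 4))
    y = tf.random.normal((256, 1))
    dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch)

    model.fit(dataset, epochs=1)
    return strategy.num_replicas_in_sync
```

On a machine where TensorFlow sees four GPUs, strategy.num_replicas_in_sync is expected to be 4, so a per-GPU batch of 8 gives a global batch of 32. Whether this applies directly depends on how the YOLO code builds and trains its model; a custom training loop would need the strategy's own run and reduce calls rather than model.fit.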