
I am running inference with tiny-yolo-v3 on Google Colab using the GPU runtime. The GPU is a Tesla P100-PCIE-16GB.

After running the darknet inference command, the predicted time shown was 0.91 seconds.

From the code, I can see that this timestamp measures only the network's processing time on the GPU; it excludes pre- and post-processing of the image. The cells below show my setup and the results.

Now, I am a little confused about this. I know these GPUs are very costly and are supposed to give good performance, but an inference time of 0.91 seconds works out to only about 1.1 frames/second, which is not much.
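A quick sanity check of that arithmetic in a Colab cell (the 0.917487 s figure is the "Predicted in ..." time from the darknet output further down):

predicted_time_s = 0.917487        # network time reported by darknet
fps = 1.0 / predicted_time_s       # throughput is the reciprocal of the per-image time
print(round(fps, 2))               # ~1.09 frames/second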

Can anyone tell me whether I am doing something wrong here, or is this the actual performance of these GPUs?

I know inference time depends on a lot of parameters, such as network size, but how many frames per second can a GPU process with a network like tiny-yolo-v3?

from tensorflow.python.client import device_lib
device_lib.list_local_devices()   # list every device TensorFlow can see

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 4007284112891679343, name: "/device:XLA_CPU:0"
 device_type: "XLA_CPU"
 memory_limit: 17179869184
 locality {
 }
 incarnation: 16862634677760767602
 physical_device_desc: "device: XLA_CPU device", name: "/device:XLA_GPU:0"
 device_type: "XLA_GPU"
 memory_limit: 17179869184
 locality {
 }
 incarnation: 10729193134179919719
 physical_device_desc: "device: XLA_GPU device", name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 15701463552
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 8937778522862983933
 physical_device_desc: "device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0"]

import tensorflow as tf
tf.test.gpu_device_name()   # returns the name of the first GPU device, if any

'/device:GPU:0'

!./darknet detector test cfg/coco.data cfg/yolov3-tiny.cfg /yolov3-tiny.weights data/dog.jpg

layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16  0.150 BFLOPs
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32  0.399 BFLOPs
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64  0.399 BFLOPs
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128  0.399 BFLOPs
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256  0.399 BFLOPs
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
   13 conv    256  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 256  0.089 BFLOPs
   14 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   15 conv    255  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x 255  0.044 BFLOPs
   16 yolo
   17 route  13
   18 conv    128  1 x 1 / 1    13 x  13 x 256   ->    13 x  13 x 128  0.011 BFLOPs
   19 upsample            2x    13 x  13 x 128   ->    26 x  26 x 128
   20 route  19 8
   21 conv    256  3 x 3 / 1    26 x  26 x 384   ->    26 x  26 x 256  1.196 BFLOPs
   22 conv    255  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x 255  0.088 BFLOPs
   23 yolo
Loading weights from /content/gdrive/My Drive/Darknet/yolov3-tiny.weights...Done!
data/dog.jpg: Predicted in 0.917487 seconds.
dog: 57%
car: 52%
truck: 56%
car: 62%
bicycle: 59%

1 Answer


You have to build darknet with GPU support enabled in order to run inference on the GPU; the time you are seeing now is the result of inference running on the CPU rather than the GPU.

I ran into the same problem. On my own laptop I got an inference time of 1.2 seconds; after I enabled CUDA and rebuilt the project with GPU support, I got an inference time of approximately 0.2 seconds on an Nvidia GeForce GTX 960.

To build darknet with GPU support, open the Makefile, change the line GPU=0 to GPU=1, and then run make again. I would expect an inference time of around 0.05 seconds if you run it on Colab.
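For example, on Colab the rebuild can be done in a single cell, something like the sketch below (it assumes your working directory is the darknet source tree containing the Makefile):

!sed -i 's/^GPU=0/GPU=1/' Makefile       # enable CUDA in the build
!sed -i 's/^CUDNN=0/CUDNN=1/' Makefile   # optional: also enable cuDNN
!make clean
!make                                    # rebuild darknet with GPU support
!nvidia-smi                              # confirm the runtime can see the GPU

After the rebuild, rerun the same detector test command; the "Predicted in" time should drop sharply once the network actually runs on the P100.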