I have an NVIDIA Pascal GPU running Caffe on Windows 10. When I profile with Nsight in Visual Studio, forward propagation in test mode shows only 4.3% GPU utilization, with less than 1% use for the 16 kernel calls.

I'm working on a real-time system, so I need forward propagation to run as quickly as possible.

If I increase the kernel size, I will have to rerun my training, which is very time-consuming.

What other tweaks can I make to Caffe or CUDA to increase the speed of the test?

talonmies
empty

0 Answers