
I am looking for ways to record utilization at the GPU level. I have two definitions of utilization in mind, and ideally I would like to be able to compute both:

  1. The number of CUDA cores in use on the GPU at a given instant.
  2. Peak efficiency, i.e. the achieved number of FLOPs per second relative to the peak.
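To make the second definition concrete, here is a minimal sketch of how achieved FLOP/s could be compared against a theoretical peak. It assumes each CUDA core retires one fused multiply-add (2 FLOPs) per clock cycle. The function names and the example device figures (2560 cores at 1.5 GHz) are hypothetical, chosen only for illustration.

```python
def peak_flops(cuda_cores: int, clock_hz: float,
               flops_per_core_per_cycle: int = 2) -> float:
    """Theoretical peak FLOP/s.

    Assumption: every core retires one FMA (counted as 2 FLOPs) per cycle.
    """
    return cuda_cores * clock_hz * flops_per_core_per_cycle


def flops_efficiency(flops_executed: float, elapsed_s: float,
                     peak: float) -> float:
    """Fraction of the theoretical peak achieved over a time window."""
    if elapsed_s <= 0 or peak <= 0:
        raise ValueError("elapsed_s and peak must be positive")
    return (flops_executed / elapsed_s) / peak


# Hypothetical device: 2560 CUDA cores running at 1.5 GHz.
peak = peak_flops(cuda_cores=2560, clock_hz=1.5e9)

# Hypothetical workload: 1.92e12 FLOPs executed during a 1-second window.
efficiency = flops_efficiency(flops_executed=1.92e12, elapsed_s=1.0, peak=peak)
```

The hard part in practice is obtaining `flops_executed` for the whole device rather than for a single profiled program, which is exactly what I am asking about.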

I know there are some tools, but none of them provides either piece of information. For instance:

  • The utilization reported by nvidia-smi is the percentage of time during which a kernel was executing, regardless of how many cores it used or how fast it ran. The same applies to tools such as nvtop and gpustat.

  • Profilers such as the TensorFlow Profiler and nvprof show efficiency in terms of FLOPs, but only at the kernel/program level, without accounting for the effect of running multiple programs in parallel.

I am open to both tools and code-based solutions.

talonmies
Walid Hanafy

1 Answer


I may be wrong, but I believe nvprof can show you these details. You will have to run it once for the timeline and once for the metrics:

nvprof --export-profile timeline.prof <your_bin>
nvprof --metrics all --export-profile metrics.prof <your_bin> 

You can then import the files (in this example timeline.prof and metrics.prof) into the NVIDIA Visual Profiler, which can be opened with nvvp.

Marwan N
    nvprof only shows the peak efficiency, and it doesn't monitor the GPU state itself; it profiles the code that was written. I need to monitor these variables at the GPU level, regardless of how many apps/threads are running. – Walid Hanafy Jul 20 '20 at 09:45