I am looking for methods to record utilization at the GPU level. I have two definitions of utilization, and ideally I would like to be able to compute both:
- The number of CUDA cores in use on the GPU at a given instant.
- Peak efficiency: the number of FLOPs per second achieved, relative to the GPU's peak.
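To make the two definitions concrete, here is a minimal sketch of the quantities I have in mind. The function names and all example numbers are hypothetical; obtaining the real inputs (the active core count and the FLOP count) is exactly the part I do not know how to do.

```python
def core_utilization(active_cores: int, total_cores: int) -> float:
    """Definition 1: fraction of CUDA cores busy at one instant."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    return active_cores / total_cores


def flops_efficiency(flop_count: float, elapsed_s: float, peak_flops: float) -> float:
    """Definition 2: achieved FLOP/s divided by the device's peak FLOP/s."""
    if elapsed_s <= 0 or peak_flops <= 0:
        raise ValueError("elapsed_s and peak_flops must be positive")
    achieved_flops = flop_count / elapsed_s
    return achieved_flops / peak_flops


if __name__ == "__main__":
    # Hypothetical device: 2560 cores, 8 TFLOP/s peak (made-up numbers).
    print(core_utilization(active_cores=640, total_cores=2560))
    print(flops_efficiency(flop_count=2e12, elapsed_s=0.5, peak_flops=8e12))
```

Both values are ratios between 0 and 1, so they could be logged over time next to the percentage that existing tools report.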
I know there are some tools, but none of them provides either piece of information. For instance:
The utilization reported by nvidia-smi shows the percentage of time during which a kernel was executing, regardless of how many cores it uses or how fast it runs; the same holds for tools such as nvtop and gpustat. Profilers such as TensorFlow Profiler and nvprof show the efficiency in terms of FLOPs, but only at the kernel/program level, and without accounting for the effect of running multiple kernels/programs in parallel.
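The gap between a time-based percentage and a core-based one can be illustrated with a toy model. This is only a sketch under my own assumptions: the `(start_s, end_s, active_cores)` tuples are hypothetical data, not something any of the tools above exports, and concurrent kernels are assumed not to share cores.

```python
from typing import List, Tuple

# (start_s, end_s, active_cores) for each kernel; hypothetical trace format.
Interval = Tuple[float, float, int]


def time_based_utilization(intervals: List[Interval], window_s: float) -> float:
    """Share of the window in which at least one kernel was running."""
    busy = 0.0
    covered_until = 0.0
    for start, end, _cores in sorted(intervals):
        start = max(start, covered_until)  # skip time already counted
        end = min(end, window_s)           # clip to the sampling window
        if end > start:
            busy += end - start
            covered_until = end
    return busy / window_s


def core_weighted_utilization(
    intervals: List[Interval], window_s: float, total_cores: int
) -> float:
    """Core-seconds actually used divided by core-seconds available."""
    used = 0.0
    for start, end, cores in intervals:
        duration = min(end, window_s) - max(start, 0.0)
        if duration > 0:
            used += duration * cores
    return used / (window_s * total_cores)


if __name__ == "__main__":
    # One small kernel occupying 32 of 2560 cores for the whole 1 s window.
    trace = [(0.0, 1.0, 32)]
    print(time_based_utilization(trace, window_s=1.0))
    print(core_weighted_utilization(trace, window_s=1.0, total_cores=2560))
```

In this toy trace the time-based figure is 100% while only 1.25% of the core-seconds are used, which is the discrepancy I would like to measure on a real GPU.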
I am open to both tools and code-based solutions.