
Is there any way to find out the number of free/active SMs? Or at least to read the voltage, power, or temperature of each SM, so that I can tell whether it is working or not (in real time, while a job is executing on the GPU)?

The %smid register helped me find the ID of each SM. Something similar would be helpful.
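As a point of reference, here is a minimal host-side sketch of turning %smid readings into an SM count. It assumes a kernel has stored the %smid value read by each thread block (for instance via inline PTX `mov.u32 %0, %%smid;`) into an array that was copied back to the host. This only reveals which SMs ran blocks of your own kernel, not work from other processes. `MAX_SM_ID` and `count_distinct_sms` are hypothetical names. The bound is generous because the PTX documentation notes that SM IDs are not guaranteed to be contiguous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on SM IDs. %smid numbering is not guaranteed
   to be contiguous, so leave headroom above the device's SM count. */
#define MAX_SM_ID 256

/* Counts how many distinct SM IDs appear in smids[0..count-1], where each
   entry is assumed to be the %smid value recorded by one thread block. */
size_t count_distinct_sms(const unsigned int *smids, size_t count)
{
    bool seen[MAX_SM_ID] = { false };
    size_t distinct = 0;

    for (size_t i = 0; i < count; ++i) {
        assert(smids[i] < MAX_SM_ID); /* recorded ID must fit the table */
        if (!seen[smids[i]]) {
            seen[smids[i]] = true;
            ++distinct;
        }
    }
    return distinct;
}
```

Keep in mind that a thread's %smid can change during execution (for example after preemption), so treat the count as a snapshot.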

Thanks and Regards, Rakesh

Rakesh Kumar

1 Answer


The CUDA Profiling Tools Interface (CUPTI) contains an Events API that enables run-time sampling of the GPU's performance monitor (PM) counters. The CUPTI SDK ships as part of the CUDA Toolkit. Documentation on sampling can be found in the CUPTI documentation, in the section CUPTI Events API \ Sampling Events.

One or more of the following counters should give you a good idea of SM activity:

  • active_cycles: Number of cycles in which a multiprocessor has at least one active warp.
  • active_warps: Accumulated number of active warps per cycle. Every cycle it increments by the number of warps active in that cycle, which ranges from 0 to 48 or 0 to 64 depending on the GPU architecture.
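Once you have samples, the two counters can be combined. The sketch below assumes two successive reads, each holding one value per SM. CUPTI can report one value per domain instance when `CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES` is set on the event group, but treating each instance as exactly one SM is an assumption here. An SM counts as active in the window if its active_cycles value advanced. The ratio of the active_warps delta to the active_cycles delta gives the average number of active warps per active cycle. The function names and the array layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of SMs whose active_cycles counter advanced between two samples.
   prev[i] and curr[i] are assumed to be readings for SM i. */
size_t count_active_sms(const uint64_t *prev, const uint64_t *curr,
                        size_t num_sms)
{
    assert(prev != NULL && curr != NULL);
    size_t active = 0;
    for (size_t i = 0; i < num_sms; ++i) {
        if (curr[i] > prev[i]) {
            ++active; /* at least one cycle with an active warp */
        }
    }
    return active;
}

/* Average number of active warps during cycles in which the SM had at
   least one active warp: delta(active_warps) / delta(active_cycles). */
double avg_warps_per_active_cycle(uint64_t d_active_warps,
                                  uint64_t d_active_cycles)
{
    if (d_active_cycles == 0) {
        return 0.0; /* SM was idle for the whole window */
    }
    return (double)d_active_warps / (double)d_active_cycles;
}
```

Dividing the average by the architecture's maximum resident warps per SM (48 or 64) gives an occupancy-style fraction for the window.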
Greg Smith
  • Thank you, that is very helpful, but I need a few more clarifications. Suppose a CUDA application is running and occupies 2 of the 14 SMs for, say, 50 seconds. (I can check this using nvprof, but nvprof reports the active_cycles or active_warps result only at the end.) If I develop a separate profiling application using the CUPTI APIs, can it run concurrently with the CUDA application and log the number of SMs the application is using at the 5th second, the 10th second, and so on? – Rakesh Kumar Feb 15 '13 at 04:30
  • Yes. Please read the documentation and review the sample in {CUDA Toolkit}\extras\CUPTI\sample\event_sampling. The sample queries the counters on a background thread at 200 Hz. You should probably sample often enough that the interval between reads stays below 2^32 / gpu_core_clock_frequency / log2(max_event_increment) seconds, or some counters will overflow. – Greg Smith Feb 15 '13 at 18:15
  • I went through the event_sampling code. It uses two threads in the same program, one of which does the sampling. My situation is different: I want my profiling application, built on the CUPTI APIs, to simply query and display the results (SM status, active_cycles, or active_warps would be fine) at the instant I ask, just as nvidia-smi does. It should be independent of whatever application(s) are already running on the GPU. – Rakesh Kumar Feb 16 '13 at 03:11
  • [continued] This means the application already running on the GPU and my profiling application are two different processes and therefore run in different contexts. At the instant I query, my profiling application should run concurrently with the background user application for a few milliseconds and display the result right away. The user application should not halt while my profiling application runs, either. – Rakesh Kumar Feb 16 '13 at 03:17
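To see why the sampling rate in the overflow advice matters, here is a sketch of a simple worst-case bound. It assumes a counter held in 32 bits that grows by at most k per cycle at core clock f, so it can wrap after 2^32 / (f * k) seconds. The 1 GHz clock and k = 64 (active_warps on an architecture with up to 64 active warps per SM) are illustrative values, not measurements. The function names are hypothetical.

```c
#include <assert.h>

/* Seconds until a 32-bit counter can wrap, given the core clock in Hz and
   the largest amount the event can add in a single cycle. */
double seconds_to_overflow(double clock_hz, double max_increment_per_cycle)
{
    assert(clock_hz > 0.0 && max_increment_per_cycle > 0.0);
    return 4294967296.0 / (clock_hz * max_increment_per_cycle); /* 2^32 */
}

/* Minimum sampling rate (Hz) that reads the counter at least once
   before it can wrap. */
double min_sampling_rate_hz(double clock_hz, double max_increment_per_cycle)
{
    return 1.0 / seconds_to_overflow(clock_hz, max_increment_per_cycle);
}
```

With f = 1 GHz and k = 64, the counter can wrap after about 67 ms, so you would need to sample at roughly 15 Hz or faster. The sample's 200 Hz rate leaves a comfortable margin.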