I wrote some Java code that uses JCuda to execute some CUDA kernels. I would like to profile the application in order to understand how streams are overlapped and whatnot. I am able to use cuda event calls such as cudaEventElpasedTime to get the execution time of a kernel, but I do not know how to get the starting and ending timestamps for the same kernel.
I know nvprof can generate such results and display the timelines, but I do not find a way to run nvprof with a Java application.
Edit: Now I understand how to use nvprof to profile a Java application thanks to the answers. I still prefer getting the starting and ending times using cudaEvent calls so I would have more control. It seems nvprof can get that information but there is no APIs for an end user to do so?