I wrote some Java code that uses JCuda to execute some CUDA kernels. I would like to profile the application to understand how the streams overlap. I can use CUDA event calls such as `cudaEventElapsedTime` to get the execution time of a kernel, but I do not know how to get the starting and ending timestamps for that same kernel.

I know nvprof can generate such results and display the timelines, but I have not found a way to run nvprof with a Java application.

Edit: Thanks to the answers, I now understand how to use nvprof to profile a Java application. I would still prefer to get the starting and ending times using cudaEvent calls, since that would give me more control. It seems nvprof can get that information, but are there no APIs for an end user to do so?

Xiangyu
  • @Shadow I would still prefer getting the starting and ending times using cudaEvent calls, as it gives me more control over what gets profiled. – Xiangyu May 19 '17 at 00:38
  • You can also use the Visual Profiler. After it [did not work](https://forum.byte-welt.net/t/jcuda-and-nvvp-visual-profiler/3667) in [some other versions](https://devtalk.nvidia.com/default/topic/524531/profiler-error-message-when-profiling-jcuda-application/), it finally seems to work again with CUDA 8.0. – Marco13 May 23 '17 at 21:12
  • @Marco13, does this only work under Windows? I read that we need to make a .bat file for it to work. I have not tried a .sh script under Linux. – Xiangyu May 24 '17 at 16:04
  • I just tried it under Windows (8.1). I think it *should* also work under Linux with a `sh` file, but am not sure (I haven't used the visual profiler actively for a while, because it didn't work with JCuda, and never used it on Linux at all, but *conceptually*, I think that it *should* work...) – Marco13 May 24 '17 at 18:46

1 Answer

There are two ways to do this:

  1. If you can run your JCuda application via the command line, you can profile it using the command `nvprof --profile-child-processes <command to run your JCuda application>`

  2. If you cannot run your application via the command line, open a terminal and run nvprof using the command `nvprof --profile-all-processes`. nvprof will go into daemon mode and wait for CUDA activity. Now launch your application as usual from your IDE. Once CUDA activity has happened and the application exits, nvprof will print the results in its terminal session.
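
For the event-based route the question asks about, here is a minimal sketch of one possible approach, not a tested JCuda recipe. The CUDA event API only reports the elapsed time between two events, so the assumption is that you record one extra "base" event at the start of the run and then call `cudaEventElapsedTime(ms, base, start)` and `cudaEventElapsedTime(ms, base, stop)` for each kernel, which gives start and end offsets on a common time axis. The class `KernelTimeline`, its `Interval` type, and the hard-coded numbers are hypothetical placeholders; the JCuda calls appear only in comments so that the example runs without a GPU.

```java
public class KernelTimeline {

    /** Start and end offsets of one kernel, in ms, relative to a shared base event. */
    static final class Interval {
        final String name;
        final float startMs;
        final float endMs;

        Interval(String name, float startMs, float endMs) {
            this.name = name;
            this.startMs = startMs;
            this.endMs = endMs;
        }

        float durationMs() {
            return endMs - startMs;
        }
    }

    /** Length in ms of the time span during which both kernels were running, or 0 if none. */
    static float overlapMs(Interval a, Interval b) {
        float start = Math.max(a.startMs, b.startMs);
        float end = Math.min(a.endMs, b.endMs);
        return Math.max(0f, end - start);
    }

    public static void main(String[] args) {
        // In a real JCuda application these offsets would be measured rather than
        // hard-coded, for example (assumed usage, after all events have completed):
        //   float[] ms = new float[1];
        //   JCuda.cudaEventElapsedTime(ms, base, startA);  // start offset of kernel A
        //   JCuda.cudaEventElapsedTime(ms, base, stopA);   // end offset of kernel A
        Interval a = new Interval("kernelA", 1.0f, 4.0f);
        Interval b = new Interval("kernelB", 2.5f, 6.0f);

        for (Interval k : new Interval[] {a, b}) {
            System.out.println(k.name + ": start=" + k.startMs + " ms, end=" + k.endMs
                    + " ms, duration=" + k.durationMs() + " ms");
        }
        System.out.println("overlap=" + overlapMs(a, b) + " ms");
    }
}
```

The offsets are only comparable if every measurement uses the same base event, and each start/stop pair has to be recorded on the stream that runs the kernel it brackets.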

ApoorvaJ
  • This is a great answer. Since the JCuda application runs on multiple threads, I think --profile-child-processes might only return the execution time in one JVM. I will look into it and post my findings here later. – Xiangyu May 17 '17 at 15:35