I am using the cuFFT library. How do I modify my code so that the function calls from this library (or any other CUDA library) show up in the NVIDIA Visual Profiler (NVVP)? I am using Windows and Visual Studio 2013.

Below is my code. I convert my image and filter to the Fourier domain, perform point-wise complex matrix multiplication in a custom CUDA kernel I wrote, and then perform the inverse DFT on the filtered image's spectrum. The results are accurate, but I cannot figure out how to view the cuFFT functions in the profiler.

// Execute FFT Plans
cufftExecR2C(fftPlanFwd, (cufftReal *)d_in, (cufftComplex *)d_img_Spectrum);
cufftExecR2C(fftPlanFwd, (cufftReal *)d_filter, (cufftComplex *)d_filter_Spectrum);

// Perform complex pointwise multiplication on filter spectrum and image spectrum
pointWise_complex_matrix_mult_kernel<<<grid, block>>>(d_img_Spectrum, d_filter_Spectrum, d_filtered_Spectrum, ROWS, COLS);

// Execute FFT^-1 Plan                  
cufftExecC2R(fftPlanInv, (cufftComplex *)d_filtered_Spectrum, (cufftReal *)d_out);

user8919

1 Answer

At the entry point to the library, the library call is like any other call into a C or C++ library: it is executing on the host. Within that library call, there may be calls to CUDA kernels or other CUDA API functions, for a CUDA GPU-enabled library such as CUFFT.

The profilers (at least up through CUDA 7.0 - see note about CUDA 7.5 nvprof below) don't natively support the profiling of host code. They are primarily focused on kernel calls and CUDA API calls. A call into a library like CUFFT by itself is not considered a CUDA API call.

You haven't shown a complete profiler output, but you should see the CUFFT library make CUDA kernel calls; these will show up in the profiler output. The first two CUFFT calls prior to your pointWise_complex_matrix_mult_kernel should have one or more kernel calls each that show up to the left of that kernel, and the last CUFFT call should have one or more kernel calls that show up to the right of that kernel.

One possible way to get specific sections of host code to show up in the profiler is to use the NVTX (NVIDIA Tools Extension) library to annotate your source code, which will cause those annotations to show up in the profiler output. You might want to put an NVTX range event around the library call you wish to see identified in the profiler output.

Another approach would be to try out the new CPU profiling features in nvprof in CUDA 7.5. You can refer to section 3.4 of the Profiler guide that ships with CUDA 7.5 RC.

Finally, ordinary host profilers should be able to profile your CUDA application, including CUFFT library calls, but they won't have any visibility into what is happening on the GPU.

EDIT: Based on discussion in the comments below, your code appears to be similar to the simpleCUFFT sample code. When I compile and profile that code on Win7 x64, VS 2013 Community, and CUDA 7, I get the following output (zoomed in to depict the interesting part of the timeline):

[Image: nvvp profiler timeline for simpleCUFFT sample code]

You can see that there are CUFFT kernels being called both before and after the complex pointwise multiply and scale kernel that appears in that code. My suggestion would be to start by doing something similar with the simpleCUFFT sample code rather than your own code, and see if you can duplicate the output above. If so, the problem lies in your code (perhaps your CUFFT calls are failing, perhaps you need to add proper error checking, etc.).

Robert Crovella
  • For nvToolsExt you would want to use nvtxRangePush and nvtxRangePop for thread level ranges and nvtxRangeStart and nvtxRangeEnd for process level ranges. – Greg Smith Jul 14 '15 at 00:17
  • You say I haven't shown the complete profiler output since it doesn't show the kernel calls made by the cuFFT function. How do I show the complete profiler output to display those kernel calls? – user8919 Jul 16 '15 at 22:17
  • Zoom out on the timeline. – Robert Crovella Jul 17 '15 at 16:24
  • Here is the zoomed out timeline. I do not see the kernel calls for the cuFFT function. Am I doing something wrong? – user8919 Jul 17 '15 at 21:58
  • The image in the link is the profiler fully zoomed out. http://postimg.org/image/6bz1sc2nn/ I don't see the kernel launches associated with the cuFFT function call. Am I doing something wrong? – user8919 Jul 17 '15 at 22:03
  • I'm not really sure what you are doing wrong. You haven't provided complete code, so I suppose it's possible that you have some sort of error, that you're not doing any error checking, and that your cufft calls aren't running at all. That's just a guess. I really don't know what is wrong. The CUDA simpleCUFFT sample code appears to be similar to what you describe, so I profiled that code and added the profiler output to my answer above. You might want to see if you can duplicate that. – Robert Crovella Jul 18 '15 at 02:07
  • Where is the .exe file generated for the sample code? It is not generated in the "Output Directory": ../../bin/win64/$(Configuration)/ as listed in the properties. (?) I need that to profile the example. – user8919 Jul 18 '15 at 11:43
  • Take a look at the output window when VS is compiling that sample. It will display the name and path of the exe that it is generating. Or else use the windows file search function to list all of the simpleCUFFT.exe files on your machine. – Robert Crovella Jul 19 '15 at 13:05