Questions tagged [nvvp]

NVVP (NVIDIA Visual Profiler) is the name of NVIDIA's proprietary GUI-enabled GPU CUDA profiling tool.

The NVIDIA Visual Profiler is a cross-platform performance profiling tool that delivers developers vital feedback for optimizing CUDA C/C++ applications. First introduced in 2008, Visual Profiler supports all 350 million+ CUDA capable NVIDIA GPUs shipped since 2006 on Linux, Mac OS X, and Windows. The NVIDIA Visual Profiler is available as part of the CUDA Toolkit. (source, official website)

The PGI Profiler (PGPROF) is strongly based on NVVP.

NVIDIA Visual Profiler offers both a GUI and command-line options (pgprof or nvprof). Some basic information can be found here: https://www.pgroup.com/resources/pgprof-quickstart.htm

More detailed information:
http://docs.nvidia.com/cuda/profiler-users-guide/index.html
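
As a rough illustration of how the command-line and GUI sides fit together, the sketch below builds an nvprof invocation that records a profile to a file, which can then be imported into the Visual Profiler. The helper name `nvprof_command`, the binary `./my_app`, and the output name `timeline.nvvp` are assumptions made up for this example. It only constructs the argument list and does not run anything.

```python
def nvprof_command(app, output="timeline.nvvp", app_args=()):
    """Build an nvprof command line that exports a profile to `output`.

    `app` is the path to the CUDA executable to profile (hypothetical here),
    and `app_args` are forwarded to that executable unchanged.
    """
    # --export-profile writes the collected data to a file instead of
    # printing a summary; the file can later be imported into nvvp.
    return ["nvprof", "--export-profile", output, app, *app_args]


if __name__ == "__main__":
    # ./my_app is a placeholder for your own binary.
    print(" ".join(nvprof_command("./my_app")))
```

The returned list can be passed to `subprocess.run` on a machine where nvprof is installed. Keeping the command as a list rather than a single string avoids shell quoting problems when the application takes arguments.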

44 questions
0
votes
1 answer

How to interpret NVIDIA Visual Profiler analysis/recommendations?

I'm relatively new to CUDA and am currently working on a project to accelerate computer vision applications on embedded systems with attached GPUs (NVIDIA TX1). What I'm trying to do is select between two libraries: OpenCV and VisionWorks (includes…
user7952275
0
votes
0 answers

"The kernel was blocked for an uncommon reason" - what are these reasons?

The nvvp CUDA profiler frontend offers an analysis breaking down the causes for warps waiting for execution of their next instruction. We have categories such as "Execution latency", "Memory dependency", "Texture dependency", etc. - and one category…
einpoklum
  • 118,144
  • 57
  • 340
  • 684
0
votes
1 answer

Profiling OpenCL application on Windows with NVIDIA GPU

Can you help me? I'm developing an OpenCL application on Windows 7 x64. The hardware is an Intel Core i5 and an NVIDIA GTX 770. OpenCL uses the NVIDIA GPU for acceleration. If I try to use Intel VTune Amplifier XE 2015, my application hangs at the end of profiling and…
Mike
  • 43
  • 1
  • 5
0
votes
1 answer

Is there any difference in the output of nvvp (visual) and nvprof (command line)?

To measure metrics/events for CUDA programs, I have tried using the command line like: nvprof --metrics <> I also measured the same metrics on the Visual profiler nvvp. I noticed no difference in the values I get. I noticed a…
Kajal
  • 581
  • 11
  • 24
0
votes
2 answers

Can I profile an OpenACC kernel at the C source code level?

I'm trying to speed up my code with OpenACC using the PGI 15.7 compiler. I want to profile my code at the C source level. I'm using the 'nvvp' profiler from CUDA 7.0. When I run nvvp, I can use the 'analysis tab' and see which latency is the reason my code…
soongk
  • 259
  • 3
  • 17
0
votes
1 answer

Is it possible to automatically repeat several executions on NVVP?

I'm trying to extract some metrics from my application and need to execute it many times and take the mean of the metrics. I searched for a way to do this but didn't find anything, and found nothing here on Stack Overflow either. Thanks.
Blufter
  • 97
  • 1
  • 12
0
votes
1 answer

nvprof to open trace format or slog2

I want to generate a trace of my CUDA program and view it, so I run it using the following command: nvprof --print-gpu-trace ./my_exec. This prints the trace in text format, which is hard to interpret. It has been mentioned that I can save…
arbitUser1401
  • 575
  • 2
  • 8
  • 25
0
votes
2 answers

CUDA Visual profiler over a remote X session

I am running an Ubuntu 11.10 server, CUDA-5.0 with a GTX480 on it. I am trying to run the visual profiler remotely by using Xming and Cygwin/X on Windows 8. I can successfully run xclocks, but when I try to launch /usr/local/cuda-5.0/bin/nvvp from…
fall3nm0nk
  • 45
  • 1
  • 5
0
votes
1 answer

Is there any way to avoid this serialization behavior in CUDA profiling?

According to "CUDA streams not overlapping", "the profiler will serialize streaming to get accurate timing data". Now the question is, is there any way to avoid this serialization behavior in CUDA profiling (say nvvp)? I am using a Fermi M2090 and…
Hailiang Zhang
  • 17,604
  • 23
  • 71
  • 117
0
votes
1 answer

Do nvvp and Nsight's profiler give different results?

I want to try the gst_inst_128bit instruction. For the same program, nvvp reports many gst_inst_128bit instructions executed, while Nsight's profiler reports 4 times the gst_inst_32bit instructions. It is the same program in both cases. How could this situation…
worldterminator
  • 2,968
  • 6
  • 33
  • 52
0
votes
0 answers

How to profile (visually) CUDA code that is implemented in a Python package through a C extension?

Possible Duplicate: How to profile PyCuda code with the Visual Profiler? The CUDA visual profiler (nvvp) requires an executable entry for profiling, but my CUDA code is implemented in a Python package through a C extension. Is there any way to do…
Hailiang Zhang
  • 17,604
  • 23
  • 71
  • 117
-1
votes
1 answer

Why are operations in two CUDA streams not overlapping?

My program is a pipeline that contains multiple kernels and memcpys. Each task goes through the same pipeline with different input data. The host code first chooses a Channel, an encapsulation of scratchpad memory and CUDA objects, when it…
StrikeW
  • 501
  • 1
  • 4
  • 11
-1
votes
1 answer

CUDA's nvvp reports non-ideal memory access pattern, but bandwidth is almost peaking

EDIT: new minimal working example to illustrate the question and better explanation of nvvp's outcome (following suggestions given in the comments). So, I have crafted a "minimal" working example, which follows: #include #include…
Elias
  • 234
  • 2
  • 9
-1
votes
1 answer

How to print api calls per thread with nvprof

I am profiling a CUDA application and dumping the logs to a file, say target.prof. My application uses multiple threads to dispatch kernels, and I want to observe the API calls from just one of those threads. I tried using nvprof -i target.prof…
Tapan Chugh
  • 354
  • 2
  • 4