Questions tagged [nvvp]

NVVP (NVIDIA Visual Profiler) is NVIDIA's proprietary GUI-based profiling tool for CUDA applications.

The NVIDIA Visual Profiler is a cross-platform performance profiling tool that delivers developers vital feedback for optimizing CUDA C/C++ applications. First introduced in 2008, Visual Profiler supports all 350 million+ CUDA capable NVIDIA GPUs shipped since 2006 on Linux, Mac OS X, and Windows. The NVIDIA Visual Profiler is available as part of the CUDA Toolkit. (source, official website)

The PGI Profiler (PGPROF) is largely based on NVVP.

Both tools offer a GUI and a command-line interface (nvprof for the NVIDIA toolchain, pgprof for PGI). Some basic information can be found here: https://www.pgroup.com/resources/pgprof-quickstart.htm

More detailed information:
http://docs.nvidia.com/cuda/profiler-users-guide/index.html

44 questions

1 answer

How to interpret NVIDIA Visual Profiler analysis/recommendations?

I'm relatively new to CUDA and am currently working on a project to accelerate computer vision applications on embedded systems with attached GPUs (NVIDIA TX1). I'm trying to choose between two libraries: OpenCV and VisionWorks (includes…

asked May 02 '17 at 13:48

user7952275

0 answers

"The kernel was blocked for an uncommon reason" - what are these reasons?

The nvvp CUDA profiler frontend offers an analysis breaking down the causes for warps waiting for execution of their next instruction. We have categories such as "Execution latency", "Memory dependency", "Texture dependency", etc. - and one category…

cuda profiling gpgpu nvvp

asked Apr 13 '17 at 09:53

einpoklum

1 answer

Profiling OpenCL application on Windows with NVIDIA GPU

Can you help me? I'm developing an OpenCL application on Windows 7 x64. The hardware is an Intel Core i5 and an NVIDIA GTX 770; OpenCL uses the NVIDIA GPU for acceleration. When I try to use Intel VTune Amplifier XE 2015, my application hangs at the end of profiling and…

profiling opencl nvidia intel-vtune nvvp

asked Aug 10 '16 at 11:40

Mike

1 answer

Is there any difference in the output of nvvp (visual) and nvprof (command line)?

To measure metrics/events for CUDA programs, I have tried using the command line like: nvprof --metrics <> I also measured the same metrics on the Visual profiler nvvp. I noticed no difference in the values I get. I noticed a…

cuda gpu nvidia nvvp nvprof

asked Jun 04 '16 at 07:09

Kajal

2 answers

Can I profile OpenACC kernel in C source code level?

I'm trying to speed up my code with OpenACC using the PGI 15.7 compiler. I want to profile my code at the C source level. I'm using the 'nvvp' profiler from CUDA 7.0. When I run nvvp, I can use the 'analysis tab' and see which latency is the reason my code…

cuda gpu nvidia openacc nvvp

asked Sep 08 '15 at 09:31

soongk

1 answer

Is it possible to automatically repeat several executions on NVVP?

I'm trying to extract some metrics from my application and need to run it many times and take the mean of the metrics. I searched for a way to do this but found nothing, either on Google or here on Stack Overflow.

cuda nvvp

asked Aug 05 '14 at 19:46

Blufter

1 answer

nvprof to open trace format or slog2

I want to generate a trace of my CUDA program and view it, so I run it using the following command: nvprof --print-gpu-trace ./my_exec. This prints the trace in text format, which is hard to interpret. It has been mentioned that I can save…

cuda profiling trace nvvp

asked Jan 01 '14 at 20:08

arbitUser1401

2 answers

CUDA Visual profiler over a remote X session

I am running an Ubuntu 11.10 server, CUDA-5.0 with a GTX480 on it. I am trying to run the visual profiler remotely by using Xming and Cygwin/X on Windows 8. I can successfully run xclocks, but when I try to launch /usr/local/cuda-5.0/bin/nvvp from…

cuda nvvp

asked May 22 '13 at 18:29

fall3nm0nk

1 answer

Is there any way to avoid this serialization behavior in CUDA profiling?

According to "CUDA streams not overlapping", "the profiler will serialize streaming to get accurate timing data". Now the question is: is there any way to avoid this serialization behavior in CUDA profiling (say, nvvp)? I am using a Fermi M2090 and…

cuda nvvp

asked Jan 23 '13 at 00:39

Hailiang Zhang

1 answer

nvvp and Nsight's profiler give different results?

I want to try the gst_inst_128bit instruction. For the same program, nvvp reports many gst_inst_128bit instructions executed, while Nsight's profiler reports four times as many gst_inst_32bit instructions instead. It is the same program. How could this situation…

cuda nsight nvvp

asked Jan 10 '13 at 09:27

worldterminator

0 answers

How to profile (visually) a CUDA code which is implemented in a python package through C extension?

Possible Duplicate: How to profile PyCuda code with the Visual Profiler? The CUDA Visual Profiler (nvvp) requires an executable as the entry point for profiling, but my CUDA code is implemented in a Python package through a C extension. Is there any way to do…

python cuda profiling cprofile nvvp

asked Jan 07 '13 at 23:09

Hailiang Zhang

-1 votes

1 answer

Why operations in two CUDA Streams are not overlapping?

My program is a pipeline that contains multiple kernels and memcpys. Each task goes through the same pipeline with different input data. The host code first chooses a Channel, an encapsulation of scratchpad memory and CUDA objects, when it…

cuda nvprof cuda-streams nvvp

asked Jan 15 '19 at 14:47

StrikeW

-1 votes

1 answer

CUDA's nvvp reports non-ideal memory access pattern, but bandwidth is almost peaking

EDIT: new minimal working example to illustrate the question and better explanation of nvvp's outcome (following suggestions given in the comments). So, I have crafted a "minimal" working example, which follows: #include #include…

cuda nvvp

asked Nov 08 '18 at 18:32

Elias

-1 votes

1 answer

How to print api calls per thread with nvprof

I am profiling a CUDA application and dumping the logs to a file, say target.prof. My application uses multiple threads to dispatch kernels, and I want to observe the API calls from just one of those threads. I tried using nvprof -i target.prof…

cuda gpu nvidia nvprof nvvp

asked Sep 12 '18 at 05:28

Tapan Chugh