Questions tagged [nvvp]

NVVP (NVIDIA Visual Profiler) is NVIDIA's proprietary GUI profiling tool for CUDA applications running on the GPU.

The NVIDIA Visual Profiler is a cross-platform performance profiling tool that delivers developers vital feedback for optimizing CUDA C/C++ applications. First introduced in 2008, Visual Profiler supports all 350 million+ CUDA capable NVIDIA GPUs shipped since 2006 on Linux, Mac OS X, and Windows. The NVIDIA Visual Profiler is available as part of the CUDA Toolkit. (source, official website)

The PGI Profiler (PGPROF) is strongly based on NVVP.

NVIDIA Visual Profiler offers both a GUI and command-line options (pgprof or nvprof). Some basic information can be found here: https://www.pgroup.com/resources/pgprof-quickstart.htm

More detailed information:
http://docs.nvidia.com/cuda/profiler-users-guide/index.html

44 questions

1 answer

CUDA kernels are not overlapping

I have a simple vector multiplication kernel, which I am executing on 2 streams. But when I profile in NVVP, the kernels do not seem to overlap. Is it because each kernel execution utilizes 100% of the GPU? If not, what can be the cause? Source code…

asked Feb 04 '16 at 20:02

uahakan

1 answer

CUDA streams not running in parallel

Given this code: void foo(cv::gpu::GpuMat const &src, cv::gpu::GpuMat *dst[], cv::Size const dst_size[], size_t numImages) { cudaStream_t streams[numImages]; for (size_t image = 0; image < numImages; ++image) { …

c++ cuda nvvp

asked Jan 18 '16 at 05:25

Ken Y-N

1 answer

How to associate events, metrics and source-level results for profiling a pyCUDA program using nvvp

When I try to profile my pyCUDA application using nvvp, it works for the most part. I can click on "Examine GPU Usage" and view a number of analysis results / suggestions for my code, such as "Low Compute / Memcpy Efficiency." However, every time…

profiling pycuda nvvp

asked Dec 07 '15 at 15:13

weemattisnot

1 answer

How to view CUDA library function calls in profiler?

I am using the cuFFT library. How do I modify my code to see the function calls from this library (or any other CUDA library) in the NVIDIA Visual Profiler NVVP? I am using Windows and Visual Studio 2013. Below is my code. I convert my image and…

cuda cufft nvvp

asked Jul 13 '15 at 15:48

user8919

1 answer

How can I obtain timing values from the output of nvprof or of NVidia Visual Profiler?

I'm using nvprof to profile something (which includes both CPU work and GPU work, i.e. I use nvprof markers etc.), and I get binary files which nvprof produces. I can import these into NVVP (NVidia Visual Profiler; Linux version), and with a little…

xml cuda profiling nvvp text-decoding

asked Oct 01 '14 at 15:40

einpoklum

1 answer

CUDA profiling inside kernel

Is there any option to profile a CUDA kernel? Not as a whole, but rather part of it. I have some device function invocations and I want to measure their times. Are there any flags/events/instructions that I can set and then it will be visible in…

cuda nvvp

asked May 30 '13 at 11:17

user2390724

1 answer

Cuda zero-copy performance

Does anyone have experience with analyzing the performance of CUDA applications utilizing the zero-copy (reference here: Default Pinned Memory Vs Zero-Copy Memory) memory model? I have a kernel that uses the zero-copy feature and with NVVP I see the…

c++ cuda zero-copy nvvp

asked Dec 14 '12 at 01:38

user926914

1 answer

Meaning of the "flop_count_sp" and "inst_fp_32" metric in CUDA Profiler

According to the profiler user guide: flop_count_sp: Number of single-precision floating-point operations executed by non-predicated threads (add, multiply and multiply-accumulate). Each multiply-accumulate operation contributes 2 to the count. The…

cuda gpu profiler nvprof nvvp

asked Sep 09 '20 at 17:06

Booo

1 answer

NVIDIA Visual Profiler: Insufficient kernel bounds data

I am trying to get some insight into why my CUDA kernel has relatively low performance, and I am hoping to get some answers with the NVIDIA profiler. My CUDA program is a 'boiled down' version of a larger application, isolating and exercising the…

cuda nvprof nvvp

asked Aug 18 '20 at 22:08

ritter

1 answer

How to stop running TensorRT server without using ctrl-c (for profiling with nvprof)

I'm running nvprof to profile GPU usage of a TensorRT server-client model. Here's what I'm doing: Run nvprof on terminal 1 within a docker container with TensorRT enabled, nvprof --profile-all-processes -o results%p.nvvp Run TensorRT server on…

docker tensorrt nvidia-docker nvprof nvvp

asked Mar 16 '20 at 07:30

WannabeArchitect

0 answers

How can a registers-only instruction stall due to "memory dependencies"?

I am profiling a CUDA kernel using nvprof with PC sampling enabled, so as to understand some latency problems I am having. The GPU I am using is the P100 (compute 6.0). PC sampling reports that a DFMA is stalling frequently due to memory dependencies. The…

cuda nvprof nvvp

asked Dec 23 '18 at 20:24

Daniel

1 answer

What does "Instruction Issued" mean in the report provided by CUDA nvvp?

I use the NVIDIA Visual Profiler (nvvp) to profile a cuBLAS kernel. This link Latency Distribution is the latency distribution result. The document explains the "instruction issued" term in this way - "Instruction Issued - Warp was…

cuda profiling nvvp

asked Apr 19 '18 at 13:17

Xiuhong Li

1 answer

How to profile CUDA code on a headless node?

I'm working on a CUDA application I'd like to profile. Up to now all I've used is the command line profiler, nvprof, which just displays the summarized statistics. I thought about using the GUI profiler, NVVP. The problem is that the remote Linux…

cuda profiling nvprof nvvp

asked Nov 07 '17 at 21:34

marmistrz

1 answer

CUDA pointer arithmetic causes uncoalesced memory access?

I am working with a CUDA kernel that must operate on pointers-to-pointers. The kernel basically performs a large number of very small reductions, which are best done in serial since the reductions are of size Nptrs=3-4. Here are two implementations…

cuda nvvp

asked Jun 11 '17 at 01:23

AGML

1 answer

nsight EE and nvvp both crash during startup on Ubuntu 16.10

Whenever I start both applications, they crash after the splash screen appears. A small dialog appears with the message "An error has occurred. See the log file null" (I don't know where to find said null file). nsight console error message Java…

crash nvidia nsight nvvp

asked Jun 01 '17 at 21:39

Olumide
