Questions tagged [nvvp]

NVVP (NVIDIA Visual Profiler) is NVIDIA's proprietary GUI-based profiling tool for CUDA applications.

The NVIDIA Visual Profiler is a cross-platform performance profiling tool that delivers developers vital feedback for optimizing CUDA C/C++ applications. First introduced in 2008, Visual Profiler supports all 350 million+ CUDA capable NVIDIA GPUs shipped since 2006 on Linux, Mac OS X, and Windows. The NVIDIA Visual Profiler is available as part of the CUDA Toolkit. (source, official website)

The PGI Profiler (PGPROF) is largely based on NVVP.

The profiler can be used through its GUI or from the command line (nvprof, or pgprof for PGPROF). Basic information can be found here: https://www.pgroup.com/resources/pgprof-quickstart.htm

More detailed information:
http://docs.nvidia.com/cuda/profiler-users-guide/index.html

44 questions
1 vote, 1 answer

CUDA kernels are not overlapping

I have a simple vector multiplication kernel, which I am executing in 2 streams. But when I profile in NVVP, the kernels do not seem to overlap. Is it because each kernel execution utilizes 100% of the GPU? If not, what can be the cause? Source code…
uahakan
1 vote, 1 answer

CUDA streams not running in parallel

Given this code: void foo(cv::gpu::GpuMat const &src, cv::gpu::GpuMat *dst[], cv::Size const dst_size[], size_t numImages) { cudaStream_t streams[numImages]; for (size_t image = 0; image < numImages; ++image) { …
Ken Y-N
1 vote, 1 answer

How to associate events, metrics and source-level results for profiling a pyCUDA program using nvvp

When I try to profile my pyCUDA application using nvvp, it works for the most part. I can click on "Examine GPU Usage" and view a number of analysis results / suggestions for my code, such as "Low Compute / Memcpy Efficiency." However, every time…
weemattisnot
1 vote, 1 answer

How to view CUDA library function calls in profiler?

I am using the cuFFT library. How do I modify my code to see the function calls from this library (or any other CUDA library) in the NVIDIA Visual Profiler (NVVP)? I am using Windows and Visual Studio 2013. Below is my code. I convert my image and…
user8919
1 vote, 1 answer

How can I obtain timing values from the output of nvprof or of NVIDIA Visual Profiler?

I'm using nvprof to profile something (which includes both CPU work and GPU work, i.e. I use nvprof markers etc.), and I get binary files which nvprof produces. I can import these into NVVP (NVIDIA Visual Profiler; Linux version), and with a little…
einpoklum
1 vote, 1 answer

CUDA profiling inside kernel

Is there any option to profile a CUDA kernel? Not as a whole, but rather part of it. I have some device function invocations and I want to measure their times. Are there any flags/events/instructions that I can set and then it will be visible in…
1 vote, 1 answer

CUDA zero-copy performance

Does anyone have experience with analyzing the performance of CUDA applications utilizing the zero-copy (reference here: Default Pinned Memory Vs Zero-Copy Memory) memory model? I have a kernel that uses the zero-copy feature and with NVVP I see the…
user926914
0 votes, 1 answer

Meaning of the "flop_count_sp" and "inst_fp_32" metrics in CUDA Profiler

According to the profiler user guide: flop_count_sp: Number of single-precision floating-point operations executed by non-predicated threads (add, multiply and multiply-accumulate). Each multiply-accumulate operation contributes 2 to the count. The…
Booo
0 votes, 1 answer

NVIDIA Visual Profiler: Insufficient kernel bounds data

I am trying to gain some insight into why my CUDA kernel has relatively low performance, and I am hoping to get some answers with the NVIDIA profiler. My CUDA program is a 'boiled down' version of a larger application, isolating and exercising the…
ritter
0 votes, 1 answer

How to stop running TensorRT server without using ctrl-c (for profiling with nvprof)

I'm running nvprof to profile GPU usage of a TensorRT server-client model. Here's what I'm doing: Run nvprof in terminal 1 within a Docker container with TensorRT enabled: nvprof --profile-all-processes -o results%p.nvvp, then run TensorRT server on…
WannabeArchitect
0 votes, 0 answers

How can a registers-only instruction stall due to "memory dependencies"?

I am profiling a CUDA kernel using nvprof with PC sampling enabled, to understand some latency problems I am having. The GPU I am using is the P100 (compute capability 6.0). PC sampling reports that a DFMA is stalling frequently due to memory dependencies. The…
Daniel
0 votes, 1 answer

What does "Instruction Issued" mean in the report provided by CUDA nvvp?

I use the NVIDIA Visual Profiler (nvvp) to profile a cuBLAS kernel. The latency distribution result is shown at this link: Latency Distribution. The documentation explains the "instruction issued" term in this way: "Instruction Issued - Warp was…
0 votes, 1 answer

How to profile CUDA code on a headless node?

I'm working on a CUDA application I'd like to profile. Up to now all I've used is the command-line profiler, nvprof, which just displays the summarized statistics. I thought about using the GUI profiler, NVVP. The problem is that the remote Linux…
marmistrz
0 votes, 1 answer

CUDA pointer arithmetic causes uncoalesced memory access?

I am working with a CUDA kernel that must operate on pointers-to-pointers. The kernel basically performs a large number of very small reductions, which are best done in serial since the reductions are of size Nptrs=3-4. Here are two implementations…
AGML
0 votes, 1 answer

nsight EE and nvvp both crash during startup on Ubuntu 16.10

Whenever I start either application, it crashes after the splash screen appears. A small dialog appears with the message "An error has occurred. See the log file null" (I don't know where to find said null file). nsight console error message Java…
Olumide