Questions tagged [nvprof]

nvprof is a command-line profiler that enables you to collect and view CPU and GPU timers and events in CUDA programs.

89 questions
2 votes, 0 answers

How to interpret nvprof results of cuda bandwidth-limited kernel?

I am running some GPU benchmarks to understand how to maximize memory bandwidth to and from global memory. I have a 128 MB array (32*1024*1024 single-precision floating-point numbers) aligned to a 128-byte boundary, with three halo values…
Spiros
1 vote, 1 answer

Roofline Model with CUDA Manual vs. Nsight Compute

I have a very simple vector addition kernel written for CUDA. I want to calculate the arithmetic intensity as well as the GFLOP/s for this kernel. The values I calculate differ visibly from those reported by Nsight Compute's Roofline Analysis…
Cherry Toska
1 vote, 1 answer

Profilers (nvvp and nvprof) not showing "Page Fault" information

I am profiling a test code presented in the Unified Memory for CUDA Beginners on NVIDIA's developer forum. Code: #include #include // CUDA kernel to add elements of two arrays __global__ void add(int n, float* x, float* y) { …
skm
1 vote, 1 answer

How to capture GPU data when profiling Tensorflow code with nvprof?

I would like to profile the training loop of a transformer model written in TensorFlow on a multi-GPU system. Since the code doesn't support TF2, I cannot use the built-in (but experimental) profiler. Therefore, I would like to use nvprof + nvvp (CUDA…
gef
1 vote, 2 answers

Issued load/store instructions for replay

There are two nvprof metrics for load/store instructions: ldst_executed and ldst_issued. We know that executed <= issued. I expect that load/stores that are issued but not executed are related to branch predication and other…
mahmood
1 vote, 0 answers

Monitor GPU performance with nvprof does not work

I am trying to use nvprof to monitor GPU performance. I would like to know the time consumed by HtoD (host-to-device) transfers, DtoH (device-to-host) transfers, and device execution. It worked very well with a standard example from the Numba CUDA website: from…
ZHANG Juenjie
1 vote, 1 answer

nvprof is crashing as it writes a very large file to /tmp/ and runs out of disk space

How do I work around an nvprof crash that occurs when running on a disk with relatively little free space? Specifically, when profiling my CUDA kernel, I use the following two commands: # Generate the timeline nvprof -f -o…
interestedparty333
1 vote, 1 answer

local cache hit metric in cuda profiler

When profiling some CUDA applications, I see that the local hit rate (local_hit_rate metric) is 0%. I want to distinguish between the following cases for that value. The application has no accesses to the local cache. All accesses to local cache…
mahmood
1 vote, 2 answers

FLOP efficiency in CUDA

The definition of flop_sp_efficiency is "Ratio of achieved to peak single-precision floating-point operations". The CUDA manual covers FLOPS, here. The metric yields a ratio, e.g. 10%. That raises two questions about the term "peak": 1- Is…
mahmood
1 vote, 1 answer

Performance Analysis of Multiple Kernels (CUDA C)

I have a CUDA program with multiple kernels that run in series (in the same stream, the default one). I want to do a performance analysis of the program as a whole, specifically the GPU portion. I'm doing the analysis using some metrics such as…
1 vote, 1 answer

How do I apply nvprof to Kinetica?

Can someone please give me a hint on how to apply nvprof to Kinetica? 1) I see that the Kinetica process that runs on the GPUs is named gpudb_cluster_cuda, and its parent process is gpudb_host_manager. I find gpudb_host_manager is started by…
nasica88
1 vote, 1 answer

How to specify nvprof "devices" option for Nvidia Visual Profiler?

CUDA Toolkit 9.0, Windows 10, GTX 1060 & NVS 315, driver version 385.54. Nvidia Visual Profiler always fails to profile, returning the following two warning messages: "Warning: This version of nvprof doesn't support the underlying device, GPU…
Tyson Hilmer
1 vote, 1 answer

How do I know the presence of nvprof inside CUDA program?

I have a small CUDA program that I want to profile with nvprof. The problem is that I want to write the program in such a way that: when I run nvprof my_prog, it will invoke cudaProfilerStart and cudaProfilerStop; when I run my_prog, it will not…
Bojian Zheng
1 vote, 1 answer

Filtering functions on NVIDIA Visual Profiler

I'm having trouble isolating key parts of my code on NVIDIA Visual Profiler's timeline. There are some huge bars, such as the one in the image. I'm not interested in optimizing this function, and its presence in the timeline is disrupting several statistical…
Pedro Alves
1 vote, 1 answer

Checking currently residing entities in GPU memory

What would be the easiest way of checking which entities allocated with cudaMalloc() currently reside on a GPU device, and their sizes? I want to find a memory leak inside a function, that if it's just called once and exit, there is…
nabber