Highest Voted 'nvprof' Questions

2

votes

0 answers

How to interpret nvprof results of cuda bandwidth-limited kernel?

I am running some GPU benchmarks to understand how to maximize the memory bandwidth from/to the global memory. I have an array of 128 MB (32*1024*1024 single-precision floating point numbers) aligned to a 128 bytes margin with three halo values…

asked Jul 28 '15 at 10:27

Spiros

2,156
2
23
42

1

vote

1 answer

Roofline Model with CUDA Manual vs. Nsight Compute

I have a very simple vector addition kernel written for CUDA. I want to calculate the arithmetic intensity as well as GFLOP/s for this Kernel. The values I calculate differ visibly from the values obtained by Nsight Compute's Roofline Analysis…

cuda nsight nvprof nsight-compute roofline

asked Jul 12 '23 at 19:10

Cherry Toska

131
8

1

vote

1 answer

Profilers (nvvp and nvprof) not showing "Page Fault" information

I am profiling a test code presented in the Unified Memory for CUDA Beginners on NVIDIA's developer forum. Code: #include #include // CUDA kernel to add elements of two arrays __global__ void add(int n, float* x, float* y) { …

windows cuda nvprof nvvp

asked Nov 29 '21 at 12:12

skm

5,015
8
43
104

1

vote

1 answer

How to capture GPU data when profiling Tensorflow code with nvprof?

I would like to profile the training loop of a transformer model written in Tensorflow on a multi-GPU system. Since the code doesn't support tf2, I cannot use the built-in but experimental profiler. Therefore, I would like to use nvprof + nvvp (CUDA…

python tensorflow profiling nvidia nvprof

asked Mar 25 '20 at 16:45

gef

11
2

1

vote

2 answers

Issued load/store instructions for replay

There are two nvprof metrics regarding load/store instructions and they are ldst_executed and ldst_issued. We know that executed<=issued. I expect that those load/stores that are issued but not executed are related to branch predications and other…

cuda nvidia nvprof

asked Aug 14 '19 at 09:56

mahmood

23,197
49
147
242

1

vote

0 answers

Monitor GPU performance with nvprof does not work

I am trying to use nvprof to monitor the performance of the GPU. I would like to know the time consuming of HtoD(host to device), DtoH(device to host) and device execution. It worked very well with a standard code from numba cuda website: from…

nvprof

asked Jul 10 '19 at 13:21

ZHANG Juenjie

501
5
20

1

vote

1 answer

nvprof is crashing as it writes a very large file to /tmp/ and runs out of disk space

How do I work-around an nvprof crash that occurs when running on a disk with a relatively small amount of space available? Specifically, when profiling my cuda kernel, I use the following two commands: # Generate the timeline nvprof -f -o…

cuda nvprof

asked May 31 '19 at 18:50

interestedparty333

2,386
1
21
35

1

vote

1 answer

local cache hit metric in cuda profiler

For some CUDA application profilings, I see that the value of local hit rate (local_hit_rate metric) is 0%. I want to distinguish the following concepts with that value. The application has no access to the local cache. All accesses to local cache…

cuda nvprof

asked Apr 17 '19 at 20:09

mahmood

23,197
49
147
242

1

vote

2 answers

FLOP efficiency in CUDA

According to the definition of flop_sp_efficiency Ratio of achieved to peak single-precision floating-point operations The CUDA manual covers FLOPS, here. The metric yields ratio, e.g. 10%. That raises two questions about the term "peak": 1- Is…

cuda nvprof

asked Apr 11 '19 at 17:34

mahmood

23,197
49
147
242

1

vote

1 answer

Performance Analysis of Multiple Kernels (CUDA C)

I have CUDA program with multiple kernels run on series (in the same stream- the default one). I want to make performance analysis for the program as a whole specifically the GPU portion. I'm doing the analysis using some metrics such as…

performance cuda nvprof

asked Nov 06 '18 at 21:18

Sarah Hamed

11
1

1

vote

1 answer

How do I apply nvprof to Kinetica?

Can someone pls gimme a hint on how to apply nvprof to Kinetica ? 1) I see the name of processes of Kinetica which sits upon GPUs is gpudb_cluster_cuda, and its parent process is gpudb_host_manager. I find gpudb_host_manager is started by…

nvprof kinetica

asked Oct 19 '18 at 10:46

nasica88

1,185
10
10

1

vote

1 answer

How to specify nvprof "devices" option for Nvidia Visual Profiler?

CUDA Toolkit 9.0, Windows 10, GTX 1060 & NVS 315, 385.54 Driver version. Nvidia Visual Profiler always fails to profile, returning the following two warning messages: "Warning: This version of nvprof doesn't support the underlying device, GPU…

cuda nvprof nvvp

asked Apr 17 '18 at 12:19

Tyson Hilmer

741
7
25

1

vote

1 answer

How do I know the presence of nvprof inside CUDA program?

I have a small CUDA program that I want to profile with nvprof. The problem is that I want to write the program in such a way that When I run nvprof my_prog, it will invoke cudaProfilerStart and cudaProfilerStop. When I run my_prog, it will not…

cuda nvprof

asked Oct 29 '17 at 19:17

Bojian Zheng

2,167
3
13
17

1

vote

1 answer

Filtering functions on NVIDIA Visual Profiler

I'm having trouble to isolate key parts of my code on NVIDIA Visual Profiler's timeline. Some huge bars, as the one in the image. I'm not interested in optimizing this function and its existence in the timeline is disrupting several statistical…

cuda profiler nvprof

asked Oct 03 '17 at 20:24

Pedro Alves

1,667
4
17
37

1

vote

1 answer

Checking currently residing entities in GPU memory

What would be the easiest way of checking which (and their size) entities that have been allocated with cudaMalloc (), reside currently on a GPU device? I want to find a memory leak inside a function, that if it's just called once and exit, there is…

cuda cuda-gdb nvprof

asked Oct 27 '16 at 08:30

nabber

59
8

Questions tagged [nvprof]