nvprof is a command-line profiler that enables you to collect and view CPU and GPU timers and events in CUDA programs.
Questions tagged [nvprof]
89 questions
3 votes
1 answer
What are a transaction and a request in the 'gld_transactions_per_request' metric of the CUDA profiler?
For perfectly coalesced accesses to an array of 4096 doubles, each 8 bytes, nvprof reports the following metrics on an NVIDIA Tesla V100:
global_load_requests: 128
gld_transactions: 1024
gld_transactions_per_request: 8.000000
I cannot find a…

anroesti
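The three figures quoted in the question above are mutually consistent if one assumes that a request is one warp-level load instruction (32 threads) and that a transaction is one 32-byte sector. Both sizes are assumptions made for illustration; neither is stated in the question. A minimal sketch of that arithmetic:

```python
# Arithmetic sketch for the metrics quoted in the question above.
# WARP_SIZE and SECTOR_BYTES are assumptions (32 threads per warp,
# 32-byte sectors); they are not values given in the question.
WARP_SIZE = 32
SECTOR_BYTES = 32

num_elements = 4096   # doubles in the array (from the question)
element_bytes = 8     # bytes per double (from the question)

# One request per warp-level load instruction.
requests = num_elements // WARP_SIZE

# One transaction per sector of data moved.
total_bytes = num_elements * element_bytes
transactions = total_bytes // SECTOR_BYTES

transactions_per_request = transactions / requests

print(requests, transactions, transactions_per_request)
```

Under these assumptions the computed values match the reported global_load_requests, gld_transactions, and gld_transactions_per_request.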
3 votes
2 answers
Get the execution time in nvprof
Is there a way to get the kernel execution time in nvprof, like for a metric?
For example, to get the DRAM read transactions I type:
nvprof --metrics dram_read_transactions ./myprogram
My question is: is there something like
nvprof --metrics…

user352102
3 votes
2 answers
Numba and guvectorize for CUDA target: Code running slower than expected
Notable details
Large datasets (10 million x 5), (200 x 10 million x 5)
Numpy mostly
Takes longer after every run
Using Spyder3
Windows 10
First thing is attempting to use guvectorize with the following function. I am passing in a bunch of numpy…

Bryce Booze
3 votes
0 answers
What does a slice mean in CUDA?
I'm new to CUDA programming.
I have to profile my application on the GPU using nvprof.
I found a metric, l2_subp0_write_sector_misses, which is the number of write requests sent to DRAM from slice 0 of the L2 cache.
But I don't know what a slice…

kh.chung
3 votes
1 answer
What exactly are the transaction metrics reported by NVPROF?
I'm trying to figure out what exactly each of the metrics reported by "nvprof" is. More specifically, I can't figure out which transactions are System Memory and Device Memory reads and writes. I wrote some very basic code just to help figure this…

B.Md
2 votes
1 answer
nvprof --metrics works with a C++ executable but not with a Fortran executable
I am trying to learn CUDA and I am now stuck at running a simple nvprof command.
I am testing a simple script in both C++ and Fortran using CUDA. The CUDA kernels test two different ways of performing a simple task with the intent to show the…

Principio Tudisco
2 votes
0 answers
CUDA nvprof on Windows: "Warning: unable to locate profiling library, GPU profiling skipped" (NOT cupti64_102.dll)
I am trying to use nvprof on a CUDA/C++ program, but I get the output:
======== Warning: unable to locate profiling library, GPU profiling skipped
... my output ...
======== Warning: No CUDA application was profiled, exiting
My command:
nvprof.exe…

nonsence90
2 votes
1 answer
Running the nvprof --metrics command on Windows gives an error: CUDA profiling error
Running the nvprof --metrics command on Windows gives an error:
==6580== NVPROF is profiling process 6580, command: Project1.exe
==6580== Error: Internal profiling error 4292:1.
======== Error: CUDA profiling error.
If I only use the nvprof…

bourbon
2 votes
1 answer
Why does nvprof not have metrics for floating-point division operations?
Using nvprof to measure the floating-point operations of my sample kernels, it seems that there is no flop_count_dp_div metric, and the actual double-precision division operations are measured in terms of double-precision add/mul/fma and even…

bruin
2 votes
1 answer
Where is the boundary of start and end of CPU launch and GPU launch of Nvidia Profiling NVPROF?
What is the definition of start and end of kernel launch in the CPU and GPU (yellow block)? Where is the boundary between them?
Please notice that the start, end, and duration of those yellow blocks in CPU and GPU are different. Why CPU invocation…

skytree
2 votes
0 answers
nvprof produces unexpected branch efficiency results
I followed the warp divergence examples (the code below) in the textbook "Professional CUDA C Programming".
__global__ void math_kernel1(float *c) {
int tid = blockIdx.x * blockDim.x + threadIdx.x;
float a, b;
a = b = 0.f;
if…

Kipsora Lawrence
2 votes
0 answers
nvprof shows error with TensorFlow
I am trying to run nvprof with cifar10_multigpu_train.py.
I am using the following command:
/home/ibm/tensorflow/third_party/gpus/cuda/bin/nvprof python cifar10_multi_gpu_train.py
It starts the application, but after some time it shows the following errors…

Khayam Gondal
2 votes
2 answers
Unable to import nvprof generated profile data
I am trying to profile TensorFlow-based code using nvprof. I am using the following command for this:
nvprof python ass2.py
The program runs successfully, but at the end it shows the following error.
==49791== Profiling application: python…

Khayam Gondal
2 votes
1 answer
Data Size to Instructions per Warp relationship in CUDA
I tried to see the number of instructions executed in a kernel when the size of the data type changed.
In order to get a custom-sized data structure, I created a struct as follows:
#define DATABYTES 40
__host__ __device__
struct floatArray
{
…

BAdhi
2 votes
2 answers
nvprof not picking up any API calls or kernels
I'm trying to get some benchmark timings in my CUDA program with nvprof, but unfortunately it doesn't seem to be profiling any API calls or kernels. I looked for a simple beginner's example to make sure I was doing it right and found one on the…

theKunz