Highest Voted 'nvprof' Questions

3

votes

1 answer

What is a transaction and a request in the 'gld_transactions_per_request' metric of the CUDA profiler?

For perfectly coalesced accesses to an array of 4096 doubles, each 8 bytes, nvprof reports the following metrics on an NVIDIA Tesla V100: global_load_requests: 128, gld_transactions: 1024, gld_transactions_per_request: 8.000000. I cannot find a…

cuda nvprof

asked Mar 04 '20 at 22:50

anroesti

11,053
3
22
33
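
The numbers quoted in the question above are consistent with a simple piece of arithmetic, sketched here in Python. The sketch rests on assumptions that the excerpt does not state: one load request per warp of 32 threads, one array element per thread, and a transaction unit of one 32-byte sector. The function name `expected_load_metrics` is hypothetical and is not part of nvprof.

```python
# Assumptions (not confirmed by the question excerpt):
#   - each thread loads exactly one element,
#   - one request is issued per warp of 32 threads,
#   - one transaction corresponds to one 32-byte sector.
WARP_SIZE = 32
SECTOR_BYTES = 32


def expected_load_metrics(num_elements, element_bytes):
    """Return (requests, transactions, transactions per request)
    for a fully coalesced load of num_elements items."""
    total_bytes = num_elements * element_bytes
    requests = num_elements // WARP_SIZE        # one request per warp
    transactions = total_bytes // SECTOR_BYTES  # one transaction per sector
    return requests, transactions, transactions / requests


requests, transactions, per_request = expected_load_metrics(4096, 8)
print(requests, transactions, per_request)
```

Under these assumptions, 4096 doubles give 128 requests and 1024 transactions, that is, 8 transactions per request, matching the reported values.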

3

votes

2 answers

Get the execution time in nvprof

Is there a way to get the kernel execution time in nvprof, as for a metric? For example, to get the DRAM read transactions I type: nvprof --metrics dram_read_transactions ./myprogram. My question is: is there something like nvprof --metrics…

cuda nvprof

asked Sep 24 '18 at 03:05

user352102

199
2
9

3

votes

2 answers

Numba and guvectorize for CUDA target: Code running slower than expected

Notable details: large datasets (10 million x 5), (200 x 10 million x 5); mostly NumPy; takes longer after every run; using Spyder3; Windows 10. First thing is attempting to use guvectorize with the following function. I am passing in a bunch of numpy…

python performance cuda numba nvprof

asked Aug 27 '18 at 19:49

Bryce Booze

165
1
11

3

votes

0 answers

What does a slice mean in CUDA?

I'm new to CUDA programming. I have to profile my application on the GPU using nvprof. I found a metric, l2_subp0_write_sector_misses, that means the number of write requests sent to DRAM from slice 0 of the L2 cache. But I don't know what a slice…

cuda gpu nvprof cuda-events

asked Feb 24 '17 at 07:20

kh.chung

53
1
4

3

votes

1 answer

What exactly are the transaction metrics reported by NVPROF?

I'm trying to figure out what exactly each of the metrics reported by "nvprof" is. More specifically, I can't figure out which transactions are System Memory and Device Memory reads and writes. I wrote some very basic code just to help figure this…

memory cuda gpu profiler nvprof

asked Apr 20 '16 at 22:55

B.Md

107
2
10

2

votes

1 answer

nvprof --metrics works with C++ executable but not with Fortran executable

I am trying to learn CUDA and I am now stuck at running a simple nvprof command. I am testing a simple script in both C++ and Fortran using CUDA. The CUDA kernels test two different ways of performing a simple task with the intent to show the…

c++ cuda fortran nvprof

asked Oct 26 '22 at 04:40

Principio Tudisco

65
4

2

votes

0 answers

CUDA nvprof on Windows: "Warning: unable to locate profiling library, GPU profiling skipped" (NOT cupti64_102.dll)

I am trying to use nvprof on a CUDA/C++ program, but I get the output: ======== Warning: unable to locate profiling library, GPU profiling skipped ... my output ... ======== Warning: No CUDA application was profiled, exiting. My command: nvprof.exe…

c++ cuda nvidia nvprof

asked Aug 18 '20 at 07:54

nonsence90

21
4

2

votes

1 answer

Running nvprof --metrics command under Windows gives an error: CUDA profiling error

Running nvprof --metrics command under Windows gives an error: ==6580== NVPROF is profiling process 6580, command: Project1.exe ==6580== Error: Internal profiling error 4292:1. ======== Error: CUDA profiling error. error1 If I only use the nvprof…

cuda metrics nvprof

asked Mar 27 '20 at 07:09

bourbon

73
7

2

votes

1 answer

Why does nvprof not have metrics for floating point division operations?

Using nvprof to measure floating point operations of my sample kernels, it seems that there is no metric for flop_count_dp_div, and the actual double-precision division operations are measured in terms of add/mul/fma of double-precision and even…

cuda floating-point nvprof

asked Aug 30 '19 at 02:18

bruin

979
1
10
30

2

votes

1 answer

Where is the boundary between the start and end of CPU launch and GPU launch in the NVIDIA profiler nvprof?

What is the definition of start and end of kernel launch in the CPU and GPU (yellow block)? Where is the boundary between them? Please notice that the start, end, and duration of those yellow blocks in CPU and GPU are different. Why CPU invocation…

cuda gpu profiling nvprof nvvp

asked May 14 '19 at 22:54

skytree

1,060
2
13
38

2

votes

0 answers

nvprof produces unexpected branch efficiency results

I followed the examples (the following code) of warp divergence in the textbook "Professional CUDA C Programming". __global__ void math_kernel1(float *c) { int tid = blockIdx.x * blockDim.x + threadIdx.x; float a, b; a = b = 0.f; if…

c++ cuda gpu gpgpu nvprof

asked Mar 22 '19 at 14:37

Kipsora Lawrence

85
9

2

votes

0 answers

nvprof shows error with TensorFlow

I am trying to run nvprof with cifar10_multigpu_train.py. I am using the following command: /home/ibm/tensorflow/third_party/gpus/cuda/bin/nvprof python cifar10_multi_gpu_train.py. It starts the application, but after some time it shows the following errors…

tensorflow nvprof nvvp

asked Feb 27 '17 at 19:46

Khayam Gondal

2,366
2
28
40

2

votes

2 answers

Unable to import nvprof generated profile data

I am trying to profile TensorFlow-based code using nvprof. I am using the following command for this: nvprof python ass2.py. The program runs successfully, but at the end it shows the following error. ==49791== Profiling application: python…

python cuda tensorflow nvprof

asked Feb 08 '17 at 21:33

Khayam Gondal

2,366
2
28
40

2

votes

1 answer

Data Size to Instructions per Warp relationship in CUDA

I tried to see the number of instructions executed in a kernel when the size of the data type changed. In order to get a custom-sized data structure, I created a struct as follows: #define DATABYTES 40 __host__ __device__ struct floatArray { …

cuda nvprof

asked Jun 16 '16 at 14:06

BAdhi

420
7
19

2

votes

2 answers

nvprof not picking up any API calls or kernels

I'm trying to get some benchmark timings in my CUDA program with nvprof, but unfortunately it doesn't seem to be profiling any API calls or kernels. I looked for a simple beginner's example to make sure I was doing it right and found one on the…

c cuda profiling nvprof

asked May 01 '16 at 18:55

theKunz

444
4
12
