Highest Voted 'nvprof' Questions

0 votes

1 answer

nvidia visual profiler Encountered invalid option : --openacc-profiling

Running a simple application in the NVIDIA Visual Profiler shows the error: Encountered invalid option : --openacc-profiling ======== Use "nvprof --help" to get more information. Every GPU application I try to profile produces the same error. I tried to…

asked Apr 08 '17 at 16:01

Rodolfo

1,091
3
13
35

0 votes

1 answer

nvprof is using all available GPUs when profiling python script

I am using a remote machine with two GPUs to execute a Python script that contains CUDA code. To find where I can improve the performance of my code, I am trying to use nvprof. I have set in my code that I only want to use one…

python cuda profiling nvprof

asked Apr 06 '17 at 13:53

Filipe Aleixo

3,924
3
41
74

0 votes

1 answer

Using nvprof to Count CUDA Kernel Executions

Is it possible to use nvprof to count the number of CUDA kernel executions (i.e., how many kernels are launched)? Right now when I run nvprof what I am seeing is: ==537== Profiling application: python tf.py ==537== Profiling result: Time(%) Time …

cuda nvprof

asked Mar 09 '17 at 21:38

Alex Rothberg

10,243
13
60
120

0 votes

2 answers

Is there some in-code profiling of CUDA programs

In the OpenCL world there is a function, clGetEventProfilingInfo, which returns all the profiling info for an event, such as queued, submitted, start and end times in nanoseconds. It is quite convenient because I am able to printf that info whenever I want. For example…

cuda profiling nvprof

asked Oct 30 '16 at 19:25

petRUShka

9,812
12
61
95

0 votes

1 answer

Profiling Result doesn't appear in event/metric summary mode nvprof

According to the documentation for event/summary mode of nvprof, the output looks like: ==6461== Profiling application: matrixMul ==6461== Profiling result: ==6461== Event result: //The outputs ==6461== Metric result: //The outputs The…

cuda profiling nvprof

asked Jun 23 '16 at 17:05

user3813674

2,553
2
15
26

0 votes

1 answer

Global load transaction count when in coalesced memory access

I've created a simple kernel to test coalesced memory access by observing the transaction counts on an NVIDIA GTX 980 card. The kernel is: __global__ void copy_coalesced(float * d_in, float * d_out) { int tid = threadIdx.x +…

cuda nvprof

asked Jun 15 '16 at 06:44

BAdhi

420
7
19

0 votes

1 answer

nvprof with MPICH

I am trying to profile an MPI/OpenACC Fortran code. I found a site that details how to run nvprof with MPI here. The examples given are for OpenMPI. However, I am limited to MPICH and I can't figure out the equivalent. Does anyone know what it would…

fortran mpi openacc nvprof

asked Jun 09 '16 at 14:05

bob.sacamento

6,283
10
56
115

0 votes

1 answer

Is there any difference in the output of nvvp (visual) and nvprof (command line)?

To measure metrics/events for CUDA programs, I have tried using the command line like: nvprof --metrics <> I also measured the same metrics in the Visual Profiler, nvvp. I noticed no difference in the values I get. I noticed a…

cuda gpu nvidia nvvp nvprof

asked Jun 04 '16 at 07:09

Kajal

581
11
24

0 votes

1 answer

Where can I find the missing formulas in the latest NVIDIA CUDA Profiler user guide

I found that in the previous version of the profiler user guide, formulas for the metrics were provided. For example, Metric Name: branch_efficiency Description: Ratio of non-divergent branches to total branches Formula: 100 * (branch -…

cuda gpgpu nvidia nvprof

asked Feb 08 '16 at 22:04

Steven Huang

153
1
13

0 votes

1 answer

What exactly does NVPROF Power Profile measure?

I have used NVPROF to get the power profile of a Kepler-architecture NVIDIA GPU. My question is: what exactly are we seeing? If I understand correctly, there are 12V and 3.3V rails feeding the GPU, and the GPU can also draw power from the PCI bus. Is the…

profiling gpu nvidia nvprof

asked Aug 15 '15 at 03:45

travelingbones

7,919
6
36
43

0 votes

1 answer

My CUDA nvprof 'API Trace' and 'GPU Trace' are not synchronized - what to do?

I'm using the CUDA 7.0 profiler, nvprof, to profile some process making CUDA calls: $ nvprof -o out.nvprof /path/to/my/app Later, I generate two traces: the 'API trace' (what happens on the host CPU, e.g. CUDA runtime calls and ranges you mark) and…

cuda profiling trace data-synchronization nvprof

asked Apr 09 '15 at 20:35

einpoklum

118,144
57
340
684

-1 votes

1 answer

Why are operations in two CUDA streams not overlapping?

My program is a pipeline, which contains multiple kernels and memcpys. Each task will go through the same pipeline with different input data. The host code will first choose a Channel, an encapsulation of scratchpad memory and CUDA objects, when it…

cuda nvprof cuda-streams nvvp

asked Jan 15 '19 at 14:47

StrikeW

501
1
4
11

-1 votes

1 answer

How to print API calls per thread with nvprof

I am profiling a CUDA application and dumping the logs to a file, say target.prof. My application uses multiple threads to dispatch kernels, and I want to observe the API calls from just one of those threads. I tried using nvprof -i target.prof…

cuda gpu nvidia nvprof nvvp

asked Sep 12 '18 at 05:28

Tapan Chugh

354
2
4

-1 votes

1 answer

CUDA logarithm: nvprof detects single precision operations in double precision

I'm computing "log(x)" in double precision in CUDA, but when I profile, nvprof detects single precision operations using the metric "flop_count_sp_special". I'm compiling with "-arch=sm_30" to ensure compute capability 3.0 and double precision arithmetic,…

cuda gpu nvidia nvprof

asked Aug 31 '18 at 16:51

Jesse Chan

168
9
