A warp or wavefront is a logical unit in GPU kernel scheduling: the largest set of threads within the grid which (logically) execute in lockstep and are always synchronized with each other.
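For illustration, a minimal sketch of what this lockstep property buys you: the threads of a warp can exchange register values directly with warp shuffles, precisely because they are scheduled together. The kernel and names below are illustrative, not taken from any question on this page:

```
// Sum the values held by each warp using shuffles; assumes a single
// block whose size is a multiple of 32 (purely for illustration).
__global__ void warpSum(const float* in, float* out)
{
    float v = in[threadIdx.x];
    // Each step halves the number of distinct partial sums within the warp.
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    if (threadIdx.x % warpSize == 0)
        out[threadIdx.x / warpSize] = v;   // lane 0 holds the warp's sum
}
```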
I recently read the GK110 white paper, which states that each SM has four warp schedulers, each with dual Instruction Dispatch Units. On each cycle, every warp scheduler selects an eligible warp and issues instructions for it.
My question is in…
I am currently working on a project in which I am unrolling the last warp of a reduction. I have finished the code; however, some modifications were made by guessing, and I'd like an explanation of why they are needed. The code I have written is only the function…
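The canonical form of this unrolling, from the CUDA SDK reduction example, looks roughly like the sketch below. The `volatile` qualifier is one of the modifications usually needed: it stops the compiler from caching shared-memory values in registers between the implicitly warp-synchronous steps. (On Volta and later, an explicit `__syncwarp()` between steps is the safe form.)

```
// Classic "unroll the last warp" step; assumes blockDim.x >= 64 and
// that it is called only by threads with tid < 32.
__device__ void warpReduce(volatile float* sdata, int tid)
{
    sdata[tid] += sdata[tid + 32];
    sdata[tid] += sdata[tid + 16];
    sdata[tid] += sdata[tid + 8];
    sdata[tid] += sdata[tid + 4];
    sdata[tid] += sdata[tid + 2];
    sdata[tid] += sdata[tid + 1];
}
```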
I want to implement critical sections in CUDA. I have read many questions and answers on this subject, and the answers often involve atomicCAS and atomicExch.
However, this doesn't work at warp level, since all threads in the warp acquire the same lock after…
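A common workaround is sketched below, assuming a simple global spin lock (`lock` is a hypothetical global int, 0 meaning free). Rather than spinning until the lock is held, which can deadlock pre-Volta because the one thread that wins the atomicCAS then waits for its warp-mates who are still spinning, each thread retries the whole acquire-work-release sequence, so the winner finishes and releases before the losers try again:

```
// Hypothetical critical section incrementing a shared counter.
__device__ void criticalSection(int* lock, int* counter)
{
    bool done = false;
    while (!done) {
        if (atomicCAS(lock, 0, 1) == 0) {   // try to acquire
            *counter += 1;                   // critical section
            __threadfence();                 // make the write visible
            atomicExch(lock, 0);             // release
            done = true;
        }
    }
}
```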
Say dynamic analysis was done on a CUDA program and found that certain threads were better off being in the same warp.
For example, suppose we have 1024 CUDA threads and a warp size of 32. After dynamic analysis we find that threads 989, 243,…
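Threads cannot be reassigned between warps at run time, so the usual answer is to permute the work rather than the threads. A minimal sketch, assuming a hypothetical precomputed permutation array `remap` that places related work items into the same 32-slot group:

```
// Each physical thread slot picks up the logical work item the analysis
// assigned to it; the body here is placeholder work.
__global__ void permutedKernel(const int* remap, const float* in,
                               float* out, int n)
{
    int slot = blockIdx.x * blockDim.x + threadIdx.x;  // physical thread
    if (slot < n) {
        int item = remap[slot];        // logical work item for this slot
        out[item] = in[item] * 2.0f;   // placeholder work
    }
}
```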
I have a boolean 1D array T[N] controlling the value of shifts as follows:
**a: an array of pointers to n*n matrices in global memory
I want, for each matrix a, to subtract shift*Identity to obtain:
a = a - shift*eye(n)
I have:
__device__ bool…
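Since only the n diagonal entries of each matrix change, the kernel itself can be small. A minimal sketch, assuming one block per matrix and a hypothetical rule that T[b] selects between a shift of s and no shift (the question does not show how the shift is derived from T):

```
// Subtract shift*eye(n) from each n*n matrix a[b], touching only the diagonal.
__global__ void subtractShiftedIdentity(float** a, const bool* T,
                                        float s, int n)
{
    int b = blockIdx.x;                    // which matrix
    float shift = T[b] ? s : 0.0f;         // assumed rule, for illustration
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        a[b][i * n + i] -= shift;          // diagonal entry (i, i)
}
```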
In the online racecheck documentation, the description of the WARNING hazard severity level reads:
An example of this are hazards due to warp level programming that make the assumption that threads are proceeding in groups.
The statement is…
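The pattern the documentation is alluding to looks roughly like the sketch below: the final reduction steps run inside a single warp without __syncthreads(), which racecheck reports as a shared-memory hazard at WARNING severity. Guarding each step and adding __syncwarp() (CUDA 9+) expresses the warp-level synchronization explicitly (the kernel below is illustrative, not from the documentation):

```
// Warp-level final reduction steps; assumes blockDim.x == 64.
__global__ void lastWarpStep(float* out)
{
    __shared__ float sdata[64];
    int tid = threadIdx.x;
    sdata[tid] = (float)tid;               // placeholder data
    __syncthreads();
    if (tid < 32) {
        // No __syncthreads() here: the code relies on the warp
        // proceeding as a group, which is what racecheck flags.
        for (int s = 32; s > 0; s >>= 1) {
            if (tid < s) sdata[tid] += sdata[tid + s];
            __syncwarp();                  // explicit warp-level sync
        }
    }
    if (tid == 0) *out = sdata[0];
}
```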
I've got a strange performance inversion on a filter kernel with and without branching. The kernel with branching runs ~1.5x faster than the kernel without branching.
Basically, I need to sort a bunch of radiance rays and then apply interaction kernels. Since…
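For context, a hypothetical pair of kernels with the two shapes being compared (the actual filter from the question is not shown). When the predicate is coherent across a warp, the branched form lets whole warps skip the body entirely, which is one common explanation for such inversions:

```
// Branched form: warps whose threads all fail the test skip the work.
__global__ void filterBranched(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] > 0.0f)
        out[i] = in[i] * in[i];
}

// "Branchless" form: every thread pays for the arithmetic and the store.
__global__ void filterBranchless(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float keep = (in[i] > 0.0f) ? 1.0f : 0.0f;
        out[i] = keep * in[i] * in[i];
    }
}
```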
Given the following simple matrix multiplication kernel
```
__global__ void MatrixMulKernel(float* M, float* N, float* P, int Width)
{
    int Row = blockIdx.y*blockDim.y + threadIdx.y;
    int Col = blockIdx.x*blockDim.x + threadIdx.x;
    if ((Row < Width) &&…
```
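For reference, here is the kernel completed with the usual row-times-column inner product, the standard form this textbook kernel is normally shown with:

```
__global__ void MatrixMulKernel(float* M, float* N, float* P, int Width)
{
    int Row = blockIdx.y*blockDim.y + threadIdx.y;
    int Col = blockIdx.x*blockDim.x + threadIdx.x;
    if ((Row < Width) && (Col < Width)) {
        float Pvalue = 0.0f;
        // Dot product of row Row of M with column Col of N.
        for (int k = 0; k < Width; ++k)
            Pvalue += M[Row*Width + k] * N[k*Width + Col];
        P[Row*Width + Col] = Pvalue;
    }
}
```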
I'm developing with CUDA and have an arithmetic problem which I could implement with or without warp divergence.
With warp divergence it would look like:
```
float v1;
float v2;
// calculate the values of v1 and v2
if (v2 != 0)
    v1 +=…
```
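A sketch of the two variants being weighed (the actual operation applied to v1 is not shown above, so `v2 * k` stands in for it here):

```
// Divergent form: threads with v2 == 0 sit idle through the branch.
__device__ float divergent(float v1, float v2, float k)
{
    if (v2 != 0.0f)
        v1 += v2 * k;
    return v1;
}

// Branch-free form: every thread does the multiply, and the result is
// masked by the comparison. The compiler often emits predicated code
// like this for the branched version anyway.
__device__ float predicated(float v1, float v2, float k)
{
    v1 += (v2 != 0.0f) ? v2 * k : 0.0f;
    return v1;
}
```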