Questions tagged [gpu-warp]

A warp (NVIDIA) or wavefront (AMD) is the basic unit of GPU kernel scheduling - the largest set of threads within the grid that are (logically) instruction-locked and always execute in lockstep with each other.

40 questions
0
votes
0 answers

Warp scheduling in Kepler GPU

I recently read the GK110 white paper, which claims that each SM has 4 warp schedulers, each with dual instruction dispatch units. On each cycle, each warp scheduler selects an eligible warp and issues instructions for it. My question is in…
StrikeW
  • 501
  • 1
  • 4
  • 11
0
votes
2 answers

CUDA Reduction: Warp Unrolling (School)

I am currently working on a project in which I am unrolling the last warp of a reduction. I have finished the code above; however, some modifications were done by guessing and I'd like an explanation why. The code I have written is only the function…
Michael Choi
  • 610
  • 5
  • 22
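A minimal sketch of the last-warp unrolling idiom this question refers to (the classic pre-Volta reduction pattern); the `volatile` qualifier and the absence of `__syncthreads()` in the final strides are exactly the parts usually arrived at "by guessing". On Volta and later, explicit `__syncwarp()` or cooperative groups should be used instead.

```cuda
// Hedged sketch: classic last-warp unrolling for a block reduction.
// Relies on the implicit lock-step execution of a warp, so the shared
// memory pointer must be volatile to prevent register caching.
__device__ void warpReduce(volatile float* sdata, int tid) {
    // No __syncthreads() needed: all 32 lanes execute each statement
    // together, so each write is visible before the next read.
    sdata[tid] += sdata[tid + 32];
    sdata[tid] += sdata[tid + 16];
    sdata[tid] += sdata[tid + 8];
    sdata[tid] += sdata[tid + 4];
    sdata[tid] += sdata[tid + 2];
    sdata[tid] += sdata[tid + 1];
}

__global__ void reduce(const float* g_in, float* g_out) {
    extern __shared__ float sdata[];
    int tid = threadIdx.x;
    sdata[tid] = g_in[blockIdx.x * blockDim.x + tid];
    __syncthreads();
    // Tree reduction down to 64 elements, syncing between strides.
    for (int s = blockDim.x / 2; s > 32; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    // The final six strides fit inside one warp: unroll them.
    if (tid < 32) warpReduce(sdata, tid);
    if (tid == 0) g_out[blockIdx.x] = sdata[0];
}
```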
0
votes
1 answer

Thread/warp local lock in cuda

I want to implement critical sections in cuda. I read many questions and answers on this subject, and answers often involve atomicCAS and atomicExch. However, this doesn't work at warp level, since all threads in the warp acquire the same lock after…
Regis Portalez
  • 4,675
  • 1
  • 29
  • 41
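A minimal sketch of the standard workaround for the intra-warp deadlock this question describes: rather than letting all 32 lanes spin on `atomicCAS` for the same lock, only one lane per warp acquires it on the warp's behalf (the `__syncwarp()` calls assume a Volta-or-later toolchain).

```cuda
// Hedged sketch: per-warp lock acquisition. Letting only lane 0 take
// the lock avoids all 32 lanes of a warp fighting over (and
// deadlocking on) the same atomicCAS.
__device__ void warpCriticalSection(int* lock) {
    if ((threadIdx.x & 31) == 0) {
        // Lane 0 acquires on behalf of the whole warp.
        while (atomicCAS(lock, 0, 1) != 0) { /* spin */ }
    }
    __syncwarp();              // other lanes wait for lane 0
    // ... per-warp critical work goes here ...
    __syncwarp();              // ensure all lanes have finished
    if ((threadIdx.x & 31) == 0) {
        atomicExch(lock, 0);   // release
    }
}
```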
0
votes
1 answer

Is there a way to explicitly map a thread to a specific warp in CUDA?

Say dynamic analysis was done on a CUDA program such that certain threads were better off being in the same warp. For example, let's pretend we have 1024 CUDA threads and a warp size of 32. After dynamic analysis we find out that threads 989, 243,…
xfern
  • 96
  • 1
  • 7
0
votes
1 answer

Avoid warp divergence

I have a boolean 1D array T[N] controlling the value of shifts as follows: **a: an array of pointers to n*n matrices in global memory. For each matrix a I want to subtract shift*Identity to obtain a = a - shift*eye(n). I have: __device__ bool…
Sinem
  • 63
  • 1
  • 10
0
votes
1 answer

Branch based on the WARP ID

Is there any way to find the WARP id of a thread in CUDA? I want to perform a branch based on the WARP id.
AmirC
  • 326
  • 5
  • 14
0
votes
1 answer

What is warp-level-programming (racecheck)

In the online racecheck documentation, the severity level section has this description of hazard level WARNING: an example of this is hazards due to warp-level programming that makes the assumption that threads are proceeding in groups. The statement is…
Doug
  • 2,783
  • 6
  • 33
  • 37
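A minimal sketch (my own, not from the documentation) of the kind of pattern racecheck reports at WARNING severity: a read-after-write between lanes of one warp with no explicit synchronization, correct only under the implicit warp-synchrony assumption.

```cuda
// Hedged sketch of warp-level programming that racecheck flags as a
// WARNING-level hazard: lane 0 writes, the other lanes read, and the
// code assumes the warp proceeds in lockstep.
__global__ void warpBroadcast(int* out) {
    __shared__ int s;
    int lane = threadIdx.x & 31;
    if (lane == 0) s = 42;     // write by lane 0
    // No __syncwarp() here: correct only under implicit warp
    // synchrony (pre-Volta); racecheck reports the read below as a
    // warp-level hazard rather than a hard ERROR.
    out[threadIdx.x] = s;      // read by all lanes
}
```

Adding `__syncwarp()` between the write and the read silences the warning and makes the code correct under independent thread scheduling.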
-1
votes
1 answer

CUDA kernel with single branch runs 1.5x faster than kernel without branch

I've got a strange performance inversion on a filter kernel with and without branching. The kernel with branching runs ~1.5x faster than the kernel without branching. Basically I need to sort a bunch of radiance rays and then apply interaction kernels. Since…
-2
votes
1 answer

Control Divergence with simple matrix multiplication kernel

Given the following simple matrix multiplication kernel `__global__ void MatrixMulKernel(float* M, float* N, float* P, int Width) { int Row = blockIdx.y*blockDim.y+threadIdx.y; int Col = blockIdx.x*blockDim.x+threadIdx.x; if ((Row < Width) &&…
-3
votes
1 answer

CUDA Warp Divergence

I'm developing with CUDA and have an arithmetic problem which I could implement with or without warp divergence. With warp divergence it would look like: float v1; float v2; //calculate values of v1 and v2 if(v2 != 0) v1 +=…
Melenor
  • 40
  • 1