Questions tagged [bank-conflict]

A latency problem caused by multiple threads accessing the same bank of a shared memory system. At present, this latency issue is most common on NVIDIA and ATI graphics cards.

NVIDIA and ATI graphics cards with shared memory experience bank conflicts when multiple threads attempt out-of-order access to the same bank of memory. For more information, please see the Stack Overflow question:

What is a bank conflict? (Doing Cuda/OpenCL programming)

41 questions
2 votes, 1 answer

shared memory bank conflict with char array

I understand the bank conflict when dealing with 4-byte data types, but I wonder if we get any bank conflict (4-way/8-way?) with the following code __shared__ char shared[]; foo = shared[threadIdx.x]; The above code leads to 4 consecutive threads…
Karl
2 votes, 1 answer

CUDA bank conflict for L1 cache?

On NVIDIA's 2.x architecture, each warp has 64kb of memory that is by default partitioned into 48kb of Shared Memory and 16kb of L1 cache (servicing global and constant memory). We all know about the bank conflicts of accessing Shared Memory - the…
cmo
1 vote, 0 answers

Understanding the Reduction in Bank Conflicts in CUDA Kernels

I'm working with different CUDA kernels (gemm3, gemm4, and gemm5) for matrix multiplication: gemm3: baseline of shared memory GEMM gemm4: less thread blocks in x dimension gemm5: less blocks in both x and y dimension After profiling, I noticed…
1 vote, 0 answers

Still bank conflict after shared memory padding

Using the padding trick described here, I tested the following code and got the corresponding profiling result. Conflicts were notably diminished, but some still persist. // store conflict __global__ void setRowReadCol(int *out){ __shared__ int…
1 vote, 1 answer

Reading Shared/Local Memory Store/Load bank conflicts hardware counters for OpenCL executable under Nvidia

It is possible to use nvprof to access/read bank-conflict counters for a CUDA executable: nvprof --events shared_st_bank_conflict,shared_ld_bank_conflict my_cuda_exe However, it does not work for code that uses OpenCL rather than CUDA. Is there…
Artyom
1 vote, 1 answer

OpenCL bank conflict - dropping memory / corrupting data?

I apologize in advance for the vagueness of this question. Background: I am attempting to write a morphological image processing function in OpenCL. I have a __local buffer which I use to store data for every pixel (each pixel is represented by a…
Reefpoints
1 vote, 1 answer

CUDA: overloading of shared memory to implement reduction approach with multiple arrays

I have 5 large arrays A(N*5), B(N*5), C(N*5), D(N*5), E(N*2); the numbers 5 and 2 represent the components of these variables in different planes/axes. That's why I have structured the arrays in this fashion, so I can visualize the data when I am writing…
1 vote, 1 answer

GPU shared memory practical example

I have an array like this: data[16] = {10,1,8,-1,0,-2,3,5,-2,-3,2,7,0,11,0,2} I want to compute the reduction of this array using shared memory on a G80 GPU. The kernel as cited in the NVIDIA document is like that: __global__ void reduce1(int…
1 vote, 1 answer

Shared memory bank conflict in CUDA Fortran when loading 2D data from global memory

I am accessing global memory to load data to shared memory and would like to know if there is a bank conflict. Here is the setup: In global memory: g_array. A 2D matrix of size (256, 64) This is how I load the array data from global memory to…
Adjeiinfo
1 vote, 1 answer

Shared memory configuration for prefetching

In my program I use shared memory to prefetch data. A 2D block of threads, dimensions 8 by 4 (32), gets 8 * 4 * 8 * sizeof(float4) bytes of shared memory. Each thread copies 8 float4s in a loop: inline __device__ void pack(const float4…
Dori
1 vote, 1 answer

What's the mechanism of the warps and the banks in CUDA?

I'm a rookie learning CUDA parallel programming, and I'm confused about global memory access on the device. It's about the warp model and coalescing. There are some points: It's said that threads in one block are split into warps. In each warp…
Han
1 vote, 1 answer

Bank conflicts in 2.x devices

What is a bank conflict in devices with compute capability 2.x? As I understand the CUDA C Programming Guide, in 2.x devices, if two threads access the same 32-bit word in the same shared memory bank, it does not cause a bank conflict. Instead, the word is…
gmemon
1 vote, 1 answer

Does reading an int array from shared memory preclude bank conflicts?

I am designing a CUDA kernel that will be launched with 16 threads per thread block. I have an array of N ints in shared memory (i.e. per thread block) that I wish to process. If the access pattern of the threads is consecutive into the array then…
twerdster
0 votes, 2 answers

Bank-Conflict-Free Access in shared memory

I have to use shared memory that is 64 elements in size, twice the number of banks and twice the number of threads in a warp. How should I address them to yield bank-conflict-free access?
0 votes, 1 answer

CUDA shared memory bank conflict unexpected timing

I was trying to reproduce a bank conflict scenario (minimal working example here) and decided to perform a benchmark when a warp (32 threads) access 32 integers of size 32-bits each in the following 2 scenarios: When there is no bank conflict…
Ferdinand Mom