Questions tagged [gpu-constant-memory]

A device-global memory space on a GPU which caches constant data for all cores to read (and not write).

"Shared Memory" is one of the memory spaces in a GPU's memory model.

(Caveat: The following information may be specific to NVIDIA GPUs; please correct as necessary)

Memory in this space is readable by all computational cores on the GPU chip. Each processing core does, however, have a specialized cache for constants, separate from the read-write L1 cache and from shared memory. In a sense, one can think of constant memory as an extra area of fast cache, usable only for constant values.

The size of constant memory is very limited: on NVIDIA Maxwell and Pascal microarchitecture GPUs, it is 64 KiB for the whole device, and the per-core constant cache is only 10 KiB.

As its name indicates, constant memory cannot be altered during kernel execution. It does, however, have to be initialized somehow; in CUDA, this is done with the cudaMemcpyToSymbol() function.
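A minimal sketch of this pattern in CUDA (the array name, sizes, and kernel are illustrative, not from any question below):

```cuda
#include <cuda_runtime.h>

__constant__ float coeffs[256];  // resides in the 64 KiB device constant space

__global__ void scale(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * coeffs[i % 256];  // reads served through the constant cache
}

int main() {
    float host_coeffs[256];
    for (int i = 0; i < 256; ++i) host_coeffs[i] = 1.0f;

    // Constant memory cannot be written from device code;
    // initialize it from the host before launching the kernel.
    cudaMemcpyToSymbol(coeffs, host_coeffs, sizeof(host_coeffs));

    // ... allocate in/out with cudaMalloc, then launch:
    // scale<<<blocks, threads>>>(d_in, d_out, n);
    return 0;
}
```

Note that the first argument to cudaMemcpyToSymbol is the symbol itself (the `__constant__` variable), not a string naming it; passing a string was only accepted in CUDA 4.x and earlier, which is the root of several questions in this tag.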

34 questions
24
votes
1 answer

CUDA Constant Memory Best Practices

I present here some code __constant__ int array[1024]; __global__ void kernel1(int *d_dst) { int tId = threadIdx.x + blockIdx.x * blockDim.x; d_dst[tId] = array[tId]; } __global__ void kernel2(int *d_dst, int *d_src) { int tId =…
Psypher
  • 396
  • 1
  • 3
  • 13
19
votes
1 answer

Interpreting the verbose output of ptxas, part I

I am trying to understand resource usage for each of my CUDA threads for a hand-written kernel. I compiled my kernel.cu file to a kernel.o file with nvcc -arch=sm_20 -ptxas-options=-v and I got the following output (passed through c++filt): ptxas…
curiousexplorer
  • 1,217
  • 1
  • 17
  • 24
14
votes
1 answer

How CUDA constant memory allocation works?

I'd like to get some insight about how constant memory is allocated (using CUDA 4.2). I know that the total available constant memory is 64KB. But when is this memory actually allocated on the device? Is this limit apply to each kernel, cuda context…
hthms
  • 853
  • 1
  • 10
  • 25
14
votes
2 answers

Error in cudaMemcpyToSymbol using CUDA 5

The Problem I have prepared one sample CUDA code using the constant memory. I can run this in cuda 4.2 successfully but I get "invalid device symbol" when I compile using the CUDA 5. I have attached the sample code here. The Code #include…
ran_pal
  • 143
  • 1
  • 5
14
votes
1 answer

Why is the constant memory size limited in CUDA?

According to "CUDA C Programming Guide", a constant memory access benefits only if a multiprocessor constant cache is hit (Section 5.3.2.4)1. Otherwise there can be even more memory requests for a half-warp than in case of the coalesced global…
AdelNick
  • 982
  • 1
  • 8
  • 17
8
votes
2 answers

CUDA constant memory banks

When we check the register usage by using xptxas we see something like this: ptxas info : Used 63 registers, 244 bytes cmem[0], 51220 bytes cmem[2], 24 bytes cmem[14], 20 bytes cmem[16] I wonder if currently there is any documentation that clearly…
biubiuty
  • 493
  • 6
  • 17
7
votes
2 answers

Constant memory usage in CUDA code

I can not figure it out myself, what is the best way to ensure the memory used in my kernel is constant. There is a similar question at http://stackoverflow...r-pleasant-way. I am working with GTX580 and compiling only for 2.0 capability. My kernel…
markhor
  • 2,235
  • 21
  • 18
6
votes
3 answers

Allocate constant memory

I'm trying to set my simulation params in constant memory but without luck (CUDA.NET). cudaMemcpyToSymbol function returns cudaErrorInvalidSymbol. The first parameter in cudaMemcpyToSymbol is string... Is it symbol name? actualy I don't understand…
Vladimir
  • 131
  • 2
  • 10
4
votes
1 answer

In Numba, how to copy an array into constant memory when targeting CUDA?

I have a sample code that illustrates the issue: import numpy as np from numba import cuda, types import configs def main(): arr = np.empty(0, dtype=np.uint8) stream = cuda.stream() d_arr = cuda.to_device(arr, stream=stream) …
Edy Bourne
  • 5,679
  • 13
  • 53
  • 101
4
votes
1 answer

cuda constant memory reference

I have an array in constant memory (it is a global variable) and obtained the reference to it by function call cudaGetSymbolAddress. My kernel runs slowly when I use this reference to fetch the constant data rather than using the global variable.…
3
votes
1 answer

When passing parameter by value to kernel function, where are parameters copied?

I'm beginner at CUDA programming and have a question. When I pass parameters by value, like this: __global__ void add(int a, int b, int *c) { // some operations } Since variable a and b are passed to kernel function add as copied value in…
Hongmin Yang
  • 71
  • 2
  • 7
3
votes
1 answer

What are the lifetimes for CUDA constant memory?

I'm having trouble wrapping my head around the restrictions on CUDA constant memory. Why can't we allocate __constant__ memory at runtime? Why do I need to compile in a fixed size variable with near-global scope? When is constant memory actually…
Mikhail
  • 7,749
  • 11
  • 62
  • 136
3
votes
1 answer

Local, global, constant & shared memory

I read some CUDA documentation that refers to local memory. (It is mostly the early documentation.) The device-properties reports a local-mem size (per thread). What does 'local' memory mean? What is 'local' memory? Where is 'local' memory? …
Doug
  • 2,783
  • 6
  • 33
  • 37
2
votes
3 answers

NVIDIA __constant memory: how to populate constant memory from host in both OpenCL and CUDA?

I have a buffer (array) on the host that should be resided in the constant memory region of the device (in this case, an NVIDIA GPU). So, I have two questions: How can I allocate a chunk of constant memory? Given the fact that I am tracing the…
mgNobody
  • 738
  • 7
  • 23
2
votes
1 answer

When should texture memory be preferred over constant memory?

Does the use of data storage in constant memory provides any benefit over texture in the Pascal architecture if the data request frequency is very high among threads (every thread pick at least one data from a specific column)? EDIT: This is a split…
Thiago Conrado
  • 726
  • 8
  • 15