
I have some sample code that illustrates the issue:

import numpy as np
from numba import cuda, types
import configs


def main():
    arr = np.empty(0, dtype=np.uint8)

    stream = cuda.stream()
    d_arr = cuda.to_device(arr, stream=stream)
    kernel[configs.BLOCK_COUNT, configs.THREAD_COUNT, stream](d_arr)


@cuda.jit(types.void(
    types.Array(types.uint8, 1, 'C'),
), debug=configs.CUDA_DEBUG)
def kernel(d_arr):
    arr = cuda.const.array_like(d_arr)  # this line raises the error below


if __name__ == "__main__":
    main()

When I run this code with cuda-memcheck, I get:

numba.errors.ConstantInferenceError: Failed in nopython mode pipeline (step: nopython rewrites)
Constant inference not possible for: arg(0, name=d_arr)

This seems to indicate that the array I passed in was not a constant, so it could not be copied to constant memory. Is that the case? If so, how can I copy an array that was given to a kernel as input into constant memory?

Edy Bourne

1 Answer


You don't copy to constant memory using an array that was given to the kernel as input. That kind of input array is already on the device, and device code cannot write to constant memory.

Constant memory can only be written to from host code, and Numba's constant-array syntax (`cuda.const.array_like`) expects the array to be a host array.

Here is an example:

$ cat t32.py
import numpy as np
from numba import cuda

a = np.ones(3,dtype=np.int32)       # host array, defined at module scope
@cuda.jit
def generate_mutants(b):
    c_a = cuda.const.array_like(a)  # contents of a are captured at compile time
    b[0] = c_a[0]

if __name__ == "__main__":
    b = np.zeros(3,dtype=np.int32)
    generate_mutants[1, 1](b)
    print(b)
$ python t32.py
[1 0 0]
$

Note that the implementation of constant memory in Numba CUDA has some behavioral differences compared to what is possible with CUDA C/C++; this issue highlights some of them.
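
If the contents you want in constant memory are only known at run time, one possible workaround (a sketch under my own assumptions, not something the documentation guarantees) relies on lazy compilation: a `@cuda.jit` kernel without an explicit signature is compiled on its first launch, and `cuda.const.array_like` captures the host array's contents at that point. Filling a module-level host array before the first launch should therefore work; `LUT` and `use_lut` are illustrative names:

import numpy as np
from numba import cuda

LUT = np.zeros(16, dtype=np.int32)  # module-level *host* array, not a kernel argument

@cuda.jit
def use_lut(out):
    c_lut = cuda.const.array_like(LUT)  # snapshot of LUT taken when the kernel is compiled
    i = cuda.grid(1)
    if i < out.size:
        out[i] = c_lut[i % c_lut.size]

if __name__ == "__main__":
    LUT[:] = np.arange(16, dtype=np.int32)  # fill with runtime data *before* the first launch
    out = np.zeros(32, dtype=np.int32)
    use_lut[1, 32](out)
    print(out)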

Robert Crovella
  • Just a general question... in the code above, you seem to copy the array `a` into constant memory from the device code, no? – Edy Bourne Aug 08 '20 at 04:09
  • Yes, it looks that way. Numba has kind of a weird syntax/implementation in this respect, and it makes it appear as if the constant memory is local to the kernel. Ordinary CUDA C++ constant memory syntactically doesn't look this way and doesn't work this way: it is defined *outside* the kernel and it is at *global* scope. The Numba implementation is such that it is defined *in* the kernel and effectively has local scope to that kernel. The issue I linked covers these differences. But under the hood, Numba is using CUDA constant memory, which is actually defined outside the kernel. – Robert Crovella Aug 08 '20 at 04:12
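
A small sketch of the compile-time snapshot behavior described in the comments above (assuming the semantics are as stated; worth verifying on your Numba version): modifying the host array after the kernel has been compiled should have no effect on what the kernel reads.

import numpy as np
from numba import cuda

a = np.ones(3, dtype=np.int32)

@cuda.jit
def read_const(b):
    c_a = cuda.const.array_like(a)  # contents of a captured at compile time
    b[0] = c_a[0]

if __name__ == "__main__":
    b = np.zeros(3, dtype=np.int32)
    read_const[1, 1](b)  # first launch triggers compilation; snapshot of a taken here
    a[0] = 42            # later host-side modification...
    read_const[1, 1](b)  # ...is not visible to the kernel
    print(b)             # expected: [1 0 0], not [42 0 0]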