When invoking a CUDA kernel with a specific thread configuration, are there strict rules about which memory space (device or host) the kernel parameters must reside in, and what type they must be?
Suppose I launch a 1-D grid of threads with
kernel<<<numblocks, threadsperblock>>>(/*parameters*/)
Can I pass a host integer variable, say int foo, directly to the CUDA kernel? Or do I need to cudaMalloc device memory for a single integer, say dev_foo, then cudaMemcpy the value of foo into dev_foo, and pass dev_foo as the kernel parameter?
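For concreteness, here is a minimal sketch of the direct-pass version I have in mind (the kernel and variable names are just placeholders):

```cuda
#include <cstdio>

// Hypothetical kernel: receives the integer parameter by value.
__global__ void kernel(int foo)
{
    // Have one thread print the value so we can check it arrived.
    if (blockIdx.x == 0 && threadIdx.x == 0)
        printf("foo = %d\n", foo);
}

int main()
{
    int foo = 42;  // ordinary host variable, no cudaMalloc/cudaMemcpy
    int numblocks = 1, threadsperblock = 32;

    kernel<<<numblocks, threadsperblock>>>(foo);  // is this valid?
    cudaDeviceSynchronize();
    return 0;
}
```

Is this legal as written, or is the cudaMalloc/cudaMemcpy round trip through dev_foo required?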