
My program has two kernels. The second kernel should reuse the input data that has already been uploaded to the device, together with the results of the first kernel, so that I can avoid redundant memory transfers. How would I achieve this?

This is how I launch my kernels:

result = gpuarray.zeros(points, dtype=np.float32)

kernel(
    driver.In(dataT), result, np.int32(points),
    grid=(blocks, 1),
    block=(block_size, 1, 1),
)
Framester

1 Answer


In PyCUDA, data is not transferred to or from the device unless you explicitly request it. For example, you can allocate memory on the host and transfer it to the GPU with:

result = np.zeros((height, width), dtype=np.float64)
result_device = gpuarray.to_gpu(result)

The variable result_device is a reference to the data on the GPU. You can pass result_device to any other kernel without incurring a transfer back to the CPU. A device-to-host transfer only happens again when you call:

result = result_device.get()
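
Putting this together, here is a minimal sketch of the pattern for chaining two kernels while keeping everything on the device. The kernels square and add_one, and all sizes, are invented for illustration and are not from the original question:

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

# Two toy kernels; names and bodies are illustrative only.
mod = SourceModule("""
__global__ void square(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];
}

__global__ void add_one(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}
""")
square = mod.get_function("square")
add_one = mod.get_function("add_one")

points = 1024
block_size = 256
blocks = (points + block_size - 1) // block_size

# One explicit host-to-device transfer for the input data.
data_device = gpuarray.to_gpu(np.random.rand(points).astype(np.float32))
result_device = gpuarray.zeros(points, dtype=np.float32)

# First kernel writes its output into result_device; nothing is copied back.
square(data_device, result_device, np.int32(points),
       grid=(blocks, 1), block=(block_size, 1, 1))

# Second kernel reuses the same device array, still without any transfer.
add_one(result_device, np.int32(points),
        grid=(blocks, 1), block=(block_size, 1, 1))

# The only device-to-host transfer happens here.
result = result_device.get()

With this arrangement the input is uploaded once with to_gpu, the intermediate result never leaves the GPU, and .get() is the single transfer back to the host.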
jkysam