
I'm writing code for real-time processing of images from a camera. I am using Python 3.5 with the Anaconda Accelerate/Numba packages to perform most of the calculations on the GPU. I am having trouble implementing a function that finds the position of the largest element in a float32 2D array. The array is already in GPU memory. The problem is that the function is terribly slow; it is the bottleneck of my whole code. The code:

@n_cuda.jit('void(float32[:,:], float32, float32, float32)')
def d_findcarpeak(temp_mat, height, width, peak_flat):
    row, col = n_cuda.grid(2)
    if row < height and col < width:
        peak_flat = temp_mat.argmax()

Here is where I call it:

d_findcarpeak[number_of_blocks, threads_per_block](
            d_temp_mat, height, width, d_peak_flat)

How can I rewrite this code?

  • What is the type of `temp_mat`? numpy array? numpy matrix? Something else? – Ohad Eytan Jul 28 '16 at 09:12
  • I edited the question. A moment after posting it, I understood why the index is always the same. However, the function is still incredibly slow. `temp_mat` is created in the following way: `temp_mat = np.zeros_like(hologram, dtype=np.float32)`, followed by `d_temp_mat = n_cuda.to_device(temp_mat)` – pewter_cauldron Jul 28 '16 at 09:27
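
A note on the kernel in the question: as written, every thread that passes the bounds check appears to run `argmax` over the whole array, so the same search is repeated once per thread, and assigning to the scalar argument `peak_flat` does not write anything back to the caller. A common alternative is a two-stage reduction, where each block finds a partial (value, index) pair and a second pass picks the winner among those partials. The sketch below is only a plain-Python model of that pattern, not Numba or CUDA code; `chunk_argmax`, `two_stage_argmax` and `chunk_size` are hypothetical names introduced for illustration, and each "chunk" stands in for the work one thread block would do.

```python
def chunk_argmax(values, start, stop):
    """Stage 1: return (max_value, flat_index) for values[start:stop]."""
    best_idx = start
    for i in range(start + 1, stop):
        # Strict '>' keeps the first occurrence on ties.
        if values[i] > values[best_idx]:
            best_idx = i
    return values[best_idx], best_idx


def two_stage_argmax(matrix, chunk_size=4):
    """Return (flat_index, (row, col)) of the largest element."""
    width = len(matrix[0])
    # Flatten in row-major order, like a C-ordered array.
    flat = [v for row in matrix for v in row]
    # Stage 1: one partial result per chunk (on a GPU: one per thread block).
    partials = [chunk_argmax(flat, s, min(s + chunk_size, len(flat)))
                for s in range(0, len(flat), chunk_size)]
    # Stage 2: reduce the short list of partials to a single winner.
    best_val, best_idx = partials[0]
    for val, idx in partials[1:]:
        if val > best_val:
            best_val, best_idx = val, idx
    # Convert the flat index back to a 2D position.
    return best_idx, divmod(best_idx, width)


# Small made-up matrix standing in for temp_mat.
temp_mat = [[0.0, 1.5, 0.2],
            [0.3, 0.1, 7.25],
            [2.0, 0.4, 0.9]]
peak_flat, (peak_row, peak_col) = two_stage_argmax(temp_mat)
print(peak_flat, peak_row, peak_col)
```

The point of the split is that stage 1 is independent per chunk and can run in parallel, while stage 2 only has to look at one entry per chunk. The `divmod(flat_index, width)` step is how a flat `argmax` result maps back to a (row, col) position in a row-major array.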
