My problem is writing code so that MATLAB's built-in GPU features divide the data correctly for parallel execution. Specifically, I send N 'particle' images to GPU memory, organized as a 3-D array whose third dimension indexes the images, and I want to compare each of the N images with a single target image that is also in GPU memory.

My current implementation, which is more or less how I'd like to see it written, is a single line of code:

particle_ifft = ifft2(particle_fft.*target_fft);

Note that this comes after taking the FFT of each uploaded image. Herein lies the indexing problem: the '.*' operator requires "particle_fft" and "target_fft" to be the same size. Keeping multiple copies of the same target image just to compare it with each particle image wastes memory. I have used this inefficient method and it performs well, but it significantly reduces the number of particle images I can upload to the GPU.

Is there a way to tell MATLAB to compare each 2-D slice of the 3-D particle array (each image) with only the single target image?

I have tried a for loop that indexes into the 3-D array and compares each particle image individually with the single target, but MATLAB does not parallelize this kind of operation on the GPU: it runs nearly 1000 times slower than the equivalent code using the memory-inefficient target array.

I realize I could write my own kernel to solve this indexing problem, but I'd like to leverage MATLAB's existing capabilities (specifically, to avoid rewriting the fft2 and ifft2 functions). Ideas?

ejmunson

1 Answer

bsxfun was added to Parallel Computing Toolbox in release R2012a. I think that's what you need, i.e.:

bsxfun(@times, particle_fft, target_fft);

See: http://www.mathworks.co.uk/help/toolbox/distcomp/bsxfun.html
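
To show how the bsxfun call fits into the whole pipeline, here is a minimal sketch. The image sizes, the random test data, and the variable names `H`, `W`, `N`, `particles`, `target`, and `product` are assumptions made for illustration; only `particle_fft`, `target_fft`, and `particle_ifft` come from the question. It relies on fft2 and ifft2 transforming each 2-D slice of a 3-D array, and on bsxfun expanding the singleton third dimension of the target, so only one copy of the target is stored.

```matlab
% Hypothetical sizes and random data, standing in for the real images.
H = 64; W = 64; N = 100;
particles = gpuArray(rand(H, W, N, 'single'));  % N particle images, H-by-W-by-N
target    = gpuArray(rand(H, W, 'single'));     % one target image, H-by-W

% fft2 on a 3-D array transforms each 2-D slice independently.
particle_fft = fft2(particles);
target_fft   = fft2(target);

% bsxfun expands target_fft along the third dimension, so every slice of
% particle_fft is multiplied element-wise by the same single target_fft
% without explicitly replicating the target N times.
product = bsxfun(@times, particle_fft, target_fft);

% ifft2 again works slice by slice; the result is H-by-W-by-N.
particle_ifft = ifft2(product);
```

In newer MATLAB releases (R2016b onward, as far as I know), implicit expansion lets `particle_fft .* target_fft` do the same thing directly, but the bsxfun form is the one that applies to R2012a.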

Edric