
My code relies heavily on convolution, which accounts for more than 80% of the running time. I want to use the GPU to make it much faster, but there are some things I don't completely understand (I don't have access to a GPU to test this myself yet):

  1. If I pass data to the constructor of a class (inheriting from handle) and store it in GPU memory with gpuArray, will it remain there? Will I have a problem passing the object as a parameter to functions? The operations performed on the data itself can all be done on the GPU (and I'm guessing that looping over an array works just as well no matter where the array is stored).

  2. I have a matrix with size(MyMat) = [s, s, b, n], in which I want to store n different matrices of size [s, s, b], computed simultaneously (with operations that can be done on the GPU). Do I have to use parfor? (I understand that the overhead makes it a bad idea in most cases.) Or is there a faster way to get the GPU to do this? The only computation I need to perform in this case is convolution (but it can't all be done in a single convn operation).

Thank you!

talonmies
user1999728
  • Not exactly what you asked, but have you considered using fftfilt? It performs convolution using the overlap-add method with the aid of the FFT algorithm and can be much faster than direct convolution. BTW: this could probably be run on a GPU as well. – Andreas H. Jul 27 '13 at 23:00
  • I've asked another question to which this could be an answer, but a commenter there tested FFT convolution and said it's slower for the matrix sizes I'm using. – user1999728 Jul 28 '13 at 04:56

1 Answer


1) Just use the gpuArray like any other variable. It is likely that no changes to your code are necessary; if something is not supported, you will receive an error pointing you to the issue.
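As a minimal sketch of the handle-class case from the question: the class name `ConvData`, the property `Data`, and the method `filterWith` are hypothetical names chosen for illustration, and the code assumes the Parallel Computing Toolbox and a supported GPU are available. The array is copied to the GPU once in the constructor and should stay there for as long as the object holds it.

```matlab
% ConvData.m (hypothetical class name, for illustration only)
classdef ConvData < handle
    properties
        Data   % gpuArray; stays in GPU memory while the object holds it
    end
    methods
        function obj = ConvData(hostData)
            % Copy the host array to the GPU once, at construction time
            obj.Data = gpuArray(hostData);
        end
        function out = filterWith(obj, kernel)
            % convn accepts gpuArray inputs; the result is also a gpuArray
            out = convn(obj.Data, gpuArray(kernel), 'same');
        end
    end
end
```

Because the class inherits from handle, passing the object to a function passes a reference, so the GPU data is not copied. For example, `c = ConvData(rand(64, 64, 8, 'single'));` followed by `r = c.filterWith(ones(3, 3, 3, 'single') / 27);` keeps everything on the GPU until you call `gather(r)`.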

2) Combining gpuArray and parfor is the typical way to parallelize GPU computation across multiple GPUs. Do you have multiple GPUs? If yes, try parfor. If not, it will probably be slower, because only one worker can use the GPU and all the others have to wait.
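A rough sketch of both cases, under stated assumptions: the sizes, the random input, and the averaging kernel are placeholders (the question does not say how each slice is computed), and `parpool` assumes a release that provides it (older releases used `matlabpool`). On a single GPU a plain for loop over the fourth dimension keeps all data on the device. With several GPUs, one worker per GPU is opened and each worker processes its own slices. How workers are mapped to GPUs depends on the release, so you may need to select a device explicitly with `gpuDevice` on each worker.

```matlab
% Placeholder sizes and data; replace with your own
s = 32; b = 4; n = 10;
kernelHost = ones(3, 3, 3, 'single') / 27;   % placeholder kernel
inputHost  = rand(s, s, b, n, 'single');     % placeholder input slices

if gpuDeviceCount > 1
    % Several GPUs: one worker per GPU, each slice convolved on a worker
    parpool(gpuDeviceCount);
    MyMat = zeros(s, s, b, n, 'single');
    parfor k = 1:n
        slice = gpuArray(inputHost(:, :, :, k));
        % gather copies the result back so it can be returned to the client
        MyMat(:, :, :, k) = gather(convn(slice, gpuArray(kernelHost), 'same'));
    end
else
    % Single GPU: transfer once, loop on the device, no parfor overhead
    kernel = gpuArray(kernelHost);
    input  = gpuArray(inputHost);
    MyMat  = gpuArray(zeros(s, s, b, n, 'single'));
    for k = 1:n
        MyMat(:, :, :, k) = convn(input(:, :, :, k), kernel, 'same');
    end
end
```

In the single-GPU branch `MyMat` is a gpuArray and in the multi-GPU branch it is an ordinary host array. Call `gather(MyMat)` if you need a host array in both cases.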

Daniel