I have code that relies heavily on convolution; it accounts for more than 80% of the running time. I want to use the GPU to make it much faster, but there are some things I don't completely understand (I don't have access to test this for myself yet).
If I pass data to the constructor of a class (inheriting from handle) and store it in GPU memory (with gpuArray), will it remain there? Will I have a problem passing the class as a parameter to functions? The operations performed on the data itself can all be done on the GPU (and I'm guessing looping over an array works just as well no matter where the array is stored).

I have a matrix with
size(MyMat)=[s, s, b, n]
, in which I want to store n different matrices of size [s, s, b], computed simultaneously (with operations that can all be done on the GPU). Do I have to use parfor? (I understand the overhead makes it a bad idea in most cases.) Or is there a faster way to get the GPU to do this? The only computation I need to perform in this case is convolution (but it can't all be done in a single convn operation).
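To make the first question concrete, this is roughly the pattern I have in mind (the class and property names here are just placeholders, not my actual code):

```matlab
classdef ConvData < handle
    properties
        Mat  % intended to live in GPU memory as a gpuArray
    end
    methods
        function obj = ConvData(hostMat)
            % gpuArray copies the data to the device once, in the
            % constructor; my understanding is it then stays there
            % until I call gather or the object is cleared.
            obj.Mat = gpuArray(hostMat);
        end
    end
end
```

Since the class inherits from handle, passing an instance to a function should pass a reference, not a copy of the GPU data, which is exactly what I'd want to confirm.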
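And a sketch of the batched convolution I'm asking about (sizes and kernel are made up for illustration; everything is assumed to already be on the GPU, so each convn call should run on the device):

```matlab
s = 64; b = 8; n = 16;                 % placeholder sizes
MyMat = gpuArray.rand(s, s, b, n);     % hypothetical input data on the GPU
k     = gpuArray.rand(3, 3, 3);        % hypothetical convolution kernel
out   = gpuArray.zeros(s, s, b, n);

% Plain for-loop over the n pages -- no parfor -- each iteration is a
% GPU convn on one [s, s, b] slice:
for i = 1:n
    out(:,:,:,i) = convn(MyMat(:,:,:,i), k, 'same');
end
```

My question is whether this loop is already reasonable, or whether there is a way to make the GPU process the n slices faster than issuing them one at a time.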
Thank you!