ArrayFire Memcpy

Question

I have a question related to the ArrayFire library and memory usage. I implemented a program in plain CUDA/C, and the same program using ArrayFire, and the CUDA/C program is much faster (about 5 times faster than the ArrayFire one).

I checked both of them with the NVIDIA profiler, and the main difference I see is in memcpy operations: the ArrayFire version has a lot of Memcpy operations, while the other one has just a few at the beginning of the program. Doing some tests, I found out that doing something like:

f = f*q;

where f and q are arrays, generates more of these memcpy calls. I think this is the reason my ArrayFire code doesn't perform better. Why does this happen? Where do all these memcpys come from? How can I avoid them?

Edit: here is a fragment of the code:

void Adveccion(){
  // Shift the i-th 3D slice of f and write the result back into f
  for(int i = 0; i< q ; i++){
    f(span,span,span,i) = shift( f(span,span,span,i) , V[1][i] , V[0][i] , V[2][i] );
  }
}

f is a four-dimensional array, and I call this function inside another for loop. If I modify the function like this:

void Adveccion(){
  // The value returned by shift is discarded, so f is never updated
  for(int i = 0; i< q ; i++){
    shift( f(span,span,span,i) , V[1][i] , V[0][i] , V[2][i] );
  }
}

the profiler doesn't show the massive use of memcpys. I think my problem is finding the correct way to assign new values to the arrays. Maybe using A = B is not the best approach, but I still have a lot to learn.
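A possible source of the difference between the two versions: af::shift returns a new array, so the first version also has to write each shifted slice back into f(span,span,span,i), while the second version discards the returned value and never writes anything back. The following is a minimal CPU-only C++ sketch of that idea, not ArrayFire code. Grid4, Shift3, wrap, shift_slice_into, advect_copy_back and advect_double_buffer are hypothetical names, and the shift convention out[j] = in[(j - s) mod n] is an assumption. It contrasts the copy-back pattern with double buffering (shift into a second array, then swap), a common alternative for this kind of step.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for a 4D array f(nx, ny, nz, nq), stored
// with x varying fastest (column-major, as ArrayFire stores arrays).
struct Grid4 {
    std::size_t nx, ny, nz, nq;
    std::vector<float> data;
    Grid4(std::size_t x, std::size_t y, std::size_t z, std::size_t q)
        : nx(x), ny(y), nz(z), nq(q), data(x * y * z * q, 0.0f) {}
    std::size_t idx(std::size_t x, std::size_t y, std::size_t z, std::size_t q) const {
        return x + nx * (y + ny * (z + nz * q));
    }
};

using Shift3 = std::array<long long, 3>;  // per-direction shift (sx, sy, sz)

// Map output coordinate j to its source coordinate (j - s) mod n, in [0, n).
static std::size_t wrap(long long j, long long s, long long n) {
    long long r = (j - s) % n;
    return static_cast<std::size_t>(r < 0 ? r + n : r);
}

// Circularly shift slice q of src into slice q of dst (assumed convention:
// out[j] = in[(j - s) mod n] on each axis). src and dst must be different
// buffers: an in-place version would overwrite values it still has to read.
void shift_slice_into(const Grid4& src, Grid4& dst, std::size_t q, const Shift3& s) {
    assert(&src != &dst && src.data.size() == dst.data.size());
    for (std::size_t z = 0; z < src.nz; ++z)
        for (std::size_t y = 0; y < src.ny; ++y)
            for (std::size_t x = 0; x < src.nx; ++x)
                dst.data[dst.idx(x, y, z, q)] =
                    src.data[src.idx(wrap(x, s[0], src.nx), wrap(y, s[1], src.ny),
                                     wrap(z, s[2], src.nz), q)];
}

// Pattern A, mirroring f(span,span,span,i) = shift(f(span,span,span,i), ...):
// shift into a temporary, then copy the slice back into f.
// Returns the number of elements moved by the copy-back passes.
std::size_t advect_copy_back(Grid4& f, const std::vector<Shift3>& v) {
    Grid4 tmp(f.nx, f.ny, f.nz, f.nq);
    std::size_t copied = 0;
    for (std::size_t q = 0; q < f.nq; ++q) {
        shift_slice_into(f, tmp, q, v[q]);
        for (std::size_t z = 0; z < f.nz; ++z)
            for (std::size_t y = 0; y < f.ny; ++y)
                for (std::size_t x = 0; x < f.nx; ++x, ++copied)
                    f.data[f.idx(x, y, z, q)] = tmp.data[tmp.idx(x, y, z, q)];
    }
    return copied;
}

// Pattern B, double buffering: write every shifted slice straight into
// f_next, then swap the two grids (moves the vectors, no element copies).
void advect_double_buffer(Grid4& f, Grid4& f_next, const std::vector<Shift3>& v) {
    for (std::size_t q = 0; q < f.nq; ++q)
        shift_slice_into(f, f_next, q, v[q]);
    std::swap(f, f_next);
}
```

Run on two identical grids, both patterns produce the same shifted data; the difference is that the copy-back pattern spends an extra nx*ny*nz*nq element copies per call on the write-back. Whether an equivalent restructuring removes the memcpys in ArrayFire itself would need to be confirmed with the profiler.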

Thanks for your attention. If you need more code to help me, just let me know. Thanks!

Hmm, no, let me check... but I have other operations, such as f(span,span,span,i) = shift( f(span,span,span,i) , V[1][q] , V[0][q] , V[2][q] );, that are more complex. — RolandDeschain, Nov 07 '15 at 01:04
It's hard to debug this issue without additional context. ArrayFire performs runtime kernel generation for that operation. The overhead you are seeing is probably not related to what you are doing there. — Umar Arshad, Nov 07 '15 at 15:12
Some operations return a variable number of elements. These operations need to perform a malloc to allocate the output buffer that stores the results, so the size needs to be read back by the host. — Umar Arshad, Nov 07 '15 at 15:20
@PavanYalamanchili Let me think... Do you think this could be a better approach, and would it have an impact on my current problem, the use of memcpys? — RolandDeschain, Nov 07 '15 at 17:16
@RolandDeschain Judging just by this code snippet, I'd say yes. — Pavan Yalamanchili, Nov 09 '15 at 12:02
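Umar Arshad's comment about operations that return a variable number of elements (af::where, which returns the indices of nonzero elements, is one such operation) can be illustrated with the usual count-then-allocate pattern: count the outputs, read the count back on the host, allocate, then fill. The following is a CPU-only C++ sketch under that assumption, not ArrayFire's internal implementation; count_nonzero, scatter_nonzero_indices and compact_nonzero are hypothetical names, and plain std::vector buffers stand in for device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pass 1 (would run on the device): count how many outputs will be produced.
std::size_t count_nonzero(const std::vector<float>& in) {
    std::size_t n = 0;
    for (float x : in) n += (x != 0.0f) ? 1 : 0;
    return n;
}

// Pass 2 (would run on the device): write the indices of nonzero elements
// into a buffer that was already allocated with exactly out.size() slots.
void scatter_nonzero_indices(const std::vector<float>& in, std::vector<std::size_t>& out) {
    std::size_t k = 0;
    for (std::size_t i = 0; i < in.size(); ++i)
        if (in[i] != 0.0f) out[k++] = i;
    assert(k == out.size());
}

// Host side: the count must reach the host before the output can be
// allocated. On a GPU this step is a device-to-host copy plus a sync.
std::vector<std::size_t> compact_nonzero(const std::vector<float>& in) {
    const std::size_t count = count_nonzero(in);  // would be read back from the device
    std::vector<std::size_t> out(count);          // allocation sized by that count
    scatter_nonzero_indices(in, out);
    return out;
}
```

For example, compact_nonzero({0, 2, 0, -1, 3}) returns the indices {1, 3, 4}. The extra read-back is why such operations can show up as small memcpys in the profiler even when the data itself stays on the device.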
