I would like to create an array from another array by summing the components in blocks of four, e.g.:

float inVector[256];
float outVector[64] = {0};  // zero-initialize: the loop below accumulates with +=

for(int i=0; i<64; i++){
  for(int j=0; j<4; j++){
    int k = 4*i + j;
    outVector[i] += inVector[k];
  }
}

I would like to accelerate this. I have looked through the libraries available on iOS, such as vDSP and vForce, but haven't found anything that fits. The closest candidate was vDSP_vswsum, but it computes a sliding-window sum rather than sums over non-overlapping blocks, so it doesn't do what I want. Does anyone have a tip on how to speed this up?

Sten

2 Answers

My solution was to use vDSP_vadd with a stride:

vDSP_vadd(inVector,4,inVector+1,4,outVector,1,64);
vDSP_vadd(inVector+2,4,outVector,1,outVector,1,64);
vDSP_vadd(inVector+3,4,outVector,1,outVector,1,64);
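
To illustrate why three strided adds cover every block of four, here is a minimal portable C sketch. Note that `strided_add` and `block_sum4` are hypothetical helpers written for this illustration; they only mimic the argument order of a strided element-wise add and are not part of vDSP.

```c
#include <stddef.h>

/* Hypothetical stand-in for a strided element-wise add (not the vDSP API):
   c[i*ic] = a[i*ia] + b[i*ib] for i in [0, n). */
void strided_add(const float *a, size_t ia,
                 const float *b, size_t ib,
                 float *c, size_t ic, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c[i * ic] = a[i * ia] + b[i * ib];
    }
}

/* Sum consecutive blocks of four:
   out[i] = in[4i] + in[4i+1] + in[4i+2] + in[4i+3]. */
void block_sum4(const float *in, float *out, size_t blocks)
{
    /* First pass writes out[i] = in[4i] + in[4i+1], so out needs no zeroing. */
    strided_add(in,     4, in + 1, 4, out, 1, blocks);
    /* Second and third passes accumulate in[4i+2] and in[4i+3] in place. */
    strided_add(in + 2, 4, out,    1, out, 1, blocks);
    strided_add(in + 3, 4, out,    1, out, 1, blocks);
}
```

Each pass walks the input with a stride of 4 starting at a different offset, so the k-th pass picks up the k-th element of every block.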

The solution suggested by user3726960 would look like this:

for(int i=0; i<64; i++){
   float out;
   vDSP_sve(inVector+4*i,1,&out,4);
   outVector[i] = out;
}

In my tests, my solution was about 6 times faster than the original double loop, and user3726960's solution was about 3 times faster. With more elements in the inner loop and fewer in the outer loop, though, the vDSP_sve approach might be the faster one.

Sten

You're trying to decimate a vector. vDSP_sve with N=4 will speed up your inner loop. If you eventually want the average of the four values, use vDSP_meanv instead.
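
For the averaging case, a plain C sketch of the per-block mean might look like the following. The helper name `block_mean4` is hypothetical, and this is portable reference code for checking results rather than a vDSP call.

```c
#include <stddef.h>

/* Hypothetical reference helper: average each consecutive block of four.
   out[i] = (in[4i] + in[4i+1] + in[4i+2] + in[4i+3]) / 4 */
void block_mean4(const float *in, float *out, size_t blocks)
{
    for (size_t i = 0; i < blocks; i++) {
        const float *p = in + 4 * i;          /* start of block i */
        float sum = p[0] + p[1] + p[2] + p[3];
        out[i] = sum * 0.25f;                 /* divide by the block length */
    }
}
```

A scalar version like this is mainly useful as a baseline when comparing the output of an accelerated implementation.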

  • Thanks for your input. Yes, that will speed up the loop. I ended up using vDSP_vadd which was even faster in my case (but not necessarily in others). See my answer. – Sten Jun 12 '14 at 06:18