-4

I have an array of 100 elements and I want to add up all 100 of them. I'm using the C code below:

int i, sum = 0;                /* sum must start at zero */
for (i = 0; i < 100; i++)
{
    sum += a[i];               /* accumulate each element */
}

Let us assume the processor takes 100 instruction cycles to add the 100 elements, which slows down the application. Is there any instruction that adds all 100 elements in a single instruction cycle, so that I can speed up the application?

Jaga
  • 1
  • 3
  • 4
    There is no way to sum 100 elements in one cycle. You could speed it up using SIMD instructions, but depending on the data type of "a" that would only give a speedup of a factor of 2-4 or something like that; you can't just gain a factor of 100! – Chris Maes Apr 16 '14 at 06:41
  • Is your computation CPU bound? Does the data fit in the cache? What else will you do with the data after adding all 100 elements? CPUs are already many times faster than memory, it seems to me if you had a CPU that was capable of doing that you would be memory bandwidth limited... there is no way memory could keep up. It all depends on what you are doing, of course, but that's my hunch. – amdn Apr 16 '14 at 08:15
  • Can you come up with an algorithm on paper that does it? – lorenzog Apr 16 '14 at 08:21
  • I'd love to run my whole program in one instruction cycle. But my search continues... – lurker Apr 16 '14 at 11:14

1 Answer

3

There is no instruction to add 100 numbers in a single hardware instruction cycle. At least not in any hardware I know of.

But if you are interested in getting the maximum computing performance out of a desktop computer, you should look into programming the graphics card. Top-of-the-line graphics cards today have over 3000 cores.

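Summation isn't a perfect fit for parallel algorithms, because the partial results are not independent: each running total depends on the previous one. But if you have more than N cores, the numbers can be added pairwise in a tree, so the time complexity is O(log N).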
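To make the O(log N) claim concrete, here is a minimal sketch of a pairwise (tree) reduction in plain C. The function name `tree_sum` and the `rounds` out-parameter are made up for this illustration and do not come from any library. The rounds run one after another on a single core here, so this version is not faster than the simple loop; it only shows that the additions inside one round do not depend on each other, which is what a GPU or other multi-core hardware could exploit.

```c
#include <stddef.h>

/* Hypothetical helper: sums n values in place using a pairwise tree.
 * Every addition inside one round is independent of the others, so with
 * enough cores a whole round could execute at the same time.
 * The number of rounds is ceil(log2(n)).
 * Note: the input array is overwritten with partial sums. */
long tree_sum(long *a, size_t n, unsigned *rounds)
{
    unsigned r = 0;

    if (n == 0) {
        if (rounds) *rounds = 0;
        return 0;
    }

    for (size_t stride = 1; stride < n; stride *= 2) {
        /* One round: add each element's partner 'stride' slots away. */
        for (size_t i = 0; i + stride < n; i += 2 * stride) {
            a[i] += a[i + stride];
        }
        r++;
    }

    if (rounds) *rounds = r;
    return a[0];   /* the total ends up in the first slot */
}
```

For an array of 100 elements this takes 7 rounds (strides 1, 2, 4, ..., 64) instead of 100 dependent steps, but each round still costs at least one cycle, so it never gets down to a single instruction cycle.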

Suggested internet search terms:

GPU program

GPU programming

Parallel algorithm

Klas Lindbäck
  • 33,105
  • 5
  • 57
  • 82