I have a very simple program that I am trying to improve performance. One way that I know will help is to utilize SSE3 (since the machine that I am working supports this), but I have absolutely no idea how to to do this. Here is a code snippet (c++):
int sum1, sum2, sum3, sum4;
for (int i=0; i<length; i+=4) {
for (int j=0; j<length; j+=4) {
sum1 = sum1 + input->value[i][j];
sum2 = sum2 + input->value[i+1][j+1];
sum3 = sum3 + input->value[i+2][j+3];
sum4 = sum4 + input->value[i+3][j+4];
{
}
I've read a little about this, and understand the idea, but I have absolutely no idea how to implement this. Can somebody help me please? I think that this is fairly simple, particularly for my simple program, but sometimes getting started is the hardest part.
Thanks!