I would like the `r_m[i] /= lines_samples;` line to be executed only once per outer iteration, i.e. by a single thread. Do I need a special pragma, or anything else, for the compiler to understand this?

Here is the code:

    #pragma acc parallel loop
    for(i=0; i<bands; i++)
    {
        #pragma acc loop seq // This may be a reduction, not a seq, who knows? ^^
        for(j=0; j<lines_samples; j++)
            r_m[i] += image_vector[i*lines_samples+j];

        r_m[i] /= lines_samples;

        #pragma acc loop
        for(j=0; j<lines_samples; j++)
            R_o[i*lines_samples+j] = image_vector[i*lines_samples+j] - r_m[i];
    }

Thank you a lot!

Jim Cownie
gamersensual

1 Answer

Assuming the loops are scheduled with the outer loop as "gang" and the inner loops as "vector", this line would be executed once per gang (i.e. by only one thread in the gang). So it will work as you expect.

Depending on the trip count of the first "j" loop, you may or may not want to use a reduction. Reductions do have overhead, so if the trip count is small, it may be better to leave the loop sequential. Otherwise, I suggest using a temporary scalar for the reduction, since the code as written would require an array reduction, which incurs more overhead.

Something like:

    float rmi = r_m[i];
    #pragma acc loop reduction(+:rmi)
    for(j=0; j<lines_samples; j++)
        rmi += image_vector[i*lines_samples+j];

    r_m[i] = rmi/lines_samples;
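
For completeness, here is a rough sketch of how the whole loop nest might look with the schedule written out explicitly. Several things in it are assumptions on my part rather than anything from your code: the wrapper function `center_bands` is a hypothetical name, the `gang`/`vector` clauses and the data clauses are just one plausible choice, and the sum starts from zero instead of from whatever `r_m[i]` already holds. A compiler without OpenACC support ignores the pragmas and runs the loops serially.

```c
#include <assert.h>

/* Hypothetical helper (not from the original post): computes the mean of
 * each band into r_m and writes the mean-centered samples into R_o. */
void center_bands(const float *image_vector, float *r_m, float *R_o,
                  int bands, int lines_samples)
{
    assert(bands > 0 && lines_samples > 0);

    /* Assumed schedule: one gang per band, vector lanes across the samples.
     * The data clauses are also an assumption about how the arrays are used. */
    #pragma acc parallel loop gang \
        copyin(image_vector[0:bands*lines_samples]) \
        copyout(r_m[0:bands], R_o[0:bands*lines_samples])
    for (int i = 0; i < bands; i++)
    {
        /* Scalar accumulator, private to the gang; assumes the sum starts at 0. */
        float rmi = 0.0f;

        #pragma acc loop vector reduction(+:rmi)
        for (int j = 0; j < lines_samples; j++)
            rmi += image_vector[i*lines_samples + j];

        /* Code between the two inner loops runs once per gang, so the
         * division happens once per band. */
        rmi /= lines_samples;
        r_m[i] = rmi;

        #pragma acc loop vector
        for (int j = 0; j < lines_samples; j++)
            R_o[i*lines_samples + j] = image_vector[i*lines_samples + j] - rmi;
    }
}
```

Reading the mean from the scalar `rmi` in the second loop, rather than from `r_m[i]`, also saves a global memory load per element. Whether the reduction beats a plain `seq` loop still depends on how large `lines_samples` is, so it is worth timing both.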
Mat Colgrove