
I am trying to parallelize the calls to test_function made in the for-loop inside main, using OpenMP/CilkPlus (as shown in the C code). In each iteration, reads and writes touch only one row of 2d_array, so there are no data dependencies between iterations (2d_array is shared among the available threads, and i is private by default).

void test_function(int *in, int len)
{
    int i, value; 
    int *x, *low, *high;
    x = x_alloc + 4;
    for (i=0; i<len; i++)
        x[i] = in[i];

    for(i=1;i<=4;i++) 
    {
        x[-i] = x[i];
        x[(len-1) + i] = x[(len-1) - i];
    }

    high = malloc(sizeof(int)*((len>>1) + 1));

    for(i=0; i < ((len>>1) + 1); i++)
    {
        value = x[-4 + 2*i] + x[2 + 2*i] + x[-2 + 2*i] + x[2*i];
        high[i] = x[-1 + 2*i] - value;
    }

    low = malloc(sizeof(int)*(len>>1));

    for(i = 0; i < (len>>1); i++) 
    {
        value = high[i] + high[i + 1];
        low[i] = x[2*i] - value;
    }
    for (i=0; i<(len>>1); i++)
    {
        in[i] = low[i];
        in[i+(len>>1)] = high[i+1];
    }

    free(low);
    free(high);
}

int main{...}
...
int **2d_array;

...
#pragma omp parallel for
for(i = 0; i < height; i++){
    test_function(2d_array[i], width);
}

Regardless, the result is wrong. I also tried cilk_for instead of the OpenMP pragma. Is there a specific way to treat 2D arrays when each row is altered during each iteration?

koukouviou
  • What did you try? Did you try `#pragma omp parallel for private(value)`? – Z boson Sep 30 '14 at 07:36
  • yes, if you scroll down in the code I have the #pragma omp parallel for directive. I also edited the question - I removed the non-thread-safe memcpy but that was not the problem, I am still getting wrong results – koukouviou Sep 30 '14 at 08:25
  • I did not see the last part of your code. But where do you allocate space for `x`? – Z boson Sep 30 '14 at 08:38
  • Re-edited, this is a stripped-out version of the code which otherwise would be too big to post here, sorry. I am trying to figure out whether the allocated memory for high and low will be "generated" at each of the iterations independently or if they could be the culprits. – koukouviou Sep 30 '14 at 08:49
  • allocate `x` inside your function otherwise each thread writes to the same `x` array. – Z boson Sep 30 '14 at 08:51

2 Answers


The problem is that the variable x points to the same memory address in every thread. To fix this, do something like

x = malloc(sizeof(int)*(len+8));

inside test_function.

This is a common source of errors when using pointers with OpenMP. If you print &x you will see that each thread has a different address for x, so the pointers themselves are private. However, after x = x_alloc + 4; every one of those pointers holds the same address. This is effectively a shared array, whereas you want a private array for each thread. Therefore you have to allocate space for each thread and point each private pointer at that private space.
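The aliasing can be seen even without threads. The following is a minimal sketch, not the asker's code: the size of x_alloc and the helper names shared_view and pointers_alias are assumptions made up for illustration.

```c
/* Hypothetical stand-in for the question's global x_alloc buffer. */
int x_alloc[16];

/* Mimics `x = x_alloc + 4;`: every caller gets the same address back. */
int *shared_view(void)
{
    int *x = x_alloc + 4;
    return x;
}

/* Returns 1 when two distinct pointer variables alias the same storage. */
int pointers_alias(void)
{
    int *a = shared_view();   /* think: the private x of one thread */
    int *b = shared_view();   /* think: the private x of another thread */

    a[0] = 42;                /* a write through one pointer ... */

    /* ... is visible through the other, although &a and &b differ. */
    return (&a != &b) && (a == b) && (b[0] == 42);
}
```

With several threads running at once, those writes through a and b interleave, which is why the results come out wrong.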

If you want to avoid allocating and deallocating x for each iteration you can try something like this

#pragma omp parallel
{
    int *x;
    x = malloc(sizeof(int)*(width+8));
    #pragma omp for
    for(i = 0; i < height; i++){
        test_function(2d_array[i], width, x);
    }
    free(x);
}

where test_function now takes x as an argument.
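A fuller sketch of that variant follows. The names test_function_scratch and transform_rows, the int return codes, and the len >= 5 guard are my additions, not part of the original code. The caller's buffer must hold at least len + 8 ints, and the function applies the + 4 offset itself.

```c
#include <stdlib.h>

/* Same transform as in the question, but the padded buffer comes from the
   caller. scratch must hold at least len + 8 ints. Returns 0 on success. */
int test_function_scratch(int *in, int len, int *scratch)
{
    int i, value, half = len >> 1;
    int *x = scratch + 4;              /* room for 4 mirrored samples on the left */
    int *high, *low;

    if (len < 5) return -1;            /* the mirroring below reads x[1..4] */
    high = malloc(sizeof(int) * (half + 1));
    low  = malloc(sizeof(int) * half);
    if (!high || !low) { free(high); free(low); return -1; }

    for (i = 0; i < len; i++) x[i] = in[i];
    for (i = 1; i <= 4; i++) {         /* mirror 4 samples at each end */
        x[-i] = x[i];
        x[(len - 1) + i] = x[(len - 1) - i];
    }
    for (i = 0; i < half + 1; i++) {
        value = x[-4 + 2*i] + x[2 + 2*i] + x[-2 + 2*i] + x[2*i];
        high[i] = x[-1 + 2*i] - value;
    }
    for (i = 0; i < half; i++) {
        value = high[i] + high[i + 1];
        low[i] = x[2*i] - value;
    }
    for (i = 0; i < half; i++) {
        in[i] = low[i];
        in[i + half] = high[i + 1];
    }
    free(low);
    free(high);
    return 0;
}

/* One scratch buffer per thread, reused across that thread's rows. */
int transform_rows(int **rows, int height, int width)
{
    int failed = 0;
    #pragma omp parallel reduction(|:failed)
    {
        int i;
        int *scratch = malloc(sizeof(int) * (width + 8));
        if (!scratch) failed = 1;
        #pragma omp for
        for (i = 0; i < height; i++) {
            if (scratch && test_function_scratch(rows[i], width, scratch) != 0)
                failed = 1;
        }
        free(scratch);
    }
    return failed ? -1 : 0;
}
```

Compiled without OpenMP support the pragmas are ignored and the loop runs serially, which makes it easy to compare against the parallel output.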

Z boson

Be aware that calls to malloc may be thread-safe but may lock the heap, creating contention and limiting speedup.

Barry
  • This was an important factor in further optimizations. To remove the limitation of heap locking, the arrays were created on the stack if their size was small enough for the stack to accommodate them. Along with several other optimizations, I achieved almost linear speedup. It is worth mentioning that CilkPlus, and cilk_for in particular, was by far better for this workload than OpenMP. – koukouviou Oct 02 '14 at 00:16
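The stack-allocation idea from the last comment could look roughly like the sketch below. This is not the asker's final code: the name test_function_stack, the STACK_LEN_MAX threshold, and packing x, high and low into one block are assumptions for illustration. Rows up to the threshold never call malloc; longer rows fall back to a single heap allocation.

```c
#include <stdlib.h>

#define STACK_LEN_MAX 512   /* hypothetical threshold; tune to the thread stack size */

/* Same transform as in the question; x, high and low share one block that
   lives on the stack for short rows. Returns 0 on success, -1 on failure. */
int test_function_stack(int *in, int len)
{
    int stack_buf[2 * STACK_LEN_MAX + 9];       /* enough for len <= STACK_LEN_MAX */
    int half = len >> 1;
    int need = (len + 8) + (half + 1) + half;   /* x + high + low */
    int *buf = stack_buf;
    int *x, *high, *low;
    int i, value;

    if (len < 5) return -1;                     /* the mirroring below reads x[1..4] */
    if (len > STACK_LEN_MAX) {                  /* too big for the stack: one malloc */
        buf = malloc(sizeof(int) * need);
        if (!buf) return -1;
    }
    x = buf + 4;                                /* padded input, 4 extra ints per side */
    high = buf + (len + 8);
    low = high + (half + 1);

    for (i = 0; i < len; i++) x[i] = in[i];
    for (i = 1; i <= 4; i++) {
        x[-i] = x[i];
        x[(len - 1) + i] = x[(len - 1) - i];
    }
    for (i = 0; i < half + 1; i++) {
        value = x[-4 + 2*i] + x[2 + 2*i] + x[-2 + 2*i] + x[2*i];
        high[i] = x[-1 + 2*i] - value;
    }
    for (i = 0; i < half; i++) {
        value = high[i] + high[i + 1];
        low[i] = x[2*i] - value;
    }
    for (i = 0; i < half; i++) {
        in[i] = low[i];
        in[i + half] = high[i + 1];
    }
    if (buf != stack_buf) free(buf);
    return 0;
}
```

Because the function touches no shared state, it can be called directly from a parallel loop over the rows. With OpenMP, the stack size of worker threads is controlled by the OMP_STACKSIZE environment variable, so the threshold should stay well below that limit.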