
This is part of the back-propagation algorithm code for a neural network.

In our case we want to parallelize the for(pt=0; pt<N_PT_pair; pt++) loop; the for(epoch=0; epoch<MaxEpoch; epoch++) loop cannot be parallelized.

initialize W1[ ] [ ] and W2[ ][ ] with random values
for(epoch=0; epoch<MaxEpoch; epoch++)
     dW1[ ][ ]=0.0; dW2[ ][ ]=0.0; //sum of weight corrections
     sse = 0; // Sum of square of errors
     for( pt=0; pt<N_PT_pair; pt++)
          input = pattern[pt];
          compute output  // procedure as above
          compare target[pt] and output and
          compute dW2[ ][ ] += ... // procedure to be described
          compute dW1[ ][ ] += ... // procedure to be described
          for(k=1; k<=Noutput; k++) 
              sse+=pow((target[pt][k]-output[k]),2);
     end pt for loop   
     cout << "mean square error" << sse/N_PT_pair;
     W1[ ][ ] += rate*dW1[ ][ ]
     W2[ ][ ] += rate*dW2[ ][ ]
end epoch for loop

These are the routines we use for allocating and deallocating the arrays:

double** allocate_matrix(int rows, int cols)
{
    double **a;

    // array of row pointers
    a = new double*[rows];
    if(a==NULL){cout<<"matrix allocation failed"<<endl;exit(-1);}

    // allocate each row separately
    for (int j=0; j<rows; j++){
       a[j] = new double[cols];
       if(a[j]==NULL) {cout<<"matrix allocation failed"<<endl;exit(-1);}
    }

    return a;
}

int deallocate_matrix(double **a, int rows)
{
    for(int i=0; i<rows; i++)
       delete [] a[i];
    delete [] a;
    return 0;
}

Can you help us parallelize this code?

Jonathan Dursi
    `new` doesn't return `NULL`. It throws `std::bad_alloc`. More importantly, what parallel processing library are you using? – MSalters May 29 '12 at 10:59

1 Answer


If the iterations of the inner loop are independent of one another, then you could simply start with one OpenMP construct:

#pragma omp parallel for private(input,k) reduction(+:sse)
for( pt=0; pt<N_PT_pair; pt++)
     input = pattern[pt];
     compute output  // procedure as above
     compare target[pt] and output and
     compute dW2[ ][ ] += ... // procedure to be described
     compute dW1[ ][ ] += ... // procedure to be described
     for(k=1; k<=Noutput; k++) 
         sse+=pow((target[pt][k]-output[k]),2);
end pt for loop

This would work well if no element of dW1 or dW2 is updated in more than one iteration; otherwise atomic updates would be required, and that would kill the performance (at the time of writing, OpenMP does not support reduction on array variables in C/C++).
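If the dW elements do collide (the usual case in batch back-propagation, since every pattern contributes to every weight), a common workaround is to give each thread its own private accumulator and merge the accumulators once at the end of the loop. Below is a minimal sketch of that idea; the matrix dimensions ((Nhidden+1) x (Ninput+1) and (Noutput+1) x (Nhidden+1)) and the helper accumulate_pattern(), which would do the forward pass and per-pattern gradient computation into the thread-local matrices, are assumptions, not part of the original code:

#pragma omp parallel reduction(+:sse)
{
    // thread-private gradient accumulators, using the question's allocator
    double **dW1_loc = allocate_matrix(Nhidden+1, Ninput+1);
    double **dW2_loc = allocate_matrix(Noutput+1, Nhidden+1);
    for(int i=0; i<=Nhidden; i++)
        for(int j=0; j<=Ninput; j++) dW1_loc[i][j] = 0.0;
    for(int i=0; i<=Noutput; i++)
        for(int j=0; j<=Nhidden; j++) dW2_loc[i][j] = 0.0;

    #pragma omp for
    for(int pt=0; pt<N_PT_pair; pt++)
        // hypothetical helper: forward pass + per-pattern gradients,
        // written only into the thread-local dW1_loc/dW2_loc
        sse += accumulate_pattern(pattern[pt], target[pt], dW1_loc, dW2_loc);

    // merge the private accumulators into the shared dW1/dW2, one thread at a time
    #pragma omp critical
    {
        for(int i=0; i<=Nhidden; i++)
            for(int j=0; j<=Ninput; j++) dW1[i][j] += dW1_loc[i][j];
        for(int i=0; i<=Noutput; i++)
            for(int j=0; j<=Nhidden; j++) dW2[i][j] += dW2_loc[i][j];
    }

    deallocate_matrix(dW1_loc, Nhidden+1);
    deallocate_matrix(dW2_loc, Noutput+1);
}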

If you have a large number of network weights, you could also parallelise the multiplication in the same manner.
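For example, the hidden-layer part of the forward pass is a matrix-vector product that can itself be split over threads. A rough sketch, where the shape of W1, the input[] and hidden[] arrays (with index 0 reserved for the bias) and the sigmoid() activation are all assumptions about code the question does not show:

// split the hidden-layer matrix-vector product over the hidden units
#pragma omp parallel for
for(int h=1; h<=Nhidden; h++)
{
    double sum = 0.0;
    for(int i=0; i<=Ninput; i++)
        sum += W1[h][i] * input[i];   // weighted sum of the inputs
    hidden[h] = sigmoid(sum);         // assumed activation function
}

Note that nesting this inside an already-parallel pattern loop requires nested parallelism to be enabled; it pays off mainly when the weight matrices are large compared to the number of training patterns.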

The parallel region can also be moved outside the outer (epoch) loop in order to reduce the OpenMP overhead, and the single or master OpenMP directives can be used to isolate the code that should run in only one thread.
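A sketch of that structure, again using the assumed matrix shapes from above: all threads execute the epoch loop, the single blocks (with their implied barriers) keep the serial bookkeeping in one thread, and omp for shares out the pattern loop:

#pragma omp parallel
{
    for(int epoch=0; epoch<MaxEpoch; epoch++)
    {
        #pragma omp single
        {   // one thread resets the shared accumulators; the others wait at the implicit barrier
            sse = 0.0;
            for(int i=0; i<=Nhidden; i++)
                for(int j=0; j<=Ninput; j++) dW1[i][j] = 0.0;
            for(int i=0; i<=Noutput; i++)
                for(int j=0; j<=Nhidden; j++) dW2[i][j] = 0.0;
        }

        #pragma omp for reduction(+:sse)
        for(int pt=0; pt<N_PT_pair; pt++)
        {
            // per-pattern forward pass and gradient accumulation,
            // e.g. with the thread-private dW trick sketched above
        }

        #pragma omp single
        {   // one thread reports the error and applies the weight update
            cout << "mean square error " << sse/N_PT_pair << endl;
            for(int i=0; i<=Nhidden; i++)
                for(int j=0; j<=Ninput; j++) W1[i][j] += rate*dW1[i][j];
            for(int i=0; i<=Noutput; i++)
                for(int j=0; j<=Nhidden; j++) W2[i][j] += rate*dW2[i][j];
        }
    }
}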

In order for the compiler to understand the #pragma omp directives you have to enable OpenMP support. How exactly this is done is compiler-specific:

  • -fopenmp for GCC
  • -openmp for Intel C/C++ Compiler
  • -xopenmp for Oracle Solaris Studio
  • Project Properties -> C/C++ -> Language -> OpenMP Support (the /openmp switch) for MS Visual Studio
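For example, a GCC build could look like the line below (the source file name backprop.cpp is just a placeholder). Without the flag the #pragma omp lines are silently ignored and the program simply runs serially:

g++ -O2 -fopenmp backprop.cpp -o backprop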
Hristo Iliev