
How do I transfer a two-dimensional array between the host CPU and the GPU? I tried to pass the 2D array as w[0:(n_hidden*i)-1], but the compiler feedback tells me something is wrong.

This is the RBM algorithm function:

double RBM::propdown(int *h, int i, double b) {
  #pragma acc data region \
    copyin(w[0:(n_hidden*i)-1],h[0:n_hidden],b) create(pre_sigmoid_activation)
  double pre_sigmoid_activation = 0.0;
  #pragma acc parallel loop reduction(+:pre_sigmoid_activation)
  for(int j=0; j<n_hidden; j++) {
    pre_sigmoid_activation += W[j][i] * h[j];
  }
  pre_sigmoid_activation += b;
  return sigmoid(pre_sigmoid_activation);
  #pragma acc exit data \
    delete ( pre_sigmoid_activation)
}

1 Answer


No need to linearize the array. Just use multiple brackets.

#pragma acc data copyin(W[0:n_hidden][0:N])

I see a number of other issues as well.

The data directive doesn't have a "region" clause. You might be confusing this with the "enter data" directive or with the PGI Accelerator model, which was the basis for OpenACC.

No need to put "b" in a data clause since it's not actually used in the compute region. Also, by putting it in a data clause, you're making "b" a global reference on the device. It's better to leave read-only scalars out of data clauses so that the value is passed in as an argument rather than needing to be fetched from global memory.

Again, by putting the scalar variable "pre_sigmoid_activation" in a data clause, you've created a global variable on the device. Here, the result of the reduction will be stored in this device variable and not automatically updated on the host. For that, you'd need to add an "update" directive. Better yet, just remove it from the data clause and the result of the reduction will be copied back to the host variable.

You have an unmatched "exit data" directive (there should be a corresponding "enter data" directive). Also, the directive is placed after the return statement, so it would never be executed, leaving the data on the device.

Finally, since C++ is case sensitive, make sure the variable names in the OpenACC directives match the actual variable names, i.e. "W" instead of "w".

Here's how I would write the loop. Note that I don't know the size of the second dimension of "W", so I just used "N". Please update accordingly.

#pragma acc data copyin(W[0:n_hidden][0:N],h[0:n_hidden])
{
  #pragma acc parallel loop reduction(+:pre_sigmoid_activation)
  for(int j=0; j<n_hidden; j++) {
    pre_sigmoid_activation += W[j][i] * h[j];
  }
}
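
Putting the points above together, the whole function might look roughly like the following minimal sketch. It is written as a free function so that it stands alone; the parameter name n_visible (standing in for "N"), the double** type of W, and the logistic definition of sigmoid() are assumptions, since the question does not show them. Without an OpenACC compiler the pragmas are ignored and the loop simply runs on the host.

```cpp
#include <cmath>

// Assumed definition: the question calls sigmoid() but does not show it.
static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Hypothetical free-function version of RBM::propdown.
// Assumption: W has n_hidden rows, each with n_visible columns.
double propdown(double **W, const int *h, int n_hidden, int n_visible,
                int i, double b) {
    // Host scalar; deliberately NOT listed in any data clause so the
    // reduction result is returned to the host automatically.
    double pre_sigmoid_activation = 0.0;

    // Structured data region: arrays are copied in at "{" and freed at "}".
    #pragma acc data copyin(W[0:n_hidden][0:n_visible], h[0:n_hidden])
    {
        #pragma acc parallel loop reduction(+:pre_sigmoid_activation)
        for (int j = 0; j < n_hidden; j++) {
            pre_sigmoid_activation += W[j][i] * h[j];
        }
    }

    // "b" is only used on the host, so it needs no data clause either.
    pre_sigmoid_activation += b;
    return sigmoid(pre_sigmoid_activation);
}
```

Because the data region is structured, no "exit data" directive is needed, and nothing is left on the device when the function returns.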
Mat Colgrove
  • Thank you very much. I don't know the reason, but when I write the parallel directive as #pragma acc parallel loop reduction(+:pre_sigmoid_activation), the feedback is: "call to cuMemcpyDtoHAsync returned error 700: Illegal address during kernel execution call to cuMemFreeHost returned error 700: Illegal address during kernel execution" – Alwaleed A. Hamam Nov 30 '16 at 11:52
  • This means that you're either accessing an array out of bounds or, more likely, dereferencing a host pointer on the device. Is "W" a class data member? If so, there's a hidden reference to the class's "this" pointer, so accessing "W[j][i]" is really "this->W[j][i]". To fix it, add "this" to your "copyin" clause, i.e. "copyin(this, W[..." If that's not the problem, could you post a reproducing example? – Mat Colgrove Nov 30 '16 at 19:48
  • Note that if you need some examples of using OpenACC with C++ classes as well as multi-dimensional arrays, you can grab the examples I wrote for Chapter 5 of "Parallel Programming with OpenACC" on GitHub: https://github.com/rmfarber/ParallelProgrammingWithOpenACC/ – Mat Colgrove Nov 30 '16 at 19:49
  • Please, could you take a look at this question: http://stackoverflow.com/questions/40976149/rbm-no-improvement-with-openacc-on-the-code-yet – Alwaleed A. Hamam Dec 05 '16 at 14:09