No need to linearize the array. Just use multiple brackets.
#pragma acc data copyin(W[0:n_hidden][0:N])
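The sketch below shows the multi-bracket form with a matrix stored as an array of row pointers (`double**`) rather than a single linearized buffer. The helper names `alloc_matrix`, `free_matrix` and `matrix_sum` are hypothetical, and the example assumes a compiler that supports multi-dimensional array sections on pointer-to-pointer data, as the PGI compilers discussed here do. Without OpenACC enabled, the pragmas are ignored and the code runs serially on the host.

```cpp
#include <cstdlib>

// Hypothetical helper: allocate a rows x cols matrix as row pointers
// (double**), i.e. NOT linearized into one contiguous 1-D buffer.
double **alloc_matrix(int rows, int cols) {
    double **M = static_cast<double **>(std::malloc(rows * sizeof(double *)));
    for (int r = 0; r < rows; r++)
        M[r] = static_cast<double *>(std::malloc(cols * sizeof(double)));
    return M;
}

void free_matrix(double **M, int rows) {
    for (int r = 0; r < rows; r++) std::free(M[r]);
    std::free(M);
}

// Sum of all elements. The data clause describes both dimensions with
// multiple brackets, so the kernel can index M[r][c] directly.
double matrix_sum(double **M, int rows, int cols) {
    double total = 0.0;
    #pragma acc data copyin(M[0:rows][0:cols])
    {
        #pragma acc parallel loop collapse(2) reduction(+:total)
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                total += M[r][c];
    }
    return total;
}
```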
I see a number of other issues as well.
The data directive doesn't have a "region" clause. You might be confusing it with the "enter data" directive or with the PGI Accelerator model, which was the basis for OpenACC.
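For comparison, a structured data region in standard OpenACC syntax is just "#pragma acc data" followed by its data clauses and a block. This is a minimal sketch; the function `scale_and_sum` and its arguments are hypothetical and only there to show two compute regions sharing the same device data.

```cpp
// Structured data region: "#pragma acc data" takes only data clauses
// (there is no "region" keyword) and applies to the block that follows.
double scale_and_sum(const double *x, double *y, int n, double a) {
    double sum = 0.0;
    #pragma acc data copyin(x[0:n]) copyout(y[0:n])
    {
        // Both compute regions reuse the device copies of x and y;
        // nothing is transferred between them.
        #pragma acc parallel loop
        for (int i = 0; i < n; i++)
            y[i] = a * x[i];

        #pragma acc parallel loop reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += y[i];
    }   // y is copied back to the host when the block ends
    return sum;
}
```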
There's no need to put "b" in a data clause, since it isn't actually used in the compute region. Also, by putting it in a data clause, you're making "b" a global reference on the device. It's better to leave read-only scalars out of data clauses so that the value is passed in as an argument rather than fetched from global memory.
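Here is a small sketch of the read-only scalar case, using a hypothetical `add_bias` function in which "b" is actually used inside the loop. Because "b" appears in no data clause, OpenACC treats it as firstprivate in the parallel region, so its value travels with the kernel launch instead of living in device global memory.

```cpp
// "b" is a read-only scalar: it is deliberately left out of every data
// clause, so the parallel region gets it as a firstprivate value.
// Avoid: #pragma acc data copyin(b) ... which makes "b" a device global.
void add_bias(double *v, int n, double b) {
    #pragma acc parallel loop copy(v[0:n])
    for (int i = 0; i < n; i++)
        v[i] += b;
}
```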
Again, by putting the scalar variable "pre_sigmoid_activation" in a data clause, you've created a global variable on the device. Here, the result of the reduction will be stored in this device variable and will not automatically be updated on the host; for that, you'd need to add an "update" directive. Better yet, just remove it from the data clause, and the result of the reduction will be written back to the host variable.
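If you do keep the reduction variable in a data clause, the result has to be pulled back explicitly. This sketch of that less preferred variant uses a hypothetical `sum_with_update` function; note that the missing update only shows up as a wrong answer on a device with memory separate from the host.

```cpp
// Less preferred: "sum" is in a data clause, so the reduction result is
// written to the device copy, not to the host variable.
double sum_with_update(const double *x, int n) {
    double sum = 0.0;
    #pragma acc data copyin(x[0:n], sum)
    {
        #pragma acc parallel loop reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += x[i];

        // Copy the device value back; without this, on a device with
        // separate memory the host "sum" would still be 0.0 here.
        #pragma acc update self(sum)
    }
    return sum;
}
```

Leaving "sum" out of the data clause entirely avoids both the extra directive and the device global.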
You have an unmatched "exit data" directive (there should be a corresponding "enter data" directive). Also, the directive is placed after the return statement, so it would never be executed, leaving the data on the device.
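A matched pair looks roughly like this minimal sketch, where the function `dot` and its arguments are hypothetical. The point is that "enter data" and "exit data" bracket the work and both sit before the return, so the device copies are actually released.

```cpp
// Unstructured data lifetime: every "enter data" needs a matching
// "exit data", and both must be reachable, i.e. placed before "return".
double dot(const double *x, const double *y, int n) {
    double sum = 0.0;
    #pragma acc enter data copyin(x[0:n], y[0:n])

    #pragma acc parallel loop present(x[0:n], y[0:n]) reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];

    // Release the device copies before returning; a directive placed
    // after "return" is dead code and the data would stay on the device.
    #pragma acc exit data delete(x[0:n], y[0:n])
    return sum;
}
```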
Finally, since C++ is case-sensitive, make sure the variable names in the OpenACC directives match the actual variable names, e.g. "W" instead of "w".
Here's how I would write the loop. Note that I don't know the size of "W"'s second dimension, so I just used "N". Please update it accordingly.
#pragma acc data copyin(W[0:n_hidden][0:N],h[0:n_hidden])
{
#pragma acc parallel loop reduction(+:pre_sigmoid_activation)
for(int j=0; j<n_hidden; j++) {
pre_sigmoid_activation += W[j][i] * h[j];
}
}
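To see the loop in context, here is a sketch that wraps it in a function. The surrounding code is an assumption, not taken from the question: the function name `column_activation` is hypothetical, "W" is taken to be an n_hidden x N array of row pointers, and, as in typical RBM code, the bias "b" is assumed to be added on the host after the loop before applying a sigmoid.

```cpp
#include <cmath>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Hypothetical wrapper around the loop above. W is n_hidden x N stored as
// row pointers, h has n_hidden entries, and i selects a column of W.
double column_activation(double **W, const double *h, int n_hidden, int N,
                         int i, double b) {
    double pre_sigmoid_activation = 0.0;
    // Neither "b" nor "pre_sigmoid_activation" appears in a data clause.
    #pragma acc data copyin(W[0:n_hidden][0:N], h[0:n_hidden])
    {
        #pragma acc parallel loop reduction(+:pre_sigmoid_activation)
        for (int j = 0; j < n_hidden; j++) {
            pre_sigmoid_activation += W[j][i] * h[j];
        }
    }
    // The reduction result is already in the host variable here.
    pre_sigmoid_activation += b;
    return sigmoid(pre_sigmoid_activation);
}
```

For example, with rows {1, 2} and {3, 4}, h = {1, 1}, i = 1 and b = -6, the column sum is 6, the biased value is 0, and the function returns 0.5.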