I have a struct that contains dynamically allocated 2D arrays, and I need to transfer this struct from the host to the accelerator (in my case an NVIDIA GPU) using OpenMP directives in C code.
The struct looks as follows:
struct my_grid {
    double **a,
           **b,
           **c;
};
It is initialized as follows:
void initialize_my_grid( struct my_grid *my_gd,
                         struct grid_config grid_cfg ) {
    my_gd->a = matrix_double( grid_cfg.N1.value, grid_cfg.N2.value );
    my_gd->b = matrix_double( grid_cfg.N1.value, grid_cfg.N2.value );
    my_gd->c = matrix_double( grid_cfg.N1.value, grid_cfg.N2.value );
}
The additional struct grid_config only contains some scalars; the relevant part looks as follows:
struct my_variable_size_t {
    size_t value;
};
struct grid_config {
    struct my_variable_size_t
        N1, N2;
};
The memory allocation function is as follows (omitting the NULL-error check for easier readability):
double **matrix_double( size_t n_rows, size_t n_cols ) {
    size_t ii;
    double **arr;
    // (c)allocate the array of row pointers
    arr = calloc( n_rows, sizeof(double*) );
    // (c)allocate all rows as one contiguous block, then point each row into it
    arr[0] = calloc( n_rows*n_cols, sizeof(double) );
    for ( ii=1 ; ii<n_rows ; ++ii )
        arr[ii] = arr[ii-1] + n_cols;
    return arr;
}
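To make the memory layout explicit, here is a minimal sketch of the same allocation scheme with the NULL checks included, plus a check that all rows really sit in one contiguous block. The names matrix_double_checked, matrix_double_free and is_contiguous are hypothetical and only used for illustration; they are not part of my actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical variant of matrix_double with the NULL checks included. */
double **matrix_double_checked(size_t n_rows, size_t n_cols) {
    assert(n_rows > 0 && n_cols > 0);

    /* Table of row pointers. */
    double **arr = calloc(n_rows, sizeof(double *));
    if (arr == NULL)
        return NULL;

    /* One contiguous, zero-initialized block holding all elements. */
    arr[0] = calloc(n_rows * n_cols, sizeof(double));
    if (arr[0] == NULL) {
        free(arr);
        return NULL;
    }

    /* Each row pointer refers to a slice of that single block. */
    for (size_t ii = 1; ii < n_rows; ++ii)
        arr[ii] = arr[ii - 1] + n_cols;
    return arr;
}

/* Hypothetical helper: releases the data block and the row-pointer table. */
void matrix_double_free(double **arr) {
    if (arr != NULL) {
        free(arr[0]);
        free(arr);
    }
}

/* Returns 1 if every row ii starts at flat offset ii*n_cols, else 0. */
int is_contiguous(double **arr, size_t n_rows, size_t n_cols) {
    for (size_t ii = 0; ii < n_rows; ++ii)
        if (arr[ii] != arr[0] + ii * n_cols)
            return 0;
    return 1;
}
```

With this layout, arr[ii][jj] and arr[0][ii*n_cols + jj] refer to the same element, so the data itself is a single block of n_rows*n_cols doubles and only the row-pointer table adds a level of indirection.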
In the relevant loop which I'd like to run on the accelerator (the GPU), I thought I could do something like
#pragma omp target teams distribute private(jj,kk)
for ( jj=2 ; jj<grid_cfg.N1.value-2 ; jj+=2 ) {
    for ( kk=2 ; kk<grid_cfg.N2.value-2 ; kk+=2 ) {
        wf_gd->a[jj  ][kk  ] += wf_gd->b[jj+1][kk  ]; // plus additional calculations
    }
}
But that does not work, as I get compiler errors like "Unsupported OpenACC construct Deep copy -- wf_gd" when compiling with cc.
First of all, I don't understand why the compiler mentions OpenACC. I would also like to know whether it is possible to use this kind of data structure on the accelerator, or whether I need to get rid of it for this case.
EDIT: I tried to compile it on a CRAY machine, which matters for the compiler message but should not play a role for the general question (how to pass such a construct to the accelerator, if that is possible at all). Unfortunately, I cannot access that machine right now and therefore cannot give further information about the version of the compiler I used, but again, that should not be related to the general question (at least in my understanding).
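One workaround I could imagine, given here only as an untested sketch (in particular, not tested on the CRAY machine): since matrix_double places all rows in one contiguous block, the loop could work on the flat data pointers (for example wf_gd->a[0] and wf_gd->b[0]) and map those explicitly as array sections, so that no pointer-to-pointer structure has to be copied. The function name update_flat and its parameters are hypothetical, and the combined construct with collapse(2) is my assumption about how the loop nest could be parallelized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel working on flat, contiguous n1*n2 blocks,
 * e.g. a = wf_gd->a[0] and b = wf_gd->b[0]. */
void update_flat(double *a, const double *b, size_t n1, size_t n2) {
    assert(a != NULL && b != NULL);

    /* Nothing to do for tiny grids; this also avoids the unsigned
     * wrap-around of n1-2 and n2-2 when n1 or n2 is below 2. */
    if (n1 < 5 || n2 < 5)
        return;

    /* Map the flat blocks as array sections instead of the struct.
     * Without OpenMP support the pragma is ignored and the loop runs
     * on the host. */
    #pragma omp target teams distribute parallel for collapse(2) \
        map(tofrom: a[0:n1*n2]) map(to: b[0:n1*n2])
    for (size_t jj = 2; jj < n1 - 2; jj += 2) {
        for (size_t kk = 2; kk < n2 - 2; kk += 2) {
            /* Same update as a[jj][kk] += b[jj+1][kk], with flat indexing. */
            a[jj * n2 + kk] += b[(jj + 1) * n2 + kk];
        }
    }
}
```

Under these assumptions the call would look like update_flat( wf_gd->a[0], wf_gd->b[0], grid_cfg.N1.value, grid_cfg.N2.value ); whether this is the recommended way to handle such a struct is exactly what I would like to find out.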