I can't speak directly about using GSL within OpenACC data and compute regions, but I can give you a general answer about aggregate types with dynamic data members.
The first thing to try, assuming you're using PGI compilers and a newer NVIDIA device, is CUDA Unified Memory (UVM). When you compile with the flag "-ta=tesla:managed", all dynamically allocated data is managed by the CUDA runtime, so you don't need to manage the data movement yourself. There are overheads and caveats involved, but it makes things easier when getting started. Note that CUDA 8.0, which ships with PGI 16.9 or later, improves UVM performance.
Without UVM, you need to perform a manual deep copy of the data. Below is the basic idea: first create the parent structure on the device via a shallow copy. Next, create the dynamic array "data" on the device, copy over its initial values, and attach the device pointer for "data" to the device structure's "data" member. Since "block" is itself an array of structs with dynamic data members, you'll need to loop through the array, creating each element's "data" array on the device.
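For reference, here is one possible host-side layout and allocation matching that description (the names "data", "block", and "size" come from the question; the exact element types and the helper alloc_matrix are my own assumptions):

```c
#include <stdlib.h>

typedef struct {
    double *data;   /* dynamic per-block array */
    int     size;   /* number of elements in data */
} submatrix;

typedef struct {
    double    *data;   /* dynamic array of dataSize elements */
    submatrix *block;  /* dynamic array of blockSize sub-blocks */
} matrix;

/* Allocate the full nested structure on the host */
matrix *alloc_matrix(int dataSize, int blockSize, int blockLen) {
    matrix *mat = (matrix*) malloc(sizeof(matrix));
    mat->data  = (double*) malloc(dataSize * sizeof(double));
    mat->block = (submatrix*) malloc(blockSize * sizeof(submatrix));
    for (int i = 0; i < blockSize; ++i) {
        mat->block[i].size = blockLen;
        mat->block[i].data = (double*) malloc(blockLen * sizeof(double));
    }
    return mat;
}
```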
matrix *mat = (matrix*) malloc(sizeof(matrix));
// ... allocate mat->data, mat->block, and each mat->block[i].data on the host ...
#pragma acc enter data copyin(mat[0:1])
// Change these to the correct sizes of "data" and "block"
#pragma acc enter data copyin(mat->data[0:dataSize])
#pragma acc enter data copyin(mat->block[0:blockSize])
for (int i = 0; i < blockSize; ++i) {
    #pragma acc enter data copyin(mat->block[i].data[0:mat->block[i].size])
}
To delete, walk the structure again, deleting from the bottom up:
for (int i = 0; i < blockSize; ++i) {
    #pragma acc exit data delete(mat->block[i].data)
}
#pragma acc exit data delete(mat->block)
#pragma acc exit data delete(mat->data)
#pragma acc exit data delete(mat)
When you update, be sure to update only scalars or arrays of fundamental data types, i.e., update "data" but not "block". Update performs a shallow copy, so updating "block" would overwrite the host or device pointers, leading to illegal addresses.
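A sketch of safe updates (member names as in the question; the double element type and the helper name update_to_device are my own assumptions):

```c
typedef struct { double *data; int size; } submatrix;  /* hypothetical layout */
typedef struct { double *data; submatrix *block; } matrix;

/* Update only the arrays of fundamental types; never "block" itself,
   since that shallow copy would clobber the device-side data pointers. */
void update_to_device(matrix *mat, int dataSize, int blockSize) {
    #pragma acc update device(mat->data[0:dataSize])
    for (int i = 0; i < blockSize; ++i) {
        #pragma acc update device(mat->block[i].data[0:mat->block[i].size])
    }
}
```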
Finally, be sure to put the matrix variable in a "present" clause when using it in a compute region.
#pragma acc parallel loop present(mat)
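For example, a reduction over the flat "data" array might look like this (a sketch; the function name sum_data and the double element type are my own assumptions, and the structure must already have been created on the device with the enter data directives above):

```c
typedef struct { double *data; int size; } submatrix;  /* hypothetical layout */
typedef struct { double *data; submatrix *block; } matrix;

/* mat and its members must already be present on the device */
double sum_data(const matrix *mat, int dataSize) {
    double sum = 0.0;
    #pragma acc parallel loop present(mat) reduction(+:sum)
    for (int i = 0; i < dataSize; ++i)
        sum += mat->data[i];
    return sum;
}
```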