Questions tagged [openacc]

The OpenACC Application Program Interface describes a collection of compiler directives to specify loops and regions of code in standard C, C++ and Fortran to be offloaded from a host CPU to an attached accelerator, providing portability across operating systems, host CPUs and accelerators.

Useful Links

The OpenACC directives and programming model allow programmers to create high-level host+accelerator programs without the need to explicitly initialize the accelerator, manage data or program transfers between the host and accelerator, or initiate accelerator startup and shutdown.
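As a minimal sketch of this in C (the function name `sum_of_squares_demo` and the array size `N` are illustrative assumptions, not part of any API): a single directive marks the loop for offload, and the code contains no accelerator setup, shutdown, or explicit transfer calls. A compiler without OpenACC support simply ignores the pragma and runs the loop on the host.

```c
#define N 1000

/* Illustrative helper (hypothetical name): fills an array with 0..N-1
 * on the host, then sums the squares in a loop marked for offload. */
double sum_of_squares_demo(void)
{
    double v[N];
    double sum = 0.0;

    /* Ordinary host code: initialize the input. */
    for (int i = 0; i < N; i++) {
        v[i] = (double)i;
    }

    /* The only OpenACC-specific line. No data clauses are given, so the
     * movement of v is left to the compiler and runtime; the reduction
     * clause combines the per-iteration contributions into sum. */
    #pragma acc parallel loop reduction(+:sum)
    for (int i = 0; i < N; i++) {
        sum += v[i] * v[i];
    }

    return sum;
}
```

OpenACC is typically enabled with a compiler flag, for example `-acc` with the NVIDIA HPC compilers or `-fopenacc` with GCC; without such a flag the same source builds as plain serial C.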

All of these details are implicit in the programming model and are managed by the OpenACC API-enabled compilers and runtimes. The programming model allows the programmer to augment information available to the compilers, including specification of data local to an accelerator, guidance on mapping of loops onto an accelerator, and similar performance-related details.
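The kind of extra information mentioned above might look like the following sketch. The `saxpy` function and its parameter names are assumptions chosen for illustration; the `data`, `copyin`, `copy`, `gang`, and `vector` keywords are standard OpenACC. The data clauses state which arrays live on the accelerator and in which direction they are copied, and the loop clauses suggest how iterations map onto the hardware.

```c
/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * Hypothetical example function, not taken from the text above. */
void saxpy(int n, float a, const float *x, float *y)
{
    /* copyin: x is only read on the accelerator, so it is copied in
     * but not back. copy: y is read and written, so it is copied in
     * at the start of the region and back out at the end. */
    #pragma acc data copyin(x[0:n]) copy(y[0:n])
    {
        /* gang vector: a hint to spread iterations across gangs and
         * across vector lanes within each gang. */
        #pragma acc parallel loop gang vector
        for (int i = 0; i < n; i++) {
            y[i] = a * x[i] + y[i];
        }
    }
}
```

Because `x` and `y` are plain pointers, the array sections `x[0:n]` and `y[0:n]` tell the compiler how many elements to transfer. Whether explicit clauses like these beat the compiler's defaults depends on the code and the target, so treat them as tuning hints rather than requirements.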

How to get Useful Answers to your OpenACC Questions on StackOverflow

Here are a number of suggestions to users new to OpenACC and/or StackOverflow. Follow these suggestions before asking your question and you are much more likely to get a satisfactory answer!

  • Search StackOverflow (and the web!) for similar questions before asking yours
  • Include an as-simple-as-possible code example in your question and you are much more likely to get a useful answer. If the code is short and self-contained (so users can test it themselves), that is even better.
403 questions
1
vote
1 answer

Running pgc++ programs on Cluster

I tried to run the OpenACC program below on a cluster: #include #include using namespace std; int main() { #pragma acc parallel loop for (int i=0; i<1000; i++) { //cout << i <<…
1
vote
1 answer

How do I translate this ACC code to SYCL?

My question is: I have this code: #pragma acc parallel loop for(i=0; i
gamersensual
  • 105
  • 6
1
vote
1 answer

How do I indicate OpenACC to sequentially execute one instruction inside a parallel loop?

I would like the 'r_m[i] /= lines_samples;' line to be executed once, by one thread I mean. Do I have to put a special pragma or do anything for the compiler to understand it? Here is the code: #pragma acc parallel loop for(i=0; i
gamersensual
  • 105
  • 6
1
vote
2 answers

Are macros (always) compatible and portable with OpenACC?

In my code I define the lower and upper bounds of different computational regions by using a structure, typedef struct RBox_{ int ibeg; int iend; int jbeg; int jend; int kbeg; int kend; } RBox; I have then introduced the following…
Steve
  • 89
  • 1
  • 6
1
vote
1 answer

How does vector_length and num_workers work in an OpenACC routine?

When using an OpenACC "#pragma acc routine worker" routine that contains multiple loops of vector (and worker) level parallelism, how do vector_length and num_workers work? I played around with some code (see below) and stumbled upon a few…
Dunkelkoon
  • 398
  • 2
  • 10
1
vote
2 answers

Fortran OpenACC invoking a function on device using a function pointer

How can I access a function on the device via a function pointer? In the code below I am trying to access init0 or init1 using the function pointer init. The code works as intended if OpenACC is not enabled during compilation. However, it fails when…
DKS
  • 188
  • 10
1
vote
0 answers

Compilation error while compiling OpenACC+MPI Fortran code with mpif90

As suggested in the answer to the post "Getting started with OpenACC + MPI Fortran program", I am trying to build my Fortran program using mpif90 with the -acc=gpu flag. I was shown the following error: gfortran: error: unrecognized command…
1
vote
1 answer

Error while compiling this OpenACC code. Can anyone figure it out?

This is the code: CALL OMP_SET_NUM_THREADS(2) !$omp parallel num_threads(acc_get_num_devices(acc_device_nvidia)) !$omp sections !$omp section !$acc data copyout(T) copyin(T_o) call acc_set_device_num(1, acc_device_nvidia ) !$acc kernels do…
1
vote
1 answer

OpenACC duplicate array on device

In a Fortran program accelerated with OpenACC, I need to duplicate an array on the GPU. The duplicated array will only be used on the GPU and will never be copied to the host. The only way I know to create it would be to declare and allocate it on the host, then…
Neraste
  • 485
  • 4
  • 15
1
vote
1 answer

OpenACC | Fortran 90: What is the best way to parallelize nested DO loop?

I am trying to parallelize the following nested DO loop structure (the first code below) using the 'collapse' clause in OpenACC. The variable 'nbl' in the outermost loop is also used in the other DO loops, so there is a dependency. Thanks to the…
1
vote
1 answer

Is there a faster argmin/argmax implementation in OpenACC?

Is there a faster alternative for computing the argmin in OpenACC, than splitting the work in a minimum-reduction loop and another loop to actually find the index of the minimum? This looks very wasteful: float minVal =…
Dunkelkoon
  • 398
  • 2
  • 10
1
vote
0 answers

Nsys Profile with MPMD(multiple program and multiple data) simulation

I am trying to profile an MPI+OpenACC program with nsys. I am using OpenMPI (3.1.6) from the NVIDIA HPC SDK (20.7) with UCX enabled. There are three executables: exec1, exec2, and exec3. I want to profile exec3, but I am failing. Following is the run…
HEMANT GIRI
  • 31
  • 1
  • 7
1
vote
1 answer

Some questions about acc routine

I have an MPI code and am trying to parallelize a simple loop in it with OpenACC, but the output is not what I expect. The loop contains a call, and I added an 'acc routine seq' directive to the subroutine. If I manually inline this call and delete the subroutine, the…
Xin Ding
  • 13
  • 2
1
vote
1 answer

OpenACC reduction clause with max()

I am learning OpenACC and came across the code below for the Jacobi iteration, provided by NVIDIA. From my understanding, reduction(max:err) creates a private err variable for each loop iteration, and returns the max value from all of them. My…
1
vote
1 answer

OpenACC: Deep Copy and Unified Memory

I would like to clearly understand a situation I have often faced when accelerating an application with OpenACC. Let's say I have this loop: #pragma acc parallel loop collapse(4) for (k = KBEG; k <= KEND; k++){ for (j = JBEG; j <= JEND; j++){ for (i = IBEG;…
Steve
  • 89
  • 1
  • 6