Questions tagged [openacc]

The OpenACC Application Program Interface describes a collection of compiler directives to specify loops and regions of code in standard C, C++ and Fortran to be offloaded from a host CPU to an attached accelerator, providing portability across operating systems, host CPUs and accelerators.

The OpenACC directives and programming model allow programmers to create high-level host+accelerator programs without the need to explicitly initialize the accelerator, manage data or program transfers between the host and accelerator, or initiate accelerator startup and shutdown.

All of these details are implicit in the programming model and are managed by the OpenACC API-enabled compilers and runtimes. The programming model allows the programmer to augment information available to the compilers, including specification of data local to an accelerator, guidance on mapping of loops onto an accelerator, and similar performance-related details.
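A minimal sketch of what this looks like in practice (C/C++ with illustrative array names and sizes; the directives shown are standard OpenACC, but the exact clauses a real program needs depend on its data flow):

    #include <cstdio>

    int main() {
        const int n = 1 << 20;
        float *x = new float[n];
        float *y = new float[n];
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        // The data construct tells the compiler which arrays to move to the
        // accelerator and when; the parallel loop directive asks it to map
        // the loop onto the accelerator's parallelism.
        #pragma acc data copyin(x[0:n]) copy(y[0:n])
        {
            #pragma acc parallel loop
            for (int i = 0; i < n; ++i)
                y[i] += 2.0f * x[i];
        }

        printf("y[0] = %f\n", y[0]);
        delete[] x;
        delete[] y;
        return 0;
    }

The same source also builds as a plain CPU program: compilers that do not enable OpenACC simply ignore the pragmas.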

How to Get Useful Answers to Your OpenACC Questions on Stack Overflow

Here are a number of suggestions for users new to OpenACC and/or Stack Overflow. Follow these suggestions before asking your question and you will be much more likely to get a satisfactory answer!

  • Search Stack Overflow (and the web!) for similar questions before asking yours.
  • Include an as-simple-as-possible code example in your question; you are much more likely to get a useful answer. If the code is short and self-contained (so users can test it themselves), that is even better.
403 questions
2
votes
1 answer

Do NVIDIA GPUs support branch prediction? (with OpenACC)

I'm using an NVIDIA GPU with OpenACC (NVIDIA GeForce 960, compiler: PGI 15.7). Do NVIDIA GPUs support branch prediction? My code has conditional execution inside a long loop, but when I run it on the GPU it takes a very long time. Below is example code…
soongk
  • 259
  • 3
  • 17
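Without the asker's code, a hypothetical reconstruction of the pattern being described might look like the sketch below. NVIDIA GPUs do not perform branch prediction in the CPU sense; threads of a warp that take different paths through a conditional are predicated or serialized, so a divergent branch inside a long loop can cost roughly the sum of both paths:

    #include <cmath>

    // Hypothetical sketch (not the asker's code): a conditional inside a
    // long OpenACC loop. When neighbouring iterations take different paths,
    // the warp executes both sides one after the other.
    void apply(float *a, const float *b, int n) {
        #pragma acc parallel loop copyout(a[0:n]) copyin(b[0:n])
        for (int i = 0; i < n; ++i) {
            if (b[i] > 0.0f)
                a[i] = sqrtf(b[i]);    // "expensive" path
            else
                a[i] = 0.5f * b[i];    // "cheap" path
        }
    }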
2
votes
2 answers

Using OpenACC to parallelize nested loops

I am very new to OpenACC and have only high-level knowledge, so any help and explanation of what I am doing wrong would be appreciated. I am trying to accelerate (parallelize) a not-so-straightforward nested loop that updates a flattened (3D to 1D)…
anupshrestha
  • 236
  • 5
  • 19
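A minimal sketch of the kind of loop nest the question describes, assuming a flattened (3D to 1D) array with illustrative extents; collapse(3) lets the compiler treat the three loops as one large parallel iteration space:

    // Hypothetical sketch: updating a flattened 3D array in parallel.
    void update(float *u, int nx, int ny, int nz) {
        #pragma acc parallel loop collapse(3) copy(u[0:nx*ny*nz])
        for (int i = 0; i < nx; ++i)
            for (int j = 0; j < ny; ++j)
                for (int k = 0; k < nz; ++k)
                    u[(i * ny + j) * nz + k] += 1.0f;
    }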
2
votes
2 answers

Strong scaling on GPUs

I'd like to investigate the strong scaling of my parallel GPU code (written with OpenACC). The concept of strong scaling on GPUs is, at least as far as I know, murkier than on CPUs. The only resource I found regarding strong scaling on GPUs…
lodhb
  • 929
  • 2
  • 12
  • 29
2
votes
0 answers

Thrust data transfers between host and device?

Here is the code which reproduces the unexplained behavior: main.cpp #include #include extern "C" int findme(float *ARRAY); int main(){ float *ARRAY = new float [10]; int position; ARRAY[0] = 97.7302; …
lodhb
  • 929
  • 2
  • 12
  • 29
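The findme function itself is not shown in the excerpt; a hypothetical Thrust-based implementation that makes the implicit transfers visible could look like this (illustrative only, not the asker's code):

    #include <thrust/device_vector.h>
    #include <thrust/extrema.h>

    // Hypothetical sketch of a Thrust-based findme(). Constructing the
    // device_vector copies the 10 floats host-to-device; max_element runs
    // on the device, and only the resulting index travels back.
    extern "C" int findme(float *ARRAY) {
        thrust::device_vector<float> d(ARRAY, ARRAY + 10);
        return static_cast<int>(thrust::max_element(d.begin(), d.end()) - d.begin());
    }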
2
votes
1 answer

Reshaping A Dynamic Array Using Function Parameters

Today I found this in an example file given to me by a company: void mySgemm( int m, int n, int k, float alpha, float beta, float a[m][n], float b[n][k], float c[m][k], int accelerate ) Called with: a_cpu = malloc(..); b_cpu =…
Constantin
  • 16,812
  • 9
  • 34
  • 52
2
votes
1 answer

Should host data be allocated for the create and pcreate clauses?

I am currently studying the OpenACC API, and I was wondering whether it is possible to create an array on the device without any corresponding allocated array on the host. Let's say that I want to use my old CUDA kernel and only handle memory…
chabachull
  • 70
  • 5
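One way to get device memory with no host mirror at all, which may be what the question is after, is the OpenACC runtime routine acc_malloc combined with the deviceptr clause (a sketch; the array size and loop body are illustrative):

    #include <openacc.h>

    void fill(int n) {
        // Allocate on the device only; there is no corresponding host array.
        float *d = (float *) acc_malloc(n * sizeof(float));

        // deviceptr tells the compiler the pointer is already a device
        // address, so no data movement or lookup is generated for it.
        #pragma acc parallel loop deviceptr(d)
        for (int i = 0; i < n; ++i)
            d[i] = 2.0f * i;

        acc_free(d);
    }

By contrast, the create clause is tied to a host variable (the runtime needs its address and shape), even though the host memory itself is never transferred.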
2
votes
1 answer

Multi-dimensional array copy OpenACC

I have a 2D matrix SIZE x SIZE, which I'm trying to copy to the GPU. I allocate the matrix this way: #define SIZE 1024 float (*a)[SIZE] = (float(*)[SIZE]) malloc(SIZE * SIZE * sizeof(float)); And I have this in my ACC region: void…
leo
  • 1,117
  • 1
  • 8
  • 18
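A sketch of the pointer-to-array idiom from the question together with a matching copy clause (sizes illustrative); because the allocation is one contiguous block, the whole matrix can be described with a single array section:

    #include <cstdlib>
    #define SIZE 1024

    int main() {
        // One contiguous allocation viewed as SIZE rows of SIZE floats.
        float (*a)[SIZE] = (float (*)[SIZE]) malloc(SIZE * SIZE * sizeof(float));

        #pragma acc parallel loop collapse(2) copyout(a[0:SIZE][0:SIZE])
        for (int i = 0; i < SIZE; ++i)
            for (int j = 0; j < SIZE; ++j)
                a[i][j] = (float)(i + j);

        free(a);
        return 0;
    }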
1
vote
0 answers

libquadmath.o.dylib found by gcc, but not mpicc

I want to compile some igraph code within a file that uses MPI and OpenACC. Using an igraph example (“sparsemat2.c”), it compiles with “gcc”, but not “mpicc”. $ gcc sparsemat2.c -I/usr/local/include/igraph -o sparsemat2 -ligraph $ mpicc sparsemat2.c…
Mark Bower
  • 569
  • 2
  • 16
1
vote
0 answers

Issue with Writing Array Elements to File in OpenACC

Hello OpenACC experts, I'm facing a problem with writing array elements to a file using OpenACC. Here's the relevant code snippet: #include #include using namespace std; int main() { ofstream THeOutfile; …
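The excerpt is truncated, but a common cause of this symptom is writing out the host copy of an array before it has been brought back from the device. A minimal sketch of the usual pattern, assuming that is the issue (file and array names are illustrative):

    #include <fstream>

    int main() {
        const int n = 100;
        float a[n];

        #pragma acc data create(a[0:n])
        {
            #pragma acc parallel loop
            for (int i = 0; i < n; ++i)
                a[i] = 0.5f * i;

            // File I/O runs on the host, so refresh the host copy first.
            #pragma acc update self(a[0:n])

            std::ofstream out("out.txt");
            for (int i = 0; i < n; ++i)
                out << a[i] << '\n';
        }
        return 0;
    }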
1
vote
1 answer

Unable to access CUDA device with OpenACC on WSL2 Ubuntu: Error code=34

I am new to using OpenACC on WSL2 with Ubuntu and have encountered an issue. I successfully installed the HPC SDK as instructed on the website, without installing CUDA separately, as the latest CUDA version was included with the HPC SDK. However,…
1
vote
0 answers

Use std::vector with OpenACC

I’m trying to compute, on the GPU using OpenACC, the sum of two std::vector objects. As the compiler I’m using GCC+NVPTX with OpenACC support, but when I compile the code with these flags: g++ -fopenacc -offload=nvptx-none -fopt-info-optimized-omp -g…
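OpenACC data clauses describe contiguous array sections rather than C++ container objects, so a common workaround for std::vector is to compute on the vectors' underlying storage via data(). A sketch under that assumption (names illustrative):

    #include <vector>
    #include <cstdio>

    int main() {
        const int n = 1 << 16;
        std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);

        // Data clauses work on raw pointers, so expose the vectors' storage.
        float *pa = a.data(), *pb = b.data(), *pc = c.data();

        #pragma acc parallel loop copyin(pa[0:n], pb[0:n]) copyout(pc[0:n])
        for (int i = 0; i < n; ++i)
            pc[i] = pa[i] + pb[i];

        printf("c[0] = %f\n", c[0]);
        return 0;
    }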
1
vote
1 answer

OpenACC: Why updating an array depends on the location of the update directive

I'm new to OpenACC. I'm trying to use it to accelerate a particle code. However, I don't understand why, when updating an array (eta in the program below) on the host, I get different results depending on the location of '!$acc update self'. Here…
FeyPhys
  • 25
  • 4
1
vote
1 answer

Compiling with PGI PGCC with LAPACK and LBLAS libraries?

I'm trying to compile my OpenACC parallel C++ program, which uses the dgemm (BLAS) and dgesvd (LAPACK) functions, with the PGI PGCC compiler, linking it with the libraries like this (the program is called "VD"): #…
gamersensual
  • 105
  • 6
1
vote
1 answer

How to apply cuda-memcheck to an app with piped inputs from standard I/O

I want to use cuda-memcheck for an app with standard I/O. The app, dut, reads standard input and writes standard output. cat input.txt | cuda-memcheck ./dut -dutoptions > output.txt In this case, the dut app seems to be launched, but cuda-memcheck…
hakunom
  • 13
  • 2
1
vote
1 answer

How do I translate this simple OpenACC code to SYCL?

I have this code: #pragma acc kernels #pragma acc loop seq for(i=0; i
gamersensual
  • 105
  • 6
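The excerpt cuts off mid-loop, but the OpenACC side is a kernels region containing a loop marked seq, i.e. a loop that runs sequentially on the device. A rough SYCL analogue of that pattern, assuming the loop body only writes one array, is a single_task containing the same serial loop (illustrative sketch, SYCL 2020 style):

    #include <sycl/sycl.hpp>
    #include <vector>

    int main() {
        const int n = 1024;
        std::vector<float> a(n, 0.0f);
        sycl::queue q;

        {
            sycl::buffer<float, 1> buf(a.data(), sycl::range<1>(n));
            q.submit([&](sycl::handler &h) {
                auto acc = buf.get_access<sycl::access::mode::write>(h);
                // "loop seq" means one device thread runs the loop in order,
                // which maps to a single_task in SYCL.
                h.single_task([=]() {
                    for (int i = 0; i < n; ++i)
                        acc[i] = static_cast<float>(i);
                });
            });
        }   // buffer goes out of scope: wait and copy the results back to 'a'

        return 0;
    }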