I have code that was originally written with OpenMP, and I now want to migrate it to OpenACC. Consider the following:
1- The OpenMP output is treated as the reference result, and the OpenACC output should match it.
2- There are two functions in the code, and a flag passed to the program on the command line selects which one runs. So either F1 or F2 runs, depending on the input flag.
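To make "the OpenACC output should match the OpenMP output" concrete, here is a minimal sketch of how the two results could be compared. It assumes both outputs are available as arrays of double; the function name `outputs_match` and the tolerance parameter are hypothetical and not part of the original code. A small tolerance is used rather than exact equality because floating-point results can differ slightly between a CPU and a GPU (for example through fused multiply-add or a different summation order).

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every element of `out` is within `tol`
   of the corresponding element of `ref` (the OpenMP reference), else 0.
   A NaN in either array counts as a mismatch. */
int outputs_match(const double *ref, const double *out, size_t n, double tol)
{
    for (size_t i = 0; i < n; ++i) {
        double d = ref[i] - out[i];
        if (d < 0.0) {
            d = -d;              /* absolute difference without <math.h> */
        }
        if (!(d <= tol)) {       /* written this way so NaN also fails */
            return 0;
        }
    }
    return 1;
}
```

A call such as `outputs_match(omp_result, acc_result, n, 1e-9)` would then distinguish rounding-level differences from genuinely wrong results; the array names and the tolerance value here are placeholders.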
As mentioned, I ported the code to OpenACC. I can now compile it with both -ta=multicore and -ta=nvidia to build the OpenACC regions for the two architectures.
For F1, the output on both architectures matches OpenMP. That is, whether I compile with -ta=multicore or -ta=nvidia, I get the same correct results as OpenMP when F1 is selected.
For F2, the situation is different. Compiling with -ta=multicore gives the same correct output as OpenMP, but compiling with -ta=nvidia gives wrong results.
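One general difference between the two targets that may be relevant: with -ta=multicore the OpenACC threads share host memory, so data clauses have no effect, whereas with -ta=nvidia the arrays really are copied between host and device. The sketch below only illustrates that distinction. `f2_sketch`, its arguments, and its computation are hypothetical stand-ins, not the actual F2, and this may or may not be what goes wrong here.

```c
/* Hypothetical stand-in for F2: scales `in` into `out` inside an OpenACC
   region, then sums `out` on the host. Illustrative only. */
double f2_sketch(const double *in, double *out, int n, double factor)
{
    double sum = 0.0;

    /* Explicit data region: `in` is copied to the device on entry and
       `out` is copied back to the host at the closing brace. On a
       multicore target these clauses are effectively no-ops. */
    #pragma acc data copyin(in[0:n]) copyout(out[0:n])
    {
        #pragma acc parallel loop
        for (int i = 0; i < n; ++i) {
            out[i] = factor * in[i];
        }
    }

    /* The host reads `out` only after the data region has ended, so it
       sees the values computed on the device. Reading it inside the
       region without an `update` directive would see stale host data on
       a GPU target, while still appearing correct on multicore. */
    for (int i = 0; i < n; ++i) {
        sum += out[i];
    }
    return sum;
}
```

Without OpenACC enabled the pragmas are ignored and the function runs serially, which gives a baseline for comparison.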
Any ideas what might be wrong with F2, or with the build process?
Note: I am using version 16 of the PGI compiler, and my NVIDIA GPU has compute capability (CC) 5.2.