
Recently, the GCC community has been working on OpenACC support in the compiler, so I wanted to try it out.

Following a step-by-step tutorial that stays close to the main documentation on the GCC website, I was able to build GCC 6.1 with OpenACC support.

Then I compiled my program with the following command:

gcc pi.c -fopenacc -foffload=nvptx-none -foffload="-O3" -O3

It compiles without any errors.

The program also runs without errors, but it does not produce the correct answer.

Here is my C code:

#include <stdio.h>
#include <openacc.h>

#define N 20000

#define vl 1024

int main(void) {

  double pi = 0.0f;
  long long i;
  int change = 0;

  printf("Number of devices: %d\n", acc_get_num_devices(acc_device_nvidia));

  #pragma acc parallel 
  {
    change = 1;
    #pragma acc loop reduction(+:pi) private(i)
    for (i=0; i<N; i++) {
      double t= (double)((i+0.5)/N);
      pi +=4.0/(1.0+t*t);
    }
  }

  printf("Change: %d\n", change);

  printf("pi=%11.10f\n",pi/N);


  pi = 0.0;

  for (i=0; i<N; i++) {
    double t= (double)((i+0.5)/N);
    pi +=4.0/(1.0+t*t);
  }

  printf("pi=%11.10f\n",pi/N);

  return 0;

}

And this is the output of running a.out:

Number of devices: 1
Change: 0
pi=0.0000000000
pi=3.1415926538

Any ideas?

mgNobody
  • I ran your code on my Fedora 26 box with GCC 7.1.0 built with nvptx offloading, and it works correctly. The output I get is: `Number of devices: 1`, `Change: 0`, `pi=3.1415926538`, `pi=3.1415926538`. What is your hardware? CUDA version? – Joshua T Sep 18 '17 at 22:37
  • @Calleniah: It's been a year. So bear with me if I am not 100% sure about answers to your questions. CUDA version was 7.5 and my hardware was a Maxwell Geforce GTX 970. – mgNobody Sep 19 '17 at 23:18
  • Can you try printing the incremental changes in the value of pi in the GPU loop by adding `printf("pi=%11.10f\n",pi);` underneath `pi +=4.0/(1.0+t*t);`? What output do you get? My apologies if you no longer have access to the setup used to reproduce this error; as you say, it's been a year. – Joshua T Sep 20 '17 at 10:08
  • I also think it is worth upgrading to GCC 7.1.0 and CUDA 8 to see whether the problem persists. – Joshua T Sep 20 '17 at 10:14
  • Your code works correctly for me with GCC 7.2 on Ubuntu 17.10, gcc-offload-nvptx. `nvprof` also shows that it uses CUDA. – Z boson Mar 09 '18 at 13:36

1 Answer


Try moving "parallel" to the loop instead of the block.

//  #pragma acc parallel
  {
    change = 1;
    #pragma acc parallel loop reduction(+:pi)
    for (i=0; i<N; i++) {
      double t= (double)((i+0.5)/N);
      pi +=4.0/(1.0+t*t);
    }
  }

I just tried this with gcc 6.1 and it worked correctly. Note that there's no need to privatize "i" since scalars are private by default.
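For reference, here is a minimal sketch of the same loop wrapped in a function, using the combined `parallel loop` construct so the reduction clause sits on the compute construct itself. The name `estimate_pi` and the local `sum` are my own, not from the question. If the file is built without `-fopenacc`, the pragma is ignored and the loop simply runs serially on the host, which makes it easy to compare the two builds.

```c
/* Midpoint-rule estimate of the integral of 4/(1+x^2) over [0,1], which is pi.
   estimate_pi is a hypothetical helper name used only for this sketch. */
double estimate_pi(long long n) {
  double sum = 0.0;
  long long i;

  /* Combined construct: the reduction is attached to the parallel region
     and the loop together. The loop index i needs no private clause. */
  #pragma acc parallel loop reduction(+:sum)
  for (i = 0; i < n; i++) {
    double t = (i + 0.5) / (double)n;  /* midpoint of the i-th subinterval */
    sum += 4.0 / (1.0 + t * t);
  }

  return sum / (double)n;  /* scale by the subinterval width 1/n */
}
```

Calling `estimate_pi(20000)` from `main` and printing it with `%11.10f` should agree with the serial loop in the question. Note that `change` staying 0 after the region is probably not a symptom: as far as I know, scalars on a `parallel` construct default to firstprivate, and the working run reported in the comments also prints `Change: 0`.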

Mat Colgrove
  • Thanks Mat. I tried it, but the problem persists, so I don't think that is the cause. I suspect I did something wrong while installing the OpenACC-enabled GCC. The steps in the tutorial were straightforward, though, so I don't know why it does not work! – mgNobody Aug 05 '16 at 22:04
  • I'm not sure either since it worked for me. I'll send a note to the folks doing the GNU OpenACC and see if they can help you. – Mat Colgrove Aug 08 '16 at 15:52
  • FYI, I got an out-of-office reply so it may be a few days before he gets to this. – Mat Colgrove Aug 08 '16 at 15:54