I am writing a very simple vector addition kernel in OpenACC, and I am wondering whether I have hit a problem with the compiler I am using (accULL with OpenCL): copying data back from the device to the host seems to go wrong. All the results are correct except result[0]. For example, the following code:

  for (i=0; i<VEC_SIZE; i++) {
    a[i] = i;
    b[i] = VEC_SIZE-i;
    result[i]=0;
  }
  #pragma acc kernels copyin(a,b) copy(result)
  for (i=0; i<VEC_SIZE; i++) {
    result[i] = a[i]+b[i];
  }

  // verify result
  for (i=0; i<VEC_SIZE; i++) {
    if ( (a[i] + b[i]) != result[i]) {
      fprintf(stderr, "Incorrect results id %d val: %d \n", i, result[i]);
    }
  }

Returns the following:

Incorrect results id 0 val: 0

This means every result except the one at index 0 is correct; it looks as if the result for index 0 is not copied back from the device.

Is this a compiler/runtime bug, or did I miss something in my code?

Jacob
  • the code looks OK to me. I built a complete code and test case around what you have shown here, and compiled it using PGI 14.9 tools, and it seems to work fine. Example is [here](http://pastebin.com/jXKGWVAC) – Robert Crovella Nov 08 '14 at 16:41
  • Brilliant, it looks very much like a bug in accULL to me too. Thanks a lot for confirming. If you want to, you can post an answer that I can accept. – Jacob Nov 08 '14 at 16:53
  • Perhaps even better, when you discover what the fix is, come back and answer this question. That will be more useful for future readers. Or perhaps someone else will come along and sort it out for you. Are you using the latest version of accull? It appears to be 0.3 but it seems you can also download from [the master branch](https://bitbucket.org/ruyman/accull/) which may give you "0.3.1" There does also appear to be an [accull support mailing list](https://groups.google.com/forum/#!forum/accull). – Robert Crovella Nov 08 '14 at 17:18

1 Answer

Yes, I also think this is a bug in your compiler, because your code looks right. You could try the PGI compiler, which I am using now (PGI belongs to NVIDIA now). Besides, you can change "copy(result)" to "copyout(result)" to reduce data transfer time, because the initial values of result are not needed on the device.
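As a rough sketch of that change, here is the kernel from the question with copyout(result) in place of copy(result), wrapped in a function that counts mismatches. The function name vec_add_mismatches and the value 1024 for VEC_SIZE are my own assumptions (the question does not show either), and this has not been tested with accULL:

```c
#include <stdio.h>

#define VEC_SIZE 1024  /* assumed size; the question does not state it */

/* Hypothetical helper: runs the vector addition and returns the number
   of elements where result[i] differs from a[i] + b[i]. */
int vec_add_mismatches(void)
{
    static int a[VEC_SIZE], b[VEC_SIZE], result[VEC_SIZE];
    int i;
    int mismatches = 0;

    /* result is not initialised on the host: copyout only transfers
       device -> host, so its initial host values are never used. */
    for (i = 0; i < VEC_SIZE; i++) {
        a[i] = i;
        b[i] = VEC_SIZE - i;
    }

    /* a and b are only read on the device, result is only written. */
    #pragma acc kernels copyin(a, b) copyout(result)
    for (i = 0; i < VEC_SIZE; i++) {
        result[i] = a[i] + b[i];
    }

    /* verify result on the host */
    for (i = 0; i < VEC_SIZE; i++) {
        if ((a[i] + b[i]) != result[i]) {
            fprintf(stderr, "Incorrect results id %d val: %d \n", i, result[i]);
            mismatches++;
        }
    }
    return mismatches;
}
```

Call vec_add_mismatches() from main and check that it returns 0. A compiler without OpenACC support simply ignores the pragma and runs the loop on the host, which gives a reference result to compare against the accULL build.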

Shouyu Chen