I am a beginner in SYCL/DPC++. I want to print multiples of 10, but I am getting 0s instead.

I am using USM (Unified Shared Memory) and checking the implicit data movement between shared memory and host memory. I have created two arrays, initialized the host one, and performed the operation on them in a kernel. I see the same results for both of them.

Here is my code; I don't understand where I went wrong.

#include <CL/sycl.hpp>
#include<iostream>
using namespace std;
using namespace sycl;
constexpr int n = 10;

int main() {
  queue q;
  int *hostArray = malloc_host<int>(n, q);
  int *sharedArray = malloc_shared<int>(n, q);

  for (int i = 0; i < n; i++)
    hostArray[i] = i;
  q.submit([&](handler &h) {
      h.parallel_for(n, [=](id<1> i) {
          sharedArray[i] = hostArray[i] * 10;
        });
    });

  for (int i = 0; i <  n; i++) {
    hostArray[i] = sharedArray[i];
    cout<<hostArray[i]<<" "<<sharedArray[i];
    cout<<"\n";
  }
  cout<<"\n";
  return 0;
}

Expected Results:

                  0 0
                  10 10
                  20 20
                  30 30
                  40 40
                  50 50
                  60 60
                  70 70
                  80 80
                  90 90

Actual Output:

                  0 0
                  0 0
                  0 0
                  0 0
                  0 0
                  0 0
                  0 0
                  0 0
                  0 0
                  0 0
Adrian Mole
sv6
  • Are you using 1-based indices there? – Ulrich Eckhardt Sep 08 '21 at 09:57
  • yes, I am using 1-based indices in order to get the multiples of 10 starting from 10 instead of 0 – sv6 Sep 08 '21 at 10:20
  • Okay, but you know that C++ generally uses zero-based indices, right? That then looks like an out-of-bounds array access, except for the parallel_for, which again uses correct indices. – Ulrich Eckhardt Sep 08 '21 at 10:34
  • I have changed it and checked; the result is the same as before – sv6 Sep 08 '21 at 11:36
  • Please edit your question and add both expected and actual results. In any case, if the docs don't explicitly mark these arrays as one-based, the above code is wrong and anything could happen when you run it. – Ulrich Eckhardt Sep 08 '21 at 12:20
  • I have added the expected and actual results and please do check my code if I am wrong – sv6 Sep 08 '21 at 13:08

1 Answer

You are missing a synchronization point between submitting the command group to the queue and the for loop in the host code.

Although it is true that a USM shared memory allocation is visible on both the host and the device, there is no guarantee that the command group you have submitted to the queue will execute before the for loop on the host: submissions to queues execute asynchronously with respect to the calling thread. Here the host loop reads sharedArray before the kernel has written to it, which is why you see zeros. Updated code below:

    #include <CL/sycl.hpp>
    #include<iostream>
    using namespace std;
    using namespace sycl;
    constexpr int n = 10;
    
    int main() {
      queue q;
      int *hostArray = malloc_host<int>(n, q);
      int *sharedArray = malloc_shared<int>(n, q);
    
      for (int i = 0; i < n; i++)
        hostArray[i] = i;
      q.submit([&](handler &h) {
          h.parallel_for(n, [=](id<1> i) {
              sharedArray[i] = hostArray[i] * 10;
            });
        });
      // Wait for completion of all previously submitted command groups
      q.wait();
    
      for (int i = 0; i <  n; i++) {
        hostArray[i] = sharedArray[i];
        cout<<hostArray[i]<<" "<<sharedArray[i];
        cout<<"\n";
      }
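      cout<<"\n";
      // Release the USM allocations before exiting
      sycl::free(hostArray, q);
      sycl::free(sharedArray, q);
      return 0;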
    }
Ruyk