
Can you please suggest how I can make my OpenACC code more parallel? I am implementing merge sort with an insertion sort fallback for small ranges. Should I use "loop" or "for" on the loops? Also, for the insertion sort, should I use "kernels" or "parallel"?

#include <stdlib.h>
#include<stdio.h>
#include <time.h>
#include <openacc.h>
#define THR 1000

//Insertion sort
void isort (int *a, int left, int mid, int right) {

int i,j;
# pragma acc kernels
{
# pragma acc parallel loop num_gangs (1024)
for ( i = mid; i <= right; i++) {
    for ( j = i - 1; j >= 0; j--) {
        if (a[i] < a [j]) {
            int temp = a[j];
            a[j] = a[i];
            a[i] = temp;
            i--;
        }
    }
}
}
}
void merge(int a[], int left, int right,int left_half[], int right_half[])
{
int i, j, k;
int mid = (left + right + 1) / 2;

i = j = 0;
k = left;

while (i < mid - left && j <= right - mid) {
    if (left_half[i] < right_half[j]) {
        a[k] = left_half[i];
        ++i;
    } else {
        a[k] = right_half[j];
        ++j;
    }

    ++k;
   }

  // Copying any leftover elements
  #pragma acc data copy(a, right_half)
  while (j <= right - mid) {
        a[k++] = right_half[j++]; // copy remaining elements of the second half

    }
   #pragma acc data copy(a, left_half)
   while (i < mid - left) {
        a[k++] = left_half[i++]; // copy remaining elements of the first half
    }
   }

  void mergeSort(int a[], int left, int right)
{
if (left < right) {
    int mid = (left + right + 1) / 2;
    int left_half[mid - left];
    int right_half[right - mid + 1];
    int i;
   # pragma acc kernels
   {
    // Copying elements
    # pragma acc parallel loop shared (left_half, a)
    for (i = left; i < mid; ++i) {
        left_half[i - left] = a[i];
    }

    // Copying elements
    # pragma acc parallel loop shared (right_half, a)
    for (i = mid; i <= right; ++i) {
        right_half[i - mid] = a[i];
    }
  }
    // Recursive call
    mergeSort(left_half, 0, mid - left - 1);
    mergeSort(right_half, 0, right - mid);
    // Merge the two partitions
    if ((right - left) > THR){
        merge(a, left, right, left_half, right_half);
    } else {
        isort(a, left,mid, right);
    }
}
}


  int main()
   {
int i, n, *a,c;

printf("Enter the number of elements\n");
scanf("%d",&n);      
a = (int *)malloc(sizeof(int) * n);  // host allocation: acc_malloc returns device memory, which the host loops below cannot access
srand(time(0));
for(i=0;i<n;i++){
   a[i]=rand()%1000;
}
printf("\nThe unsorted a is:");
printf("\n");
for(i=0;i<n;i++)
    printf("%d  ",a[i]);

    mergeSort(a, 0, n-1);
printf("\nSorted a:");
printf("\n");
for(i=0;i<n;i++)
    printf("%d  ",a[i]);
printf("\n");
 }  
Naeem Ul Wahhab

1 Answer


I don't know OpenACC's syntax. In OpenMP, if the arrays are large, you can run the iterations of each for loop in parallel, and also run the two for loops in parallel with each other. Take a look at link1 and link2. I don't know whether that is what you meant by writing # pragma acc parallel loop above the for loops; if OpenACC has something equivalent, you can add it.
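For reference, here is a minimal sketch of how the two copy loops from the question could be written with directives that OpenACC defines. In C, the OpenACC loop directive is spelled `loop` (`for` is the OpenMP spelling), and `shared` is an OpenMP clause, so data movement is stated with `copyin`/`copyout` instead. The helper name `split_halves` and the index rewrite are assumptions made for illustration, not part of the original code.

```c
#include <stddef.h>

/* Hypothetical helper: copy a[left..mid-1] and a[mid..right] into two
 * separate buffers, as mergeSort in the question does. */
static void split_halves(const int *a, int left, int mid, int right,
                         int *left_half, int *right_half)
{
    int nl = mid - left;       /* number of elements in the left half  */
    int nr = right - mid + 1;  /* number of elements in the right half */

    /* Combined directive: one parallel region containing one loop.
     * Array sections use the form name[start:length]. */
    #pragma acc parallel loop copyin(a[left:nl]) copyout(left_half[0:nl])
    for (int i = 0; i < nl; ++i) {
        left_half[i] = a[left + i];
    }

    #pragma acc parallel loop copyin(a[mid:nr]) copyout(right_half[0:nr])
    for (int i = 0; i < nr; ++i) {
        right_half[i] = a[mid + i];
    }
}
```

Each iteration writes a different element, so the loops are independent. A compiler without OpenACC support ignores the pragmas and runs the loops serially with the same result.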

You can also run both mergeSort calls in parallel, something like this:

# pragma acc kernels
   {
     # pragma acc parallel{mergeSort(left_half, 0, mid - left - 1);}
     # pragma acc parallel{mergeSort(right_half, 0, right - mid);}
   }
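Since this answer reasons from OpenMP, the same idea can be sketched with OpenMP tasks, where running two recursive calls concurrently is directly expressible. As far as I know, OpenACC compute regions are not a good fit for calling a recursive host function like mergeSort, so treat this as an OpenMP illustration of the idea rather than an OpenACC solution. The names `merge_sort_tasks`, `sort_range`, `merge_runs`, and `CUTOFF` are hypothetical, and the cutoff value is a guess that would need tuning.

```c
#include <stdlib.h>
#include <string.h>

#define CUTOFF 1000  /* hypothetical: ranges at or below this size spawn no deferred tasks */

/* Merge the sorted runs a[lo..mid-1] and a[mid..hi] through scratch buffer tmp. */
static void merge_runs(int *a, int *tmp, int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j <= hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j <= hi) tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo + 1) * sizeof(int));
}

static void sort_range(int *a, int *tmp, int lo, int hi)
{
    if (lo >= hi) return;
    int mid = lo + (hi - lo + 1) / 2;

    /* The two halves touch disjoint index ranges, so they may run concurrently.
     * The if clause makes small ranges execute immediately instead of deferring. */
    #pragma omp task if (hi - lo > CUTOFF)
    sort_range(a, tmp, lo, mid - 1);
    #pragma omp task if (hi - lo > CUTOFF)
    sort_range(a, tmp, mid, hi);
    #pragma omp taskwait  /* both halves must be sorted before merging */

    merge_runs(a, tmp, lo, mid, hi);
}

void merge_sort_tasks(int *a, int n)
{
    int *tmp = malloc((size_t)n * sizeof(int));
    if (tmp == NULL) return;

    /* One thread starts the recursion; the team picks up the spawned tasks. */
    #pragma omp parallel
    #pragma omp single
    sort_range(a, tmp, 0, n - 1);

    free(tmp);
}
```

Built without OpenMP enabled, the pragmas are ignored and this is an ordinary serial merge sort, which makes it easy to check correctness before measuring any speedup.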
Aerron
  • Thanks a lot for your suggestion. I will try to add both of the things you have mentioned, and hopefully they will make the code more efficient. :) – pragya sharma Apr 16 '17 at 11:16
  • So does this mean the for loop runs in parallel with `# pragma acc parallel loop shared`, as described [here](http://stackoverflow.com/a/36986895/5419015)? – Aerron Apr 16 '17 at 11:17
  • Yes it improved parallel execution time. – pragya sharma Apr 16 '17 at 12:23
  • I was asking about the for loop. Add a parallel for only if the array is large enough; otherwise, don't add it. – Aerron Apr 16 '17 at 12:24
  • Actually, I am using a large array, so I needed to add the parallel for. – pragya sharma Apr 16 '17 at 18:24
  • Well then, the main thing in merge sort is running the two mergesort calls in parallel; after that, if the iteration counts are large enough, we can add a parallel for. In a parallel for, the initial division of work, setting up the worker threads, and invoking a function/operation for each iteration all incur a cost, which can degrade performance instead of improving it. – Aerron Apr 16 '17 at 18:27
  • So it means we should not use a parallel for when the array has a small number of elements? It can decrease performance? By the way, thanks, you really explained it well. :) – pragya sharma Apr 16 '17 at 20:20
  • Yes, use it only when each iteration takes a lot of time (for example, because it calls many functions) or when there are a lot of loops. Well, I am a beginner too; it's just that I recently did a small Sudoku solver project on parallel programming using OpenMP. – Aerron Apr 17 '17 at 03:12
  • Thanks a lot, you really helped a lot. – pragya sharma Apr 17 '17 at 09:45
  • Lol why are you thanking me a lot✌️ It's fine buddy. – Aerron Apr 17 '17 at 09:46
  • Actually, I am also new to OpenACC and I have to submit reports for my university project, so I logged in to this website. You may have already read about my first experience in previous comments. So I have to thank you for responding well and helping me. :) – pragya sharma Apr 17 '17 at 10:13
  • So do you mean writing a parallel merge sort is your project? – Aerron Apr 17 '17 at 10:14
  • Yes, it is a project, and we have to give reports on OpenACC and CUDA. – pragya sharma Apr 17 '17 at 10:23
  • Fine, it would be good if you selected something non-trivial, like a multiplayer concurrent game or a parallel downloader. Anyway :) carry on. – Aerron Apr 17 '17 at 10:25
  • Yes, right, but it's okay. Thanks. :) – pragya sharma Apr 17 '17 at 10:47
  • Lol, I just started answering here and boom, a lot of Stack Overflow notifications of comments and suggestions. Yay, thanks for supporting :) – Aerron Apr 17 '17 at 10:49
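The point raised in the comments above, that parallelizing small loops can cost more than it saves, can be expressed in OpenACC with the `if` clause on a compute construct. The following is a minimal sketch: the helper name `copy_ints` is hypothetical, and `THR` reuses the threshold name from the question with a value that is only a starting guess.

```c
#define THR 1000  /* same name as in the question; the value is an untuned guess */

/* Hypothetical helper: copy n ints, offloading only when n is large. */
void copy_ints(const int *src, int *dst, int n)
{
    /* When the if condition is false, the region runs on the host, so small
     * copies avoid the kernel launch and data transfer overhead. */
    #pragma acc parallel loop if(n > THR) copyin(src[0:n]) copyout(dst[0:n])
    for (int i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}
```

The right threshold depends on the device and on how much work each iteration does, so it is worth timing a few values rather than assuming 1000 is appropriate.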