Reducing the max value and saving its index

Question

int v[10] = {2,9,1,3,5,7,1,2,0,0};
int maximo = 0;
int b = 0;
int i;

#pragma omp parallel for shared(v) private(i) reduction(max:maximo)
for(i = 0; i< 10; i++){
    if (v[i] > maximo)
        maximo = v[i];
    b = i + 100;
}

How can I get the value that b gets during the iteration when maximo gets its max value (and therefore, its value after the for loop)?

dreamcrash · Answer 1 · 2021-03-17T19:21:29.757

TL;DR: You can use a user-defined reduction.

First, instead of:

for(i = 0; i< 10; i++){
    if (v[i] > maximo)
        maximo = v[i];
    b = i + 100;
}

you meant this:

for(i = 0; i< 10; i++){
    if (v[i] > maximo){
        maximo = v[i];
        b = i + 100;
    }
}

OpenMP has built-in reduction operators that consider a single target value; in your case, however, you want to reduce over two values together: the max and the array index. Since OpenMP 4.0 you can define your own reduction (i.e., a User-Defined Reduction).

First, create a struct to store the two relevant values:

struct MyMax {
   int max;
   int index;
};

then we need to teach the OpenMP implementation how to reduce it:

#pragma omp declare reduction(maximo : struct MyMax : omp_out = omp_in.max > omp_out.max ? omp_in : omp_out)

we set our parallel region accordingly:

#pragma omp parallel for reduction(maximo:myMaxStruct)
for(int i = 0; i< 10; i++){
    if (v[i] > myMaxStruct.max){
        myMaxStruct.max = v[i];
        myMaxStruct.index = i + 100;
    }
}

Side note: you do not really need private(i), because with #pragma omp parallel for the loop's index variable is implicitly private anyway.

All put together:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

struct MyMax {
   int max;
   int index;
};


int main(void)
{
    /* No initializer clause: each thread's private copy starts zero-initialized,
       which is fine here because the values in v are non-negative. */
    #pragma omp declare reduction(maximo : struct MyMax : omp_out = omp_in.max > omp_out.max ? omp_in : omp_out)
    struct MyMax myMaxStruct;
    myMaxStruct.max = 0;
    myMaxStruct.index = 0;

    int v[10] = {2,9,1,3,5,7,1,2,0,0};

    #pragma omp parallel for reduction(maximo:myMaxStruct)
    for(int i = 0; i< 10; i++){
        if (v[i] > myMaxStruct.max){
            myMaxStruct.max = v[i];
            myMaxStruct.index = i + 100;   /* same offset as b = i + 100 */
        }
    }
    printf("Max %d : Index %d\n", myMaxStruct.max, myMaxStruct.index);
    return 0;
}

OUTPUT:

Max 9 : Index 101

(Index is 101 because you have b = i + 100)
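
As a hedged variation (not part of the code above): the reduction relies on the data being non-negative, since the private copies start at zero, and with duplicate maxima the reported index could depend on how the iterations are split across threads. The sketch below assumes a non-empty array and uses hypothetical names (MaxLoc, maxloc_combine, find_max) to show one way to add an explicit initializer clause and a lowest-index tie-break. It returns the plain index i; add 100 as in the question if needed.

```c
/* Hypothetical pair type, mirroring MyMax above. */
struct MaxLoc {
    int max;
    int index;
};

/* Keep the larger value; on equal values keep the smaller index. */
static struct MaxLoc maxloc_combine(struct MaxLoc a, struct MaxLoc b)
{
    if (b.max > a.max || (b.max == a.max && b.index < a.index))
        return b;
    return a;
}

/* initializer(omp_priv = omp_orig): every private copy starts from the
   value the reduction variable had before the loop, not from zero. */
#pragma omp declare reduction(maxloc : struct MaxLoc : \
        omp_out = maxloc_combine(omp_out, omp_in)) \
        initializer(omp_priv = omp_orig)

/* Assumption: n >= 1, so v[0] is a valid starting candidate. */
static struct MaxLoc find_max(const int *v, int n)
{
    struct MaxLoc best = { v[0], 0 };

    #pragma omp parallel for reduction(maxloc : best)
    for (int i = 0; i < n; i++) {
        struct MaxLoc candidate = { v[i], i };
        best = maxloc_combine(best, candidate);
    }
    return best;
}
```

For the array in the question, find_max(v, 10) gives max 9 at index 1, so b would be 1 + 100 = 101. Because the same ordering is used inside the loop and in the combiner, the result should not depend on the number of threads.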

score 0 · Answer 2 · edited Mar 18 '21 at 18:32

I've coded this but not compiled or tested it:

int v[10] = { 2, 9, 1, 3, 5, 7, 1, 2, 0, 0 };

int maximo = 0;
int b = 0;
int i;

int nt = omp_get_num_threads();
int bv[nt] = { 0 };

#pragma omp parallel for shared(v) shared(bv) private(i) reduction(max:maximo)
for (i = 0; i < 10; i++) {
    if (v[i] > maximo) {
        maximo = v[i];
        bv[omp_get_thread_num()] = i + 100;
    }
}

for (i = 0;  i < nt;  ++i)
    printf("bv[%d] = %d\n",i,bv[i]);

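Beware that nt is computed outside any parallel region here. The documentation for omp_get_num_threads says: "Returns the number of threads in the current team. In a sequential section of the program omp_get_num_threads returns 1", so nt would be 1 in the code above.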
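
A small sketch of one way around two issues in the untested snippet, under stated assumptions: omp_get_max_threads() gives an upper bound on the team size and can be called before the parallel region, and calloc replaces the `int bv[nt] = { 0 }` initializer, which C99/C11 do not allow for a variable-length array. The helper names max_team_size and alloc_per_thread_slots are hypothetical, and the _OPENMP guard is only there so the code also builds with OpenMP disabled.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Upper bound on the size of the next parallel team (1 without OpenMP). */
static int max_team_size(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Allocate one zero-initialized int slot per potential thread.
   The caller owns the returned buffer and must free() it. */
static int *alloc_per_thread_slots(int *count_out)
{
    int n = max_team_size();
    *count_out = n;
    return calloc((size_t)n, sizeof(int));
}
```

Inside the parallel region, each thread would then write only to its own slot, indexed by omp_get_thread_num().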

Okay, I've recoded it [and built/run it] and it does produce one non-zero bv output:

#include <stdio.h>
#include <omp.h>

int
main(void)
{
    int v[10] = { 2, 9, 1, 3, 5, 7, 1, 2, 0, 0 };

    int i;
    int nt;
    int maximo = 0;
    int index = 0;
    int bv[32] = { 0 };
    int max[32] = { 0 };
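    // max and nt are shared by default; bv and max assume at most 32 threads
    #pragma omp parallel shared(v, bv)
    {
        // one thread stores the team size, avoiding concurrent writes to nt
        #pragma omp single
        nt = omp_get_num_threads();
        int thread_id = omp_get_thread_num();
        #pragma omp for private(i)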
        for (i = 0; i < 10; i++) {
            if (v[i] > max[thread_id]) {
               max[thread_id] = v[i];
               bv[thread_id] = i + 100;
            }
        }
    }
    // Reducing sequentially 
    for (i = 0;  i < nt;  ++i){
        if(max[i] > maximo){
           maximo = max[i];
           index  = bv[i];
        }
    }
    printf("Max %d at index %d\n", maximo, index);
    return 0;
}

Here is the program output:

Max 9 at index 101
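
One limitation of the version above is that bv[32] and max[32] assume at most 32 threads. As a hedged alternative sketch (the function name argmax_critical is hypothetical, and a non-empty array is assumed), each thread can keep its candidate in local variables and merge it into the shared result inside a critical section, so no per-thread array is needed:

```c
/* Returns the index of the maximum of v[0..n-1] and stores the maximum
   in *max_out. Assumption: n >= 1. Ties go to the smallest index. */
static int argmax_critical(const int *v, int n, int *max_out)
{
    int best = v[0];
    int best_idx = 0;

    #pragma omp parallel
    {
        /* Thread-local candidate, seeded with element 0. */
        int local = v[0];
        int local_idx = 0;

        #pragma omp for nowait
        for (int i = 1; i < n; i++) {
            if (v[i] > local || (v[i] == local && i < local_idx)) {
                local = v[i];
                local_idx = i;
            }
        }

        /* Merge the candidates one thread at a time. */
        #pragma omp critical
        {
            if (local > best || (local == best && local_idx < best_idx)) {
                best = local;
                best_idx = local_idx;
            }
        }
    }

    *max_out = best;
    return best_idx;
}
```

For the array in the question, argmax_critical(v, 10, &m) returns 1 with m == 9, so b would be 1 + 100 = 101. The critical section runs once per thread, not once per element, so its cost should stay small for large arrays.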

Hi, I have corrected some of the problems in the other code; feel free to roll back if you want. With nested parallelism you would add too much unnecessary overhead. You can get the number of threads correctly with just a single parallel region. - dreamcrash, Mar 18 '21 at 18:34
By the way, I did not downvote; otherwise I would have removed it by now. - dreamcrash, Mar 18 '21 at 18:35
