Is there a faster way to compute the argmin in OpenACC than splitting the work into a minimum-reduction loop followed by a second loop that finds the index of the minimum?
This looks very wasteful:
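// Fragment: needs <limits> and <cmath>; array and arraySize are defined elsewhere.
float minVal = std::numeric_limits<float>::max();
// Pass 1: reduce to the minimum value.
#pragma acc parallel loop reduction(min: minVal)
for(int i = 0; i < arraySize; ++i) {
    minVal = fmin(minVal, array[i]);
}
// Pass 2: find an index holding that value.
// copy(minIndex) is needed because scalars default to firstprivate in a parallel region.
// If the minimum occurs more than once, which index wins is unspecified.
int minIndex = -1;
#pragma acc parallel loop copy(minIndex)
for(int i = 0; i < arraySize; ++i) {
    if(array[i] == minVal){
        minIndex = i;
    }
}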
In fact, this has become a bottleneck in my current project.
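For comparison, one single-pass idea is to pack an order-preserving integer key for each value together with its index into one 64-bit word, so that a single min reduction yields both. The following is only a sketch under assumptions: `orderedKey` and `argminPacked` are hypothetical names, the array is assumed to contain no NaNs and fewer than 2^32 elements, and I have not measured whether this is actually faster than the two-loop version above. It also assumes the compiler accepts `memcpy` inside a device routine.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: map a float to a 32-bit key whose unsigned integer
// order matches the float order (NaNs excluded; -0.0f sorts before +0.0f).
#pragma acc routine seq
static inline std::uint32_t orderedKey(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    // Negative values: flip every bit. Non-negative values: set the sign bit.
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

// Hypothetical helper: index of the smallest element in one reduction pass.
// Ties resolve to the lowest index; returns -1 for an empty array.
int argminPacked(const float* array, int arraySize) {
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    #pragma acc parallel loop reduction(min: best) copyin(array[0:arraySize])
    for (int i = 0; i < arraySize; ++i) {
        // Key in the high 32 bits, index in the low 32 bits.
        std::uint64_t key = orderedKey(array[i]);
        std::uint64_t packed = (key << 32) | static_cast<std::uint32_t>(i);
        best = packed < best ? packed : best;
    }
    return arraySize > 0 ? static_cast<int>(best & 0xFFFFFFFFu) : -1;
}
```

Because the index sits in the low bits, equal values compare by index, which makes the result deterministic when the minimum occurs more than once. The minimum value itself can be read back as `array[argminPacked(array, arraySize)]`.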