I have a big vector of items sorted by one of their fields, e.g. a cost attribute, and I want to do a bit of processing on each item to find the maximum value of a different attribute. The constraint is that an item must not contribute to the maximum if its cost exceeds some arbitrary price.
The single-threaded for loop looks like this:
auto maxValue = -FLT_MAX;
for(const auto& foo: foos) {
    // Break if the cost is too high. Because foos is sorted by cost,
    // every later item is too expensive as well.
    if(foo.getCost() > 46290) {
        break;
    }
    maxValue = max(maxValue, foo.getValue());
}
I've managed to partly convert this into a parallel_for_each. (Disclaimer: I'm new to PPL.)
combinable<float> localMaxValue([]{ return -FLT_MAX; });
parallel_for_each(begin(foos), end(foos), [&](const auto& foo) {
    // Attempt to early out if the cost is too high.
    if(foo.getCost() > 46290) {
        return;
    }
    localMaxValue.local() = max(localMaxValue.local(), foo.getValue());
});
// Merge the per-thread maxima into the final result.
auto maxValue = localMaxValue.combine(
    [](const auto& first, const auto& second) {
        return max<float>(first, second);
    });
The return statement inside the parallel_for_each feels inefficient: the loop still visits every item, and it's quite possible that several of the chunks it processes lie entirely in the part of the vector where the cost is too high.
How can I take advantage of the fact that the vector is already sorted by cost?
I looked into using a cancellation token, but that approach seems incorrect: it would cancel all subtasks of the parallel_for_each, which means I may get the wrong maximum value.
Is there something like a cancellation token that could cancel only one specific subtask of the parallel_for_each, or is there a better tool than parallel_for_each in this case?
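One possible direction, as a minimal sketch rather than a definitive answer: because the vector is sorted by cost, a binary search (std::partition_point) can find the first item over the limit, so only the items before it need to be visited at all. Foo and maxValueWithinCost are hypothetical names made up for illustration, and the loop here is deliberately sequential standard C++ so that it compiles without PPL. The same [begin, cutoff) iterator range could presumably be passed to parallel_for_each in place of begin(foos), end(foos).

```cpp
#include <algorithm>
#include <cfloat>
#include <vector>

// Hypothetical stand-in for the item type in the question.
struct Foo {
    float cost;
    float value;
    float getCost() const { return cost; }
    float getValue() const { return value; }
};

// Returns the largest value among items whose cost is <= maxCost,
// or -FLT_MAX if no item qualifies.
// Assumes foos is sorted by ascending cost.
float maxValueWithinCost(const std::vector<Foo>& foos, float maxCost) {
    // Binary search for the first item whose cost exceeds the limit.
    const auto cutoff = std::partition_point(
        foos.begin(), foos.end(),
        [maxCost](const Foo& foo) { return foo.getCost() <= maxCost; });

    // Only [begin, cutoff) needs processing. This loop is sequential;
    // a parallel loop over the same range would skip the expensive tail too.
    float maxValue = -FLT_MAX;
    for (auto it = foos.begin(); it != cutoff; ++it) {
        maxValue = std::max(maxValue, it->getValue());
    }
    return maxValue;
}
```

With this, no early-out check or cancellation is needed inside the loop body, since every item in the range is already known to be within the cost limit.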