
I have a list of values. I'd like to find the maximum value. This is a common task. A simple version might be:

iBest := -1;
iMax := -1e20;
for i := 0 to List.Count - 1 do
begin
  if List[i].Value > iMax then
  begin
    iBest := i;
    iMax := List[i].Value;
  end;
end;

In my case, the .Value getter is the performance bottleneck as it invokes a time consuming calculation (~100ms) which returns the final value.

How can I make this parallel using the Parallel Programming Library?

J...
Steve Maughan
  • Map/reduce. Split the list into parts. Have each task find the max value in one part. Gather those per-part maxima and find the max among them. Unless you have millions of items in your array, expect the parallel overhead to dominate. How large are your arrays? You do realise that there are values less than -1e20? There is also a well-defined minimum value for each floating point type. And you can write the algorithm perfectly cleanly without even using that: initialise the index to `0` and the max to the 0-th value, and loop from 1. – David Heffernan Oct 11 '16 at 11:34
  • Thanks @DavidHeffernan - I don't have millions of items. The cost of calculating the value is considerable (about 0.1 seconds) and I maybe have 30,000 items. – Steve Maughan Oct 11 '16 at 11:57
  • Something is wrong. I can use your code to find the max value of an array of 30,000 items in 20 microseconds, 5,000 times faster than the speed you report (100,000 microseconds). Going parallel won't help with a task that takes 20 microseconds and needs a join; it will be way slower. Your bottleneck is not where you think it is. In any case, this loop would need to be your entire program for parallel to help. Read this to understand why: https://en.wikipedia.org/wiki/Amdahl%27s_law – David Heffernan Oct 11 '16 at 12:14
  • Your next step is to concentrate on identifying the true bottleneck of your program. You are a long way from that I suspect. It's easy to think that just throwing some parallel at a problem will yield great results. It's way harder than that. You have to really and truly understand your program and its performance characteristics. You need a very deep understanding. – David Heffernan Oct 11 '16 at 12:18
  • @DavidHeffernan He noted that the cost to calculate `.Value` is high (0.1s) - this seems like a list of objects where the `.Value` getter is actively calculating on access (seemingly not cached, or not cacheable). If that's the case then there could well be a case for parallelizing. – J... Oct 11 '16 at 12:34
  • "I have a list of values" seemed explicit to me. I assumed "calculate the value" meant the max. If each item takes 0.1s it's a different ball game. Clearly if they take a long time to evaluate then there is scope. But then I wonder if they can be calculated once and re-used, which could obviate the need for parallel. The problem with this sort of question is that the naive answer is easy to give but probably misses the big picture. Algorithm choice is deep. Without detail, advice is liable to be poor. – David Heffernan Oct 11 '16 at 12:50
  • Sounds like "some ad told me that doing things in parallel makes my program faster, so how can I do x in parallel" to me. Analyze your problem first before throwing more complexity and concurrency at it. – Stefan Glienke Oct 11 '16 at 15:46
  • @DavidHeffernan the "getter" is indeed the bottleneck. I greatly simplified the problem for the purpose of asking the question. It is a computationally intensive optimization algorithm which takes about 45 minutes to run. – Steve Maughan Oct 12 '16 at 07:14
  • @StefanGlienke your sarcasm is not appreciated – Steve Maughan Oct 12 '16 at 07:18
  • Steve, the problem with your question is that you completely missed the point. Finding the maximum was not the issue; the issue was evaluating each value. That led us all in the wrong direction, and means you have an accepted answer that bears little relation to the question as asked and requires a deal of educated guesswork from these comments. If you had spent more time identifying the bottleneck, I think the solution would have been obvious. – David Heffernan Oct 12 '16 at 07:32
  • Hi @DavidHeffernan - thanks, point taken. – Steve Maughan Oct 12 '16 at 08:00
  • The other way to do this is not to modify the class as J... suggests, but simply to allocate an array and copy the values to it in a parallel for loop. Then find the max in the array, and indeed do other things with the array. That avoids building internal scaffolding in the class to deal with caching. It may be preferable that way, I don't know. It also perhaps depends on whether you use the values more than once. – David Heffernan Oct 12 '16 at 08:03
  • @SteveMaughan That was not sarcasm - I just spotted a pattern. Since the introduction of the PPL I see people everywhere trying to throw concurrency at problems without completely analyzing the real bottleneck. You can find questions here on SO where things turned out even slower than before because the data or algorithm was not properly prepared for being processed in parallel. – Stefan Glienke Oct 12 '16 at 08:57
  • @StefanGlienke It certainly came across as condescending. You jumped to the conclusion I was simply "throwing more complexity and concurrency" at the problem without understanding it. I do understand the problem. – Steve Maughan Oct 12 '16 at 09:05
  • So you probably already looked into speeding up the calculation of .Value and found that it cannot be made any faster. – Stefan Glienke Oct 12 '16 at 09:15
  • Yes - the calculation is memory and processor intensive. It's an implementation of an Ant-Colony-Optimization algorithm. – Steve Maughan Oct 12 '16 at 09:42
  • If it's memory intensive you might find that doing it in parallel will not give you the benefit you are hoping for, but one can only tell after you have tried. – Stefan Glienke Oct 12 '16 at 09:47
  • @StefanGlienke It requires lots of memory but the allocation can be done outside of the main loop (via an object pool); leaving the loop to just perform the calculations. But you're certainly correct in saying the scaling will unlikely be linear. Thanks for the input. – Steve Maughan Oct 12 '16 at 10:02
  • Still, the title is very misleading. The objective here is not to find the max from a list of values by parallelization, it is to parallelize multiple independent calculations by the same algorithm. All of this in order to gain performance. – LU RD Oct 12 '16 at 11:44
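The array-copy alternative David Heffernan describes in the comments might look roughly like this (a hedged sketch, not code from the thread; `TItemList` and its items' `.Value` getter stand in for the asker's actual types, and the getter is assumed safe to call from multiple threads):

```delphi
uses
  System.Threading;

procedure FindMax(List: TItemList; out BestIndex: Integer; out MaxVal: Double);
var
  Values: TArray<Double>;
  i: Integer;
begin
  SetLength(Values, List.Count);
  // Run the expensive getter concurrently; each task writes only its own
  // slot of the array, so no locking is needed.
  TParallel.For(0, List.Count - 1,
    procedure(j: Integer)
    begin
      Values[j] := List[j].Value;  // the ~100 ms calculation
    end);
  // Cheap serial scan over the snapshot array.
  BestIndex := 0;
  for i := 1 to High(Values) do
    if Values[i] > Values[BestIndex] then
      BestIndex := i;
  MaxVal := Values[BestIndex];
end;
```

This keeps the caching scaffolding out of the class, and the array of computed values remains available for any further processing.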

1 Answer


If the value is a calculated value and you can afford to cache, a simple solution might look something like this:

program Project1;

{$APPTYPE CONSOLE}

uses
  SysUtils, Threading, DateUtils, Generics.Collections;

type
  TFoo = class
  private
    FCachedValue : double;
    function GetValue : double;
  public
    property CalculateValue : double read GetValue;
    property CachedValue : double read FCachedValue;
  end;

  TFooList = class(TObjectList<TFoo>)
    public
      procedure CalculateValues;
      function GetMaxValue(var BestIndex : integer) : double;
  end;


function TFoo.GetValue : double;
begin
  sleep(10);   // simulate taking some time... make up a value
  FCachedValue := DateUtils.MilliSecondOfTheSecond(now);
  result := FCachedValue;
end;

procedure TFooList.CalculateValues;
begin
  // evaluate every item concurrently; each GetValue call fills FCachedValue
  TParallel.For(0, Count - 1,
    procedure (j:integer)
    begin
      self[j].CalculateValue;
    end);
end;

function TFooList.GetMaxValue(var BestIndex : Integer) : double;
var
  i, iBest : integer;
  maxval : double;
begin
  CalculateValues;
  iBest := 0;
  maxval := self[0].CachedValue;
  for i := 1 to self.Count - 1 do
  begin
    if self[i].CachedValue > maxval then
    begin
      iBest := i;
      maxval := self[i].CachedValue;
    end;
  end;
  BestIndex := iBest;
  result := maxval;
end;


var
  LFooList : TFooList;
  i, iBest : integer;
  maxval : double;
begin
  LFooList := TFooList.Create(true);
  try
    for i := 0 to 9999 do LFooList.Add(TFoo.Create);
    maxval := LFooList.GetMaxValue(iBest);
    WriteLn(Format('Max value index %d', [iBest]));
    WriteLn(Format('Max value %.6f', [maxval]));
  finally
    LFooList.Free;
  end;
  ReadLn;
end.

This way your object retains a cache of the last calculated value, which you can refresh at any time but also access quickly. It's somewhat easier to parallelize a full calculation of the list than to parallelize the min/max search itself, and if the bottleneck is the calculation then it makes sense to restrict the added complexity to that operation alone (where you know the overhead is worth it).

J...
  • Thanks - this is indeed simple, helpful (and not condescendingly sarcastic). – Steve Maughan Oct 12 '16 at 07:20
  • @SteveMaughan It is just one example, of course. As David noted, there are many ways you could attack this - it very much depends on how you will be using this data. The critical idea is to separate, as much as possible, the fast and slow parts of the problem and deal with them separately. – J... Oct 12 '16 at 08:29