
I know TTask and have used TTask.WaitForAll(array) successfully, as well as TParallel.&For().

But now I want to do a seemingly simple thing and can't figure out how:

I have an unknown number of items coming in: it could be millions or only a few, and I don't know in advance. How can I work on them in parallel (on just about 4 threads), but without a queue? If the maximum number of threads is already busy, I want to wait for the next free slot. Something like a TTask.Run() that doesn't return until the task has actually started running.

I guess I'm just overlooking something simple...?

When I'm through, I want to wait for all remaining tasks to finish. But of course I don't want to have millions of them in an array for WaitForAll().


I can imagine a possible solution (but I don't like it and hope for a much easier one using TTask or similar):

  • Push the work to a TThreadedQueue, it would automatically let me wait if the queue is full
  • Start 4 threads and let them pop from that queue in a loop

I know this might be the preferred way in some cases anyway, but my situation would not benefit from it (there are no objects, connections or the like to reuse).


Pseudocode of what would be nice and clean:

MyThreadPool:= TMyThreadPool.Create(4);
while GetNextItem(out Item) do
  //the following comes back when it has really been started:
  MyThreadPool.Run(procedure begin Work(Item); end);
MyThreadPool.WaitFor;
maf-soft
  • You only have two choices. Make a `TTask` for each record and let `TThreadPool` handle the throttling for you, or queue the records and throttle them yourself. A thread-safe queue would work just fine, and still allow you to reuse objects. It is just a matter of how you code the logic for them. I would use a pool of record objects, a pool of connection objects, etc. Pull out a record, fill it, put it in the queue. When a thread pulls the record from the queue, pull out a connection and use it, then push record and connection back into their pools. Repeat until the queue is empty. – Remy Lebeau Feb 02 '21 at 15:41
  • Hi @RemyLebeau, since it's not about DB records (I'm processing files in this case), there is nothing I could reuse. So I was hoping for some simple and short code like `TParallel.&For` would be. Any other idea? Can I get the size of the internal TTask queue? – maf-soft Feb 02 '21 at 15:57
  • Interesting related question: https://stackoverflow.com/questions/51571759/what-is-the-purpose-of-tworkstealingqueue-and-how-to-use-it – maf-soft Feb 02 '21 at 17:40

1 Answer


This seems to be a working solution, although abusing TParallel.&For like this is not really nice. I'm still hoping for a better answer.

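// SearchRec is shared by all iterations; it is only touched under the monitor.
if FindFirst(Path, 0, SearchRec) = 0 then
  try
    // 99999 is an arbitrary upper bound: at most 100000 files are processed.
    TParallel.&For(0, 99999,
      procedure(I: Integer; LoopState: TParallel.TLoopState)
      var
        Filename: string;
      begin
        if LoopState.ShouldExit then Exit;

        TMonitor.Enter(Self);
        try
          // Check again under the lock: another iteration may have taken the
          // last file while this one was waiting for the monitor.
          if LoopState.ShouldExit then Exit;
          Filename:= SearchRec.Name;
          if (FindNext(SearchRec) <> 0) or TThread.CheckTerminated then
            LoopState.Stop; //or .Break?
        finally
          TMonitor.Exit(Self);
        end;

        // The file is processed outside the lock so iterations run in parallel.
        try
          ProcessFile(Filename);
        except
          on E: Exception do Log(E.ToString); //maybe also want to Stop
        end;
      end);
  finally
    FindClose(SearchRec);
  end;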

I wrote a lot of trace logs and it looks good. The only drawback is that after the last file it still starts 10-20 more iterations, which then exit right at the beginning.

It also seems the default thread pool cannot be restricted to fewer threads than the number of processors.
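A possible alternative that avoids both points is to throttle plain TTask.Run calls with a counting semaphore instead of relying on the pool size. The following is only a minimal sketch under assumptions, not a tested drop-in: `Work`, `MakeJob`, `MaxParallel` and the loop over 20 dummy items are hypothetical stand-ins for the real item source and processing. Note that it limits the number of submitted-but-unfinished tasks to 4, which is slightly weaker than "returns only once the task is really running", since the default pool still decides when a submitted task starts.

```pascal
program ThrottledTasks;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.SyncObjs, System.Threading;

const
  MaxParallel = 4; // assumed limit, taken from the question

// Hypothetical stand-in for the real per-item work.
procedure Work(const Item: string);
begin
  Sleep(100);
end;

// Building the closure in its own function gives every task its own copy of
// Item. Capturing a loop variable directly would share one variable instead.
function MakeJob(const Item: string; Slots: TSemaphore): TProc;
begin
  Result :=
    procedure
    begin
      try
        Work(Item);
      finally
        Slots.Release; // give the slot back even if Work raises
      end;
    end;
end;

var
  Slots: TSemaphore;
  I: Integer;
begin
  Slots := TSemaphore.Create(nil, MaxParallel, MaxParallel, '');
  try
    for I := 1 to 20 do // stands in for "while GetNextItem(Item) do"
    begin
      Slots.Acquire; // blocks while MaxParallel tasks are still unfinished
      TTask.Run(MakeJob('item ' + IntToStr(I), Slots));
    end;

    // Taking every slot back only succeeds once all tasks have released theirs,
    // so this doubles as the final "wait for the remaining tasks".
    for I := 1 to MaxParallel do
      Slots.Acquire;
  finally
    Slots.Free;
  end;
end.
```

With this shape there is no array of millions of tasks to keep around, and no extra iterations are started after the last item. Exceptions raised inside `Work` end up in the faulted task and are not seen by the main thread in this sketch, so real code would probably want to log them inside the job, as the TParallel.&For version does.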

Please comment if you think anything is bad or can/should be improved.

maf-soft