You can use OmniThreadLibrary to process records from a database in parallel without much hassle.
I wrote an example using the Pipeline abstraction. The pipeline consists of 3 stages:
- The first stage reads data from the database and creates an instance of a container object to represent that data for the next stage of the pipeline.
- The second stage processes the incoming data. It:
  - calls the DoSomethingWith procedure, which simply wastes around 100 ms to simulate the processing of the data;
  - frees the memory of the container instance;
  - then adds the literal value 1 to the output queue to inform the final stage that another record has been processed.
  This stage is configured to run in parallel in 7 threads.
- The last stage just counts how many records have been completed by the previous stage.
The example is a console application, so you can just copy/paste it and see it working live on your machine.
program Project1;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils,
  System.Diagnostics,
  Data.DB,
  Datasnap.DBClient,
  OtlCommon,
  OtlCollections,
  OtlParallel;

type
  //Auxiliary container used to copy the database data and avoid
  //synchronization. Remember the TDataSet "current record" may cause
  //conflicts if changed from different threads.
  TContainer = class
  private
    FName: string;
    FID: Int64;
  public
    property ID: Int64 read FID write FID;
    property Name: string read FName write FName;
  end;

//does nothing but waste around 100 ms "processing" each record
procedure DoSomethingWith(const AValue: TContainer);
begin
  Sleep(100);
end;

//creates a DataSet on the fly with a random number of records
function CreateDataSet: TClientDataSet;
var
  I: Integer;
begin
  Result := TClientDataSet.Create(nil);
  with Result.FieldDefs.AddFieldDef do
  begin
    Name := 'ID';
    DataType := ftLargeint;
  end;
  with Result.FieldDefs.AddFieldDef do
  begin
    Name := 'NAME';
    DataType := ftString;
    Size := 50;  //string field defs need an explicit size
  end;
  Result.CreateDataSet;
  for I := 1 to Random(1000) + 1 do  //+1 guarantees at least one record
    Result.InsertRecord([I, 'Test']);
end;

var
  RecordsProcessed: Integer;
  SW: TStopwatch;
  Data: TDataSet;

begin
  IsMultiThread := True;
  Randomize;
  Writeln('wait while processing...');
  SW := TStopwatch.Create;
  SW.Start;
  try
    Data := CreateDataSet;
    try
      RecordsProcessed := Parallel.Pipeline
        .Stage(
          //first stage: copies each record to a container object
          procedure (const Input, Output: IOmniBlockingCollection)
          var
            RecData: TContainer;
          begin
            Data.First;
            while not Data.Eof do
            begin
              RecData := TContainer.Create;
              RecData.ID := Data.Fields[0].AsLargeInt;
              RecData.Name := Data.Fields[1].AsString;
              Output.Add(RecData);
              Data.Next;
            end;
          end)
        .Stage(
          //second stage: processes one container per invocation
          procedure (const Input: TOmniValue; var Output: TOmniValue)
          var
            RecData: TContainer;
          begin
            RecData := TContainer(Input.AsObject);
            DoSomethingWith(RecData);  //process the real thing here
            RecData.Free;
            Output := 1;  //another record done
          end)
        .NumTasks(7)  //this stage is processed by 7 parallel tasks
        .Stage(
          //last stage: counts the processed records
          procedure (const Input, Output: IOmniBlockingCollection)
          var
            Recs: Integer;
            Value: TOmniValue;
          begin
            Recs := 0;
            for Value in Input do
              Inc(Recs, Value.AsInteger);
            Output.Add(Recs);
          end)
        .Run.Output.Next;
      SW.Stop;
      Writeln(RecordsProcessed, ' records processed in ', SW.ElapsedMilliseconds, 'ms.');
      Writeln('Avg. ', (SW.ElapsedMilliseconds/RecordsProcessed):0:3, 'ms./record');
    finally
      Data.Free;
    end;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  Readln;
end.
The main advantages of doing it this way, IMHO, are:
- You have a flexible mechanism to distribute the job among the multiple workers. If some record takes longer to process, the library takes care of the situation, and you can reasonably expect the total work to finish in the least possible time.
- Your first processing thread starts as soon as you finish reading the first record from the database.
- You can easily adapt it if you have to wait for more incoming records in the base table (see the first sketch after this list). The output queue of a stage is not marked as finished until the code in the stage procedure ends, so if at some point there is no more work to do, all the following stages just block, waiting for more data to process.
- You can change the number of worker threads just by changing a parameter value! (The second sketch below derives it from the machine's core count.)
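For the "wait for more incoming records" case, here is a minimal sketch of a replacement first stage. WaitForNewRecords is a hypothetical helper (not part of OTL) that refreshes the dataset and returns False once no more rows are expected:

//Sketch of a first stage that keeps feeding the pipeline while new
//records arrive. WaitForNewRecords is an assumed helper that refreshes
//the dataset and returns False when no more records are expected
//(e.g. after a shutdown request).
procedure ReaderStage(const Input, Output: IOmniBlockingCollection);
var
  RecData: TContainer;
begin
  Data.First;
  repeat
    while not Data.Eof do
    begin
      RecData := TContainer.Create;
      RecData.ID := Data.Fields[0].AsLargeInt;
      RecData.Name := Data.Fields[1].AsString;
      Output.Add(RecData);
      Data.Next;
    end;
    //as long as this loop runs, the stage's output is not marked as
    //finished, so the later stages just block waiting for more data
  until not WaitForNewRecords(Data);
end;

You would pass ReaderStage to the first .Stage(...) call instead of the anonymous method shown above.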
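And to avoid hard-coding the 7 worker threads, you could derive the count from the machine. A minimal sketch, assuming OtlCommon's Environment.Process.Affinity.Count reports the number of cores available to the process (check the OtlCommon API in your OTL version):

//Sketch: pick a worker count from the cores available to the process.
//Environment comes from OtlCommon; the lower bound is just a safeguard.
function SuggestedWorkerCount: Integer;
begin
  Result := Environment.Process.Affinity.Count;
  if Result < 1 then
    Result := 1;
end;

Then replace .NumTasks(7) with .NumTasks(SuggestedWorkerCount) in the pipeline.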