0

I have an experimental app that I am developing that adds image filenames to a collection. I am attempting to find the most efficient way to delete all the files in a collection except for files that exist in another collection. Files can exist in any collection.

I have a TClientDataSet with the following fields:

ClientDataSet1.FieldDefs.Add('Index', ftInteger);
ClientDataSet1.FieldDefs.Add('Collection', ftString, 50);
ClientDataSet1.FieldDefs.Add('Filename', ftString, 254);

I came up with this which seems to work but seems inefficient:
var
  i: Integer;
  j: Integer;
  iCollectionToDelete: string;
  iCollection: string;
  iFilename: string;
  iFilenameInOtherCollection: string;
  iFilesInOtherCollectionsStringList: TStringList;
begin
  iCollectionToDelete := ListBox1.Items[ListBox1.ItemIndex];
  { Set filtered to false to see all the records in the database }
  ClientDataSet1.Filtered := False;
  { Create a list of files not in the collection to be deleted }
  iFilesInOtherCollectionsStringList := TStringList.Create;
  try
    for i := 0 to ClientDataSet1.RecordCount - 1 do
    begin
       iCollection := ClientDataSet1.FieldByName('Collection').AsString;
       iFilename := ClientDataSet1.FieldByName('Filename').AsString;
       if iCollection <> iCollectionToDelete then
       begin
          iFilenameInOtherCollection := ClientDataSet1.FieldByName('Filename').AsString;
          iFilesInOtherCollectionsStringList.Add(iFilename);
       end;
       ClientDataSet1.Next;
    end;

    { Compare the iFilenameInOtherCollection with all the filenames in the
      dataset and if the iFilename is not in the other collection then
      erase the file }
    ClientDataSet1.First;
    for i := 0 to ClientDataSet1.RecordCount - 1 do
    begin
       iFilename := ClientDataSet1.FieldByName('Filename').AsString;
       ClientDataSet1.Next;
       for j := 0 to iFilesInOtherCollectionsStringList.Count - 1 do
       begin
          iFilenameInOtherCollection := iFilesInOtherCollectionsStringList[j];
          if iFilesInOtherCollectionsStringList.IndexOf(iFilename) = -1 then
            if FileExists(iFilename) then
             WindowsErase(handle, iFilename, False, False, False, False);
       end;
    end;
  finally
    iFilesInOtherCollectionsStringList.Free;
  end;
end;

My question is can this be made more efficient or is there a way to do the same thing using just TClientDataset methods?

Bill
  • 2,993
  • 5
  • 37
  • 71
  • Something like this ? `for i := 0 to FilenamesToKeep.Count - 1 do begin j := FilenamesToDelete.IndexOf(FilenamesToKeep[i]); if j>-1 then FilenamesToDelete.Delete(j); end;` – bummi Oct 28 '14 at 15:28
  • Perhaps the down-voter would care to explain ... – MartynA Oct 28 '14 at 21:09

2 Answers2

1

Just add iFilesInOtherCollectionsStringList.Sorted := True after you fill it. IndexOf will then use fast binary search instead of extremely slow one-by-one looping. Probably that will be enough for your purposes.

Another option is to prepare list-to-delete first and then launch a worker thread which will execute removal in the background. This will likely help because usually file operations much more slow than memory comparisons. You may check if it is the deletion that slows down your process by commenting out the WindowsErase line.

Fr0sT
  • 2,959
  • 2
  • 25
  • 18
  • 1
    +1 for two good suggestions. .Sorted := True is easily overlooked and trying to delete the file in the main thread will make it hang if some other process coincidentally has it open. – MartynA Oct 28 '14 at 21:07
  • @MartynA +1 for busy files case. – Fr0sT Oct 29 '14 at 07:10
1

Just for amusement, I thought I'd try doing this without using Stringlists at all, and instead use a couple of features of ClientDataSets, namely Filtering and the ability to copy data from one CDS to another in a single statement. It's quite a bit shorter than using stringlists and probably easier to maintain/modify/refactor as a result.

I've not benchmarked it against the Stringlist version but would be surprised if it were any faster, because it depends on TClientDataSet.Locate, which is not a particularly efficient operation even when working against an indexed field.

The code is below. Hopefully the comments will explain how it works.

procedure TForm1.SetUp;
begin
  ClientDataSet1.FieldDefs.Add('Index', ftInteger);
  ClientDataSet1.FieldDefs.Add('Collection', ftString, 50);
  ClientDataSet1.FieldDefs.Add('Filename', ftString, 254);
  ClientDataSet1.CreateDataSet;

  // Create some test data
  ClientDataSet1.InsertRecord([1, 'C1', 'F1']);
  ClientDataSet1.InsertRecord([2, 'C2', 'F1']);
  ClientDataSet1.InsertRecord([3, 'C3', 'F1']);
  ClientDataSet1.InsertRecord([4, 'C1', 'F2']);
  ClientDataSet1.InsertRecord([5, 'C3', 'F3']);
end;


procedure Tform1.ApplyCDSFilter(CDS : TClientDataSet; FilterExpr : String);
// utility routine to filter/unfilter a dataset
begin
  CDS.Filtered := False;
  CDS.Filter := FilterExpr;
  if FilterExpr <> '' then
    CDS.Filtered := True;
end;

procedure TForm1.RemoveFilesOnlyInCollection(CollectionName : String);
var
  CDS : TClientDataSet;
  FilterExpr : String;
  AFileName : String;
begin
  // In the following, I'm just going to add the names of the files which belong to the
  // specified collection as well as to another one
  // to a listbox so as to be able to check the results by inspection

  Listbox1.Items.Clear;

  // next create a temporary CDS
  CDS := TClientDataSet.Create(Nil);

  // Index it by Filename
  CDS.IndexFieldNames := 'Filename';

  // Copy the data from ClientDataSet1 into it
  CDS.Data := ClientDataSet1.Data;

  // Construct a filter expression to select the collection whose members are to be
  // retained.  NOTE : the QuotedStr is to handle quotes embedded in the collection name.
  FilterExpr := '(Collection =' + QuotedStr(CollectionName) + ')';

  //  Apply the filter to ClientDataSet1, so that only records that contain the CollectionName 
  //  are "visible", temporarily
  ApplyCDSFilter(ClientDataSet1, FilterExpr);

  //  Next, negate the filter expression and apply it to the temporary CDS
  FilterExpr := 'not ' + FilterExpr;
  ApplyCDSFilter(CDS, FilterExpr);

  //  Now, we can loop through ClientDataSet1 and test whether the Filename is present
  //  in the temporary CDS.  If it is, that means that the Filename belongs to another  
  // collection too.

  try
    ClientDataSet1.DisableControls;
    ClientDataSet1.First;
    while not ClientDataSet1.Eof do begin
      AFileName := ClientDataSet1.FieldByName('Filename').AsString;
      if not CDS.Locate('Filename', AFileName, [loCaseInsensitive]) then
        Listbox1.Items.Add(AFileName);
      ClientDataSet1.Next;
    end;
   finally
     CDS.Free;
     ClientDataSet1.EnableControls;
     ApplyCDSFilter(ClientDataSet1, '');
   end;
end;
MartynA
  • 30,454
  • 4
  • 32
  • 73
  • Thanks for the suggestion. I think I probably will wind up using the ClientDataSet procedures rather than string lists, just to keep things simple as possible. – Bill Oct 28 '14 at 20:09
  • Glad it helped. Btw, you could avoid some of the overhead of doing a .Locate by using .GotoKey, because that avoids the construction and disposal of a parser object as happens on each call to .Locate. – MartynA Oct 28 '14 at 20:15