
I have a "Find Files" function in my program that will find text files with the .ged suffix that my program reads. I display the found results in an explorer-like window that looks like this:

[Screenshot: explorer-like results window listing the found .ged files]

I use the standard FindFirst / FindNext methods, and this works very quickly. The 584 files shown above are found and displayed within a couple of seconds.

What I'd now like to do is add two columns to the display that show the "Source" and "Version" contained in each of these files. This information is usually found within the first 10 lines of each file, on lines that look like:

1 SOUR FTM
2 VERS Family Tree Maker (20.0.0.368)

Now I have no problem parsing this very quickly myself, and that is not what I'm asking.

What I need help with is simply how to most quickly load the first 10 or so lines from these files so that I can parse them.

I have tried to do a StringList.LoadFromFile, but it takes too much time loading the large files, such as those above 1 MB.

Since I only need the first 10 lines or so, how would I best get them?

I'm using Delphi 2009, and my input files might or might not be Unicode, so this needs to work for any encoding.


Followup: Thanks Antonio,

I ended up doing this which works fine:

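var
  CurFileStream: TStream;
  Buffer: TBytes;
  Value: string;
  Encoding: TEncoding;
  BytesRead, PreambleLen: Integer;

// Create the stream before the try so Free never runs on an unassigned reference
CurFileStream := TFileStream.Create(folder + FileName, fmOpenRead or fmShareDenyWrite);
try
  SetLength(Buffer, 256);
  BytesRead := CurFileStream.Read(Buffer[0], Length(Buffer));
  SetLength(Buffer, BytesRead);  // the file may be shorter than 256 bytes
  Encoding := nil;               // nil: detect the encoding from the BOM, if any
  PreambleLen := TEncoding.GetBufferEncoding(Buffer, Encoding);
  // Skip the BOM so it does not end up in the decoded string
  Value := Encoding.GetString(Buffer, PreambleLen, Length(Buffer) - PreambleLen);
  ...
  (parse through Value to get what I want)
  ...
finally
  CurFileStream.Free;
end;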
lkessler
  • TStrings.LoadFromFile is very inefficient, forget about it. Think outside the box and read a reasonable number of bytes (e.g. NumLines * AvgLineLength), truncate with LineStart and then split into TStrings – Free Consulting Jan 31 '11 at 06:23
  • Actually, Worm, it's not as bad as you'd think. It can read and load about 10 MB a second. I still successfully use it when I have to search for text in those files. But why use it to load entire files and make the user wait 40 seconds when only the first few lines are needed? – lkessler Feb 01 '11 at 03:36

5 Answers

14

Use TFileStream and read only the number of bytes you need with its Read method. Here is an example of reading bitmap info, which is also stored at the beginning of the file.

http://www.delphidabbler.com/tips/19

Antonio Bakula
  • +1 I would use a TFileStream for this since it wraps up the native OS file API very nicely. – David Heffernan Jan 30 '11 at 20:57
  • +1. Simply read the first 4 KB of data: that's probably enough to fully contain the first few lines, and it's the minimum amount of data that gets read from disk anyway. If you're reading from many files (and 584 files is not exactly "many") and you want to get fancy, you might want to open the files without caching, using CreateFile, and pass the handle to THandleStream: it might provide a small improvement because the OS knows not to cache data that is very likely not going to be requested again. – Cosmin Prund Jan 31 '11 at 08:14
  • TFileStream lacks a ReadLn capability. What if "probably" isn't good enough? – Warren P Feb 01 '11 at 00:57
  • Remy's suggestion is a TStreamReader wrapped over a TFileStream, which seems the right way to do it. – Warren P Feb 01 '11 at 01:44
4

Just open the file yourself for block reading (not using TStringList's built-in functionality) and read the first block of the file. You can then load that block into a string list with strings.SetText() (if you are using block functions) or simply strings.LoadFromStream() if you are loading your blocks using streams.

I would personally just go with the FileRead/FileWrite block functions and load the block into a buffer. You could also use similar WinAPI functions, but that's just more code for no reason.

The OS reads files in blocks, which are at least 512 bytes on almost any platform/filesystem, so you can read 512 bytes first (and hope that you got all 10 lines, which will be true if your lines are generally short enough). This will be (practically) as fast as reading 100 or 200 bytes.

Then, if you notice that your strings object has fewer than 10 lines, just read the next 512-byte block and try to parse again. (Or just go with 1024, 2048 and so on; on many systems that will probably be as fast as 512-byte blocks, as filesystem cluster sizes are generally larger than 512 bytes.)
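The read-a-block-then-retry idea above could be sketched roughly as follows. This is only an illustration, not code from the answer: ReadFileHead and CountLineFeeds are hypothetical helper names, the 512-byte block size and the 4096-byte cap are assumptions, and counting raw #10 bytes is only an approximation for UTF-16 files, where the same byte value can appear inside other characters.

```pascal
program ReadHeadDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: counts line feed bytes (#10) in the first Count bytes.
function CountLineFeeds(const Buffer: TBytes; Count: Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to Count - 1 do
    if Buffer[I] = 10 then
      Inc(Result);
end;

// Hypothetical helper: reads the head of AFileName one 512-byte block at a
// time until it holds at least MinLines line feeds, at least MaxBytes bytes
// have been read, or the file ends.
function ReadFileHead(const AFileName: string; MinLines, MaxBytes: Integer): TBytes;
const
  BlockSize = 512; // assumed block size, as suggested in the answer
var
  Stream: TFileStream;
  Total, Got: Integer;
begin
  Total := 0;
  Stream := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
  try
    repeat
      SetLength(Result, Total + BlockSize);
      Got := Stream.Read(Result[Total], BlockSize);
      Inc(Total, Got);
    until (Got < BlockSize) or (Total >= MaxBytes) or
      (CountLineFeeds(Result, Total) >= MinLines);
  finally
    Stream.Free;
  end;
  SetLength(Result, Total); // trim to the bytes actually read
end;

var
  Head: TBytes;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: ReadHeadDemo <file>');
    Exit;
  end;
  Head := ReadFileHead(ParamStr(1), 10, 4096);
  WriteLn(Length(Head), ' bytes read');
end.
```

The returned bytes still have to be decoded (for example with TEncoding, as in the question's follow-up) before they can be split into lines.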

PS. Also, using threads or asynchronous functionality in winapi file functions (CreateFile and such), you could load that data from files asynchronously, while the rest of your application works. Specifically, the interface will not freeze during reading of large directories.

This will make the loading of your information appear faster (the file list will load immediately, and the rest of the information will come up some milliseconds later), while not actually increasing the real reading speed.

Do this only if you have tried the other methods and you feel like you need the extra boost.

Cray
3

You can use a TStreamReader to read individual lines from any TStream object, such as a TFileStream. For even faster file I/O, you could use Memory-Mapped Views with TCustomMemoryStream.

Remy Lebeau
2

Okay, I deleted my first answer. Using Remy's first suggestion above, I tried again with built-in stuff. What I don't like here is that you have to create and free two objects. I think I would make my own class to wrap this up:

var
  fs: TFileStream;
  tr: TTextReader;
  filename: String;
begin
  filename := 'c:\temp\textFileUtf8.txt';
  fs := TFileStream.Create(filename, fmOpenRead);
  try
    // The reader does not own fs here, so both objects must be freed
    tr := TStreamReader.Create(fs);
    try
      Memo1.Lines.Add(tr.ReadLine);
    finally
      tr.Free;
    end;
  finally
    fs.Free;
  end;
end;

If anybody is interested in what I had here before, it had the problem of not working with Unicode files.

Warren P
  • Thanks for the alternative, Warren. I had already managed to implement TFileStream as Antonio suggested, and it's working well enough that I don't have to try anything else. I'll remember this as an alternative, though. – lkessler Feb 01 '11 at 03:31
  • +1 for better solution because of ReadLine, but I am not sure that this is *faster* – Antonio Bakula Feb 01 '11 at 16:14
  • TStreamReader has several constructors that let you specify a filename instead of a separate TStream object pointer. – Remy Lebeau Feb 01 '11 at 20:05
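As a rough sketch of the filename-based constructor mentioned in the last comment, the following reads up to the first ten lines of a file with a single object. ReadFirstLines is a hypothetical helper name, and passing True for the BOM-detection parameter is an assumption about what the question needs; files without a BOM would be decoded with the reader's default encoding.

```pascal
program FirstLinesDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: appends up to MaxLines leading lines of AFileName
// to Lines. The reader opens and owns the underlying file stream itself.
procedure ReadFirstLines(const AFileName: string; MaxLines: Integer;
  Lines: TStrings);
var
  Reader: TStreamReader;
  Count: Integer;
begin
  // True = look for a byte order mark to pick the encoding
  Reader := TStreamReader.Create(AFileName, True);
  try
    Count := 0;
    while (Count < MaxLines) and not Reader.EndOfStream do
    begin
      Lines.Add(Reader.ReadLine);
      Inc(Count);
    end;
  finally
    Reader.Free; // also closes the file
  end;
end;

var
  Head: TStringList;
  I: Integer;
begin
  if ParamCount < 1 then
  begin
    WriteLn('Usage: FirstLinesDemo <file>');
    Exit;
  end;
  Head := TStringList.Create;
  try
    ReadFirstLines(ParamStr(1), 10, Head);
    for I := 0 to Head.Count - 1 do
      WriteLn(Head[I]);
  finally
    Head.Free;
  end;
end.
```

Because the reader pulls data through a small internal buffer, only the start of a large file should need to be read, though whether this beats a single block read was not measured in this thread.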
0

Sometimes old-school Pascal style is not that bad. Even though non-OO file access doesn't seem to be very popular anymore, ReadLn(F,xxx) still works pretty well in situations like yours.

The code below loads information (filename, source and version) into a TDictionary so that you can look it up easily, or you can use a listview in virtual mode and look things up in this list when the OnData event fires.

Warning: the code below does not work with Unicode.

program Project101;
{$APPTYPE CONSOLE}

uses
  IoUtils, Generics.Collections, SysUtils;

type
  TFileInfo=record
    FileName,
    Source,
    Version:String;
  end;

function LoadFileInfo(var aFileInfo:TFileInfo):Boolean;
var
  F:TextFile;
begin
  Result := False;
  AssignFile(F,aFileInfo.FileName);
  {$I-}
  Reset(F);
  {$I+}
  if IOResult = 0 then
  begin
    ReadLn(F,aFileInfo.Source);
    ReadLn(F,aFileInfo.Version);
    CloseFile(F);
    Exit(True)
  end
  else
    WriteLn('Could not open ', aFileInfo.FileName);
end;

var
  FileInfo:TFileInfo;
  Files:TDictionary<string,TFileInfo>;
  S:String;
begin
  Files := TDictionary<string,TFileInfo>.Create;
  try
    for S in TDirectory.GetFiles('h:\WINDOWS\system32','*.xml') do
    begin
      WriteLn(S);
      FileInfo.FileName := S;
      if LoadFileInfo(FileInfo) then
        Files.Add(S,FileInfo);
    end;

    // showing file information...
    for FileInfo in Files.Values do
      WriteLn(FileInfo.Source, ' ',FileInfo.Version);
  finally
    Files.Free
  end;
  WriteLn;
  WriteLn('Done. Press any key to quit . . .');
  ReadLn;
end.
Wouter van Nifterick