
I need to parse a RAM dump for MFT records (from the NTFS filesystem).

I've done some programming in the past involving reading the headers of multiple files (using a FileSearcher class, etc.), but I'm not entirely sure how to read through a large file from the start and, when a certain magic value is found ('FILE0', in the case of MFT entries), read in 1024 bytes from that point and 'do stuff' with the values between it and the end of the 1024-byte range. It then needs to carry on searching for the next FILE0 record.

So far I have the following. My intention is that it reads through the source file (which is a TFileStream) looking for 'FILE0'. When it finds it, at this stage I just want it to report that it has found a record and output the position; in due course I need it to read a series of bytes from the point at which FILE0 was found:

type
  MFTRecordsStore = packed record
    FILE0MagicMarker: array[0..4] of Byte;
    // Lots more follow....
  end;

var
    MFTHeaderArray : MFTRecordsStore;
    FILE0Present : string;
    i : integer;

begin
  SourceFile.Position := 0;
  while (SourceFile.Position < SourceFile.Size) do
  begin
    SourceFile.ReadBuffer(MFTHeaderArray, SizeOf(MFTHeaderArray));
    for i := 0 to 4 do
      FILE0Present := FILE0Present + IntToHex(MFTHeaderArray.FILE0MagicMarker[i], 2);
    if FILE0Present = 'FILE0' then
    begin
      Memo1.Lines.Add('FILE0 Entry found at ' + IntToStr(SourceFile.Position));
    end;
  end;
end;

This code compiles and runs (it starts to parse the file), but after several minutes of heavy CPU usage the program crashes and reports that it cannot read the stream. I have a feeling this has something to do with reaching the end of the file and there not being a full 'chunk' left to read, so it crashes?

What is the solution?

Peter Mortensen
Gizmo_the_Great

2 Answers


I'm posting an example of how I would write and read a file of records using streams and search it for certain ANSI text. You may also check the commented version of this post.

Here is the record definition used in this example:

type
  TFileRecord = packed record
    Marker: array [0..4] of Byte;
    Width: Integer;
    Height: Integer;
    Useful: Boolean;
  end;

Here is how to create such a file of records (what you already have :)

procedure TForm1.Button1Click(Sender: TObject);
var
  FileStream: TFileStream;
  FileRecord: TFileRecord;
const
  RecordSize = SizeOf(TFileRecord);

  procedure FillFileRecord(const AMarker: string; const AWidth: Integer;
    const AHeight: Integer; const AUseful: Boolean);
  begin
    FillChar(FileRecord, RecordSize, 0);
    Move(AMarker[1], FileRecord.Marker, Length(FileRecord.Marker));
    FileRecord.Width := AWidth;
    FileRecord.Height := AHeight;
    FileRecord.Useful := AUseful;
  end;

begin
  FileStream := TFileStream.Create('File.dat', fmCreate);
  try
    FillFileRecord('FILE1', 111, 112, False);
    FileStream.Write(FileRecord, RecordSize);
    FillFileRecord('FILE2', 211, 212, False);
    FileStream.Write(FileRecord, RecordSize);
    FillFileRecord('FILE3', 311, 312, False);
    FileStream.Write(FileRecord, RecordSize);
    FillFileRecord('FILE4', 411, 412, False);
    FileStream.Write(FileRecord, RecordSize);
    FillFileRecord('FILE0', 666, 777, True);
    FileStream.Write(FileRecord, RecordSize);
    FillFileRecord('FILE5', 511, 512, False);
    FileStream.Write(FileRecord, RecordSize);
    FillFileRecord('FILE0', 11111, 22222, True);
    FileStream.Write(FileRecord, RecordSize);
    FillFileRecord('FILE6', 611, 612, False);
    FileStream.Write(FileRecord, RecordSize);
  finally
    FileStream.Free;
  end;
end;

And here is how to read such a file:

procedure TForm1.Button2Click(Sender: TObject);
var
  FileStream: TFileStream;
  FileRecord: TFileRecord;
const
  HeaderSeq = 'FILE0';
  HeaderLen = Length(HeaderSeq);
  RecordSize = SizeOf(TFileRecord);
begin
  FileStream := TFileStream.Create('File.dat', fmOpenRead);
  try
    while FileStream.Read(FileRecord, RecordSize) = RecordSize do
    begin
      if CompareMem(@HeaderSeq[1], @FileRecord.Marker[0], HeaderLen) then
      begin
        Memo1.Lines.Add('FILE0 entry found at '+
          IntToStr(FileStream.Position - RecordSize));
        Memo1.Lines.Add('FileRecord.Width = ' +
          IntToStr(FileRecord.Width));
        Memo1.Lines.Add('FileRecord.Height = ' +
          IntToStr(FileRecord.Height));
        Memo1.Lines.Add('FileRecord.Useful = ' +
          BoolToStr(FileRecord.Useful, True));
      end;
    end;
  finally
    FileStream.Free;
  end;
end;   
TLama
  • There are people in the world who need thanking for their time and efforts. You're one of them. It's very good of you to take what I imagine was a lot of time to prepare that! I will look at working it out, use it either wholly or in part, and see how I get on. My thanks again for your helpfulness. – Gizmo_the_Great Mar 31 '12 at 16:34
  • See the [`commented version`](http://stackoverflow.com/revisions/9953014/5). IMHO there's nothing more to optimize or simplify here. If you replace e.g. [`Read`](http://lazarus-ccr.sourceforge.net/docs/rtl/classes/tstream.read.html) with [`ReadBuffer`](http://lazarus-ccr.sourceforge.net/docs/rtl/classes/tstream.readbuffer.html) the only thing you'll get is the error message raised when your file record block is corrupted (for some unexpected reason). – TLama Mar 31 '12 at 18:08
  • Hey TLama....I must confess that I struggled a bit with this and attempted a short cut (as per my other question), but I am already realising that for this task, I have to put the effort in at the start and then everything else will fall into place later. So I have spent time reading your commented version again this evening and will have a bash at implementing it over the Easter break. Thanks. – Gizmo_the_Great Apr 04 '12 at 23:01
  • There is no TFileRecord in Lazarus\FreePascal. I'm not sure what the equivalent might be. – Gizmo_the_Great Apr 06 '12 at 12:59
  • It is the record definition I've used in the example. I have `TFileRecord`, you `MFTRecordsStore` ;-) There is the unwritten convention to name the types with starting `T` char, so in your case it would be `TMFTRecordsStore`. – TLama Apr 06 '12 at 13:34
  • TLama... why did I ever doubt your solution! Having sat for a while to try and understand it all, and having received your explanation above, it works! I've only used a small sample of the total code, but what I have now does what other examples have not: a) it returns the find of FILE0, b) its offset in the original file (as opposed to that of the buffer), and c) it outputs that info! (The only snag is that it reliably finds the entries, but the position reported is always 54 bytes further on than where it actually is?) Regardless, thanks once again for your help! – Gizmo_the_Great Apr 06 '12 at 19:00
  • TLama - I figured it out (I think). The position will always be at the end of the size of the buffer. So if the buffer is 54 bytes, and the entry was found in that buffer segment, it reports it has found it "at the end" of the buffer. For now, I have just done a SourceFilePositionLessBufferSize := SourceFile.Position - ArraySize; and that reliably now works every time! I'm not sure if it is cheating though? – Gizmo_the_Great Apr 06 '12 at 19:28
  • The stream reading begins at position 0 (which is set after `TFileStream.Create`), and each subsequent call of `TFileStream.Read` will read (in my example) `RecordSize` bytes and increase the position by the value of `RecordSize`. So yes, you are right, `TFileStream.Position` may report the end of the record in my example. So using something like `RealPosition := SourceFile.Position - ArraySize;` is definitely right (it's me who has a bug in this post, I'll fix it). Thanks for pointing this out and sorry for misleading :-) – TLama Apr 06 '12 at 19:47
  • TLama - it makes a change that I help spot something! One other comment - as it stands, the code reads the source file in very small segments of 5 bytes which can be fairly slow. How can I increase the buffer size to say 1Mb without losing the accuracy\effectiveness of the code? Sorry to ask but this is still very new to me. – Gizmo_the_Great Apr 09 '12 at 07:56
  • Hi there, about slow reading you are right, it will be more efficient to read a large block and work with it from memory. Just one question; will your files always consist only from the records aligned by the same size ? I'm asking because of thinking about array of those records. – TLama Apr 09 '12 at 10:41
  • "will your files always consist only from the records aligned by the same size?" - MFT records are usually (and typically) 1024 bytes in total but they can sometimes be larger. The magic marker 'FILE0' always appears in the first 5 bytes of such a record, though. So what I am aiming to do is parse the dump file that could be Gb's in size in buffers of say 10Mb, look for 'FILE0' entries, and for each entry, read 1019 bytes from that point. That is the long term aim! – Gizmo_the_Great Apr 09 '12 at 11:11
  • Could you join [`this room`](http://chat.stackoverflow.com/rooms/9853/mft-record-files-and-their-parsing-in-lazarus) to further discussion ? – TLama Apr 09 '12 at 11:19
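
For reference, the buffered approach discussed in the last few comments could be sketched roughly like this. This is only a sketch, not tested code: the 10 MB buffer size comes from the comments above, while the procedure name, the overlap handling, and the passed-in controls are illustrative assumptions.

```pascal
// Sketch only: scan SourceFile in large chunks for the ANSI marker 'FILE0'
// and report the absolute file offset of each hit.
procedure ScanForFile0(SourceFile: TFileStream; Memo1: TMemo);
const
  Marker: array[0..4] of AnsiChar = 'FILE0';
  BufferSize = 10 * 1024 * 1024; // 10 MB chunks, as discussed above
var
  Buffer: array of Byte;
  BytesRead, i: Integer;
  BasePos: Int64;
begin
  SetLength(Buffer, BufferSize);
  SourceFile.Position := 0;
  while SourceFile.Position < SourceFile.Size do
  begin
    BasePos := SourceFile.Position;
    BytesRead := SourceFile.Read(Buffer[0], BufferSize);
    if BytesRead < Length(Marker) then
      Break;
    for i := 0 to BytesRead - Length(Marker) do
      if CompareMem(@Buffer[i], @Marker[0], Length(Marker)) then
        // BasePos + i is the offset of 'FILE0' in the file itself, not in
        // the buffer; seek here to read the 1024-byte record that follows
        Memo1.Lines.Add('FILE0 entry found at ' + IntToStr(BasePos + i));
    // Step back a few bytes so a marker straddling two chunks isn't missed
    if SourceFile.Position < SourceFile.Size then
      SourceFile.Position := SourceFile.Position - (Length(Marker) - 1);
  end;
end;
```

The step-back before the next chunk read means a marker split across a chunk boundary is still caught on the following pass, and since the inner loop stops `Length(Marker) - 1` bytes short of the chunk end, no hit is reported twice.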

If you really suspect reading past EOF, try:

while (SourceFile.Position + SizeOf(MFTHeaderArray) <= SourceFile.Size) do
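
Applied to the loop in the question, the guard would look roughly like this (a sketch, not tested code). Note two further issues it addresses: `FILE0Present` must be reset on each iteration, and `IntToHex` turns the marker bytes into the hex string `'46494C4530'`, so comparing against the literal `'FILE0'` can never match:

```pascal
SourceFile.Position := 0;
while (SourceFile.Position + SizeOf(MFTHeaderArray) <= SourceFile.Size) do
begin
  SourceFile.ReadBuffer(MFTHeaderArray, SizeOf(MFTHeaderArray));
  FILE0Present := '';  // reset, otherwise hex from earlier records accumulates
  for i := 0 to 4 do
    FILE0Present := FILE0Present + IntToHex(MFTHeaderArray.FILE0MagicMarker[i], 2);
  if FILE0Present = '46494C4530' then  // hex of the ANSI bytes 'FILE0'
    Memo1.Lines.Add('FILE0 Entry found at ' + IntToStr(SourceFile.Position));
end;
```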

user323094
  • Hi. Yes, that has stopped the error, thanks. The code still doesn't output anything, but I think that's due to an error elsewhere; my question was about resolving the stream error, which is now solved thanks to your suggestion. – Gizmo_the_Great Mar 31 '12 at 08:52
  • D'oh, I was preparing the whole example (it's already posted here for review, in a deleted state). So just one note: be careful when you use `ReadBuffer`, as it raises an error when the block to be read doesn't contain enough data. I did this for untyped files, which means that the header and the *packet* might be anywhere in the file, so the data needn't be packet aligned. – TLama Mar 31 '12 at 10:02
  • Please do post your example if you have done it! It is bound to be better than my effort!! There is no link in your comment, so if you could add it I'd be obliged. – Gizmo_the_Great Mar 31 '12 at 10:31