Effectively using Delphi to read unknown-sized blocks from a file

Question

In the past, I have seen this work, but I never really understood how it should be done.
Assume we have a file of known data types, but unknown length, like a dynamic array of TSomething, where

type
  TSomething = class
    Name: String;
    Var1: Integer;
    Var2: boolean;
  end;

The problem, though, is that this object type may be extended in the future, adding more variables (e.g. Var3: String).
Then, files saved with an older version will not contain the newest variables.
The File Read procedure should somehow recognize data in blocks, with an algorithm like:

procedure Read(Path: String);
begin
  // Read Array Size
  //   Read TSomething --> where does this record end? May not contain Var3!
  //   --> how to know that the next data block I read is not a new object?
end;

I have seen this work with BlockRead and BlockWrite, and I assume each object should probably write its size before writing itself to the file, but I would appreciate an example (not necessarily code), to know that I am thinking in the right direction.

Related readings I have found:
SO - Delphi 2010: How to save a whole record to a file?
Delphi Basics - BlockRead
SO - Reading/writing dynamic arrays of objects to a file - Delphi
SO - How Can I Save a Dynamic Array to a FileStream in Delphi?

score 7 · Accepted Answer · answered Oct 03 '13 at 06:50

In order to make this work, you need to write the element size to the file. Then, when you read the file, you read that element size first, which allows you to read each entire element even if your program does not know how to interpret all of it.

Matching up your record with the on-disk record is easy enough if your record contains only simple types. In that scenario you can read Min(ElementLength, YourRecordSize) bytes from the file into your record, then skip whatever remains of the element.
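
A minimal sketch of that idea, assuming a record of simple types only and new fields appended at the end. The names TSomethingV1, TSomethingV2, WriteV2 and ReadAsV1 are made up for illustration, and the element size is stored once after the element count. It shows an "old" reader loading a file written with a newer, larger layout:

```pascal
program SizedRecords;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Math;

type
  // Hypothetical layouts: V2 is V1 with one field appended at the end.
  TSomethingV1 = packed record
    Var1: Integer;
    Var2: Boolean;
  end;

  TSomethingV2 = packed record
    Var1: Integer;
    Var2: Boolean;
    Var3: Double; // added in a later version
  end;

// Writes: element count, element size, then the raw elements.
procedure WriteV2(S: TStream; const Items: array of TSomethingV2);
var
  Count, ElemSize, I: Integer;
begin
  Count := Length(Items);
  ElemSize := SizeOf(TSomethingV2);
  S.WriteBuffer(Count, SizeOf(Count));
  S.WriteBuffer(ElemSize, SizeOf(ElemSize));
  for I := 0 to High(Items) do
    S.WriteBuffer(Items[I], ElemSize);
end;

// Reads with the old layout, whatever element size the file declares.
procedure ReadAsV1(S: TStream; out Items: TArray<TSomethingV1>);
var
  Count, ElemSize, Known, I: Integer;
begin
  S.ReadBuffer(Count, SizeOf(Count));
  S.ReadBuffer(ElemSize, SizeOf(ElemSize));
  if (Count < 0) or (ElemSize < 0) then
    raise Exception.Create('Corrupt header');
  Known := Min(ElemSize, SizeOf(TSomethingV1));
  SetLength(Items, Count); // new elements are zero-initialized
  for I := 0 to Count - 1 do
  begin
    S.ReadBuffer(Items[I], Known);               // the part this reader understands
    S.Position := S.Position + (ElemSize - Known); // skip the rest of the element
  end;
end;

var
  MS: TMemoryStream;
  NewItems: TArray<TSomethingV2>;
  OldItems: TArray<TSomethingV1>;
begin
  SetLength(NewItems, 2);
  NewItems[0].Var1 := 7;  NewItems[0].Var2 := True;  NewItems[0].Var3 := 1.5;
  NewItems[1].Var1 := 11; NewItems[1].Var2 := False; NewItems[1].Var3 := 2.5;
  MS := TMemoryStream.Create;
  try
    WriteV2(MS, NewItems);
    MS.Position := 0;
    ReadAsV1(MS, OldItems);
    Writeln(Length(OldItems), ' elements, second Var1 = ', OldItems[1].Var1);
  finally
    MS.Free;
  end;
end.
```

The same reader also copes with a file whose elements are smaller than its own record: it reads the bytes that exist and leaves the remaining fields at zero.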

But it does not look as though you actually have that scenario. Your record is in fact a class and so not suitable for memory copying. What's more, its first member is a string which is most definitely not a simple type.

Back in the day (say the 1970s), the techniques you described were how files were read. But these days programming has moved on. Saving structured data to files usually means using a more flexible and adaptable serialization format. You should be looking at using JSON, XML, YAML or similar for such tasks.
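
As a rough illustration of why such formats adapt well, here is a sketch using the System.JSON unit that ships with recent Delphi versions (an assumption about the toolchain; older releases may lack TJSONBool). The field names mirror TSomething from the question, and the fallback text for the missing Var3 is invented for the example:

```pascal
program JsonSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.JSON;

var
  Obj: TJSONObject;
  Parsed: TJSONValue;
  Text, Var3: string;
begin
  // "Old" writer: it knows nothing about Var3.
  Obj := TJSONObject.Create;
  try
    Obj.AddPair('Name', 'first');
    Obj.AddPair('Var1', TJSONNumber.Create(42));
    Obj.AddPair('Var2', TJSONBool.Create(True));
    Text := Obj.ToJSON;
  finally
    Obj.Free;
  end;

  // "New" reader: fields are looked up by name, so a missing Var3
  // is detected and replaced by a default instead of shifting the layout.
  Parsed := TJSONObject.ParseJSONValue(Text);
  if Parsed = nil then
    raise Exception.Create('Invalid JSON');
  try
    if not Parsed.TryGetValue<string>('Var3', Var3) then
      Var3 := '(not stored in this file)';
    Writeln('Name = ', Parsed.GetValue<string>('Name'));
    Writeln('Var1 = ', Parsed.GetValue<Integer>('Var1'));
    Writeln('Var3 = ', Var3);
  finally
    Parsed.Free;
  end;
end.
```

Because lookup is by name, an older reader also ignores any fields it does not recognise, which is the other half of the compatibility problem.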

Thank you David. It is kind of objective, but I think you addressed both reading unknown sizes, and serialization. Thanks. — mavrosxristoforos, Oct 03 '13 at 07:15

score 5 · Answer 2 · answered Oct 03 '13 at 06:57

I'd say you need a method of versioning your file. That way you know what version of the record is contained in the file. Write it at the start of the file and then on reading, read in the version identifier first and then use the corresponding structure to read the rest.
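
A minimal sketch of such a header, assuming a fixed-size packed record at the very start of the file. TFileHeader, the magic value and the version numbers are hypothetical choices for illustration, not part of any existing format:

```pascal
program VersionHeader;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  // Hypothetical header: a file type ID followed by a layout version.
  TFileHeader = packed record
    Magic: Cardinal;
    Version: Word;
  end;

const
  FileMagic: Cardinal = $48544D53; // arbitrary ID chosen for this sketch
  CurrentVersion = 2;

procedure WriteHeader(S: TStream);
var
  H: TFileHeader;
begin
  H.Magic := FileMagic;
  H.Version := CurrentVersion;
  S.WriteBuffer(H, SizeOf(H));
end;

// Reads the header, rejects foreign files, and returns the layout version.
function ReadVersion(S: TStream): Word;
var
  H: TFileHeader;
begin
  S.ReadBuffer(H, SizeOf(H));
  if H.Magic <> FileMagic then
    raise Exception.Create('Not a TSomething file');
  Result := H.Version;
end;

var
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  try
    WriteHeader(MS);
    MS.Position := 0;
    // Pick the reading code that matches the version found in the file.
    case ReadVersion(MS) of
      1: Writeln('read the rest with the version 1 layout');
      2: Writeln('read the rest with the version 2 layout');
    else
      Writeln('unknown version: written by a newer program');
    end;
  finally
    MS.Free;
  end;
end.
```

On its own this lets a newer program read older files; an older program can only detect that a file is too new, not read it.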

Pieter B


This is the approach I use, and it works very well. A header at the front of the file that contains an ID to identify the file type, a version number to identify the layout of the file and type of record(s) used, and extra flags that control the interpretation of the record data without affecting how the data is read (character encodings, features, etc). – Remy Lebeau Oct 03 '13 at 07:05
@David, so you think that old program needs to know what data are in the file if you completely change the whole meaning of the used structure for instance ? I don't. I would simply tell the user he's out of luck and he should get a newer version of my program. – TLama Oct 03 '13 at 07:13
Personally I prefer to write data in a more accessible format. I prefer XML; that way older versions of your program can read newer files. – Pieter B Oct 03 '13 at 07:15
Great. Thanks for all the suggestions. – mavrosxristoforos Oct 03 '13 at 07:16
@TLama No, I don't think that and never said so. The old program just needs to know **how** to skip the info it does not understand. A single version is not enough. More structure is needed. My program's binary file can be read by ancient versions. – David Heffernan Oct 03 '13 at 07:19
Version info in the header is sufficient, as the interpretation of the data can easily be done in code. It just requires a well organized reader. I would recommend a simple factory which retrieves an adequate reader for a given version number. Obviously and usually, the reader for version X is derived from the reader of version X-1. So there is very little coding in each reader. – alzaimar Oct 03 '13 at 08:21
@alzaimar Version info that describes or encodes how to read the file, and how to skip the unknown content will work. A plain version number is not enough though. – David Heffernan Oct 03 '13 at 08:23
A 'version number' is identical to 'version info'. It might be more readable to use a 'version info', but technically, both attempts identify a unique way of treating (i.e. reading, interpreting and writing) the data. Please read my solution further down. Thx. – alzaimar Oct 03 '13 at 08:31
@alzaimar Your "solution" is identical to this. Adding just a version number won't allow old programs to read files from newer versions of the program. – David Heffernan Oct 03 '13 at 09:35
@DavidHeffernan: in my case, the file version is incremented when the file structure changes in a way that affects reading. All fields are important, so older app versions cannot skip them. The app verifies the file version, prompts the user if the version is not recognized, then adjusts its reading (record types, record sizes, etc) based on which version is being read. Switching character data from Ansi to UTF8, local timestamps to UTC timestamps, etc can be handled with header/record flags since they don't change the reading, only the interpretation of the data. – Remy Lebeau Oct 03 '13 at 15:27

score 1 · Answer 3 · answered Oct 03 '13 at 06:56

If I understand you correctly, your main issue is what happens if TSomething changes. The most important thing is that you need to add version info to your file; this you really cannot avoid.

As for actual storage, using SQLite would most likely solve all your problems, but depending on your situation it might be overkill.

Except in exceptional circumstances, I wouldn't really worry too much about extending the class. If you add a version number to the beginning of the file, you can easily convert the file after the class has changed. All you need to do is implement your solution so that adding conversions is as simple as reasonably possible.

To read and write the files I would prefer streams/XML/JSON (depending on the situation) over BlockRead/BlockWrite, as you don't have to implement a hack to store the version number.

In theory you could also reserve unused space in each record, so that you could avoid recreating the entire file when the class changes, up to a point (until the unused space runs out). It may be helpful if TSomething changes often and the files are big, but most likely I would not go that route.

Thanks for the suggestions. I wouldn't go with SQL for small tasks like this, but I will definitely look into turning it into XML or JSON, probably. — mavrosxristoforos, Oct 03 '13 at 07:18

score 1 · Answer 4 · answered Oct 03 '13 at 08:28

This is how I would do it: Include a simple version number in the header. This can be any string, integer or whatever.

Reading and writing the file is very easy (I am using pseudocode):

procedure Read(MyFile: TFile);
var
  versionInfo: TVersionInfo;
  reader: IMyFileReader;
begin
  // The version stored in the header decides which reader handles the rest.
  versionInfo := MyFile.ReadVersionInfo();
  reader := ReaderFactory.CreateFromVersion(versionInfo);
  reader.Read(MyFile);
end;


type
  ReaderFactory = class
  public
    class function CreateFromVersion(VersionInfo: TVersionInfo): IMyFileReader;
  end;

class function ReaderFactory.CreateFromVersion(VersionInfo: TVersionInfo): IMyFileReader;
begin
  if VersionInfo = '0.9-Alpha' then
    Result := TVersion_0_9_Alpha_Reader.Create()
  else if VersionInfo = '1.0' then
    Result := TVersion1_0_Reader.Create()
  else ....
end;

This can easily be maintained and extended forever. You will never have to touch the Read-routine, but only add a new reader and enhance the factory. With a simple registration method and a TDictionary<TVersionInfo,TMyFileReaderClass>, you can even avoid having to modify the factory.
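
The registration idea could look roughly like this. It is only a sketch: it assumes the version info is a plain string, uses a class hierarchy with a virtual constructor instead of the IMyFileReader interface above, and the names RegisterReader, CreateReader and Describe are invented for the example:

```pascal
program ReaderRegistry;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

type
  TMyFileReader = class
  public
    // Virtual, so that creating through a class reference runs the
    // constructor of the registered descendant.
    constructor Create; virtual;
    function Describe: string; virtual; abstract;
  end;

  TMyFileReaderClass = class of TMyFileReader;

  TVersion1_0_Reader = class(TMyFileReader)
  public
    function Describe: string; override;
  end;

var
  Readers: TDictionary<string, TMyFileReaderClass>;

constructor TMyFileReader.Create;
begin
  inherited Create;
end;

function TVersion1_0_Reader.Describe: string;
begin
  Result := 'reader for version 1.0';
end;

// Each reader registers itself under the version string it handles.
procedure RegisterReader(const Version: string; AClass: TMyFileReaderClass);
begin
  Readers.AddOrSetValue(Version, AClass);
end;

// The factory is a dictionary lookup, so it never needs editing.
function CreateReader(const Version: string): TMyFileReader;
var
  ReaderClass: TMyFileReaderClass;
begin
  if not Readers.TryGetValue(Version, ReaderClass) then
    raise Exception.CreateFmt('No reader registered for version "%s"', [Version]);
  Result := ReaderClass.Create;
end;

var
  Reader: TMyFileReader;
begin
  Readers := TDictionary<string, TMyFileReaderClass>.Create;
  try
    RegisterReader('1.0', TVersion1_0_Reader);
    Reader := CreateReader('1.0');
    try
      Writeln(Reader.Describe);
    finally
      Reader.Free;
    end;
  finally
    Readers.Free;
  end;
end.
```

In a larger project each reader unit would typically call RegisterReader from its initialization section, so adding a version means adding one unit.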

alzaimar


This is identical to Pieter B's answer and has the same problem that old programs cannot read new files. – David Heffernan Oct 03 '13 at 08:33
If there is no way to distinguish unversioned from versioned files, then no old program will ever be able to read versioned files. You either start adding version info to your file, or use a structured file format from the start, or you're stuck. – alzaimar Oct 03 '13 at 08:36
Knowing that the file is a different format doesn't allow you to read it. Using a structured format (could be JSON, XML, or could be home brew binary format, or could be lots of other things) does allow that. Anyway, your comment agrees with mine. – David Heffernan Oct 03 '13 at 08:48
My solution was what seemed to me the simplest solution for solving the stated problem which was: reading older data with newer software. In practice I use tried and tested methods without trying to reinvent the wheel. I don't like homebrew formats to store data and try to avoid them as much as possible. (well sometimes you have to) – Pieter B Oct 03 '13 at 10:50
