
In my application I want to use files for storing data. I don't want to use a database or a plain text file; the goal is to save double and integer values along with a string that identifies the name of the record. I simply need to save data on disk for generating reports. The file can grow to a gigabyte or more. What format do you suggest? Binary? If so, what VCL component/library do you know that is good to use? My goal is to create an application which creates and updates the files, while another tool will "eat" those files, producing nice PDF reports for the user on demand. What do you think? Any ideas or suggestions?

Thanks in advance.

Matthias Alleweldt
marcostT
  • Sounds like a great opportunity to design your own binary file format, using nothing more than standard Pascal I/O. But I don't dare posting such outrageous ideas as an answer... – Andreas Rejbrand Apr 28 '11 at 14:38
  • Pascal I/O has a limited file size – David Heffernan Apr 28 '11 at 14:55
  • Indexing is going to be a consideration. You aren't going to want to do a brute-force search every time. If you do, you're going to be averaging 500 MB per read operation! If your data is alphabetized, you could probably use a binary search to find it more quickly. But if it's scattered in random order, such as if you are just appending data to the end of a log, you need a way to find out where to start looking. – Chris Thornton Apr 28 '11 at 15:22
  • I was reading this: http://www.powerbasic.com/support/help/pbcc/random_access_files.htm; I guess random access files are what I mean. – marcostT Apr 28 '11 at 15:30
  • David: only typed files are limited, AFAIK. – Marco van de Voort Apr 28 '11 at 15:41
  • Please note that the Pascal IO functions (AssignFile, Append, Rewrite, Read(Ln), Write(Ln), etc.) do ***NOT*** support Unicode. In any Delphi version, Unicode or not. – Marjan Venema Apr 28 '11 at 17:32
  • About Pascal typed files: these should not be used any more in any serious project, and fixed-size records are not a good solution for performance (you have to maintain your indexes by hand) or for storage size (any text stored within will be a shortstring, so never Unicode). – Arnaud Bouchez Apr 29 '11 at 05:59
  • The user previously mentioned "random access files" as in PowerBasic, but seems to have deleted that from the question. – Warren P Apr 29 '11 at 14:03

5 Answers


If you don't want to reinvent the wheel, you may find all the Open Source tools you need for this task on our side:

  • Synopse Big Table to store huge amounts of data - see in particular the TSynBigTableRecord class to store an unlimited number of records with fields, including indexes if needed - it will definitely be faster and use less disk space than any regular SQL DB
  • Synopse SQLite3 Framework if you would rather use a standard SQLite engine for the storage - it comes with a full Client/Server ORM
  • Reporting from code, including pdf file generation

With full Source code, working from Delphi 6 up to XE.

I've just updated the documentation of the framework. More than 600 pages, with details of every class method, and a new, enhanced general introduction. See the SAD document.

Update: If you plan to use SQLite, you should first plan how the data will be stored, which indexes are to be created, and how a SQL query may speed up your requests. It's a bad idea to read the whole file content for every request: you should structure your data so that a single SQL query is able to return the expected results. Sometimes, adding precomputed values (like running sums or means) to the data is a good idea. Also consider using the RTree virtual table of SQLite3, which is dedicated to speeding up access to double min/max multi-dimensional data: it may speed up your queries a lot.
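For instance, if the data is a time series of named samples (as described in the comments), the storage side might look roughly like this sketch; the table and column names are only placeholders, and any SQLite wrapper can execute these statements:

```pascal
const
  // Hypothetical schema for named trend samples; SampleTime is assumed to be
  // a TDateTime stored as a floating-point value.
  SQL_CREATE_TABLE =
    'CREATE TABLE IF NOT EXISTS Sample ('#13#10 +
    '  ID INTEGER PRIMARY KEY,'#13#10 +
    '  Name TEXT,'#13#10 +
    '  Value DOUBLE,'#13#10 +
    '  SampleTime DOUBLE);';

  // Index on the columns every report will filter on.
  SQL_CREATE_INDEX =
    'CREATE INDEX IF NOT EXISTS IdxSample ON Sample(Name, SampleTime);';

  // A single indexed query returns just the rows one report needs,
  // instead of scanning the whole file.
  SQL_RANGE =
    'SELECT SampleTime, Value FROM Sample'#13#10 +
    ' WHERE Name=:Name AND SampleTime BETWEEN :FromTime AND :ToTime'#13#10 +
    ' ORDER BY SampleTime;';
```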

Arnaud Bouchez
  • About SQLite: Hi. OK, I have reviewed the SQLite documentation and I have a question. I plan to have at least two processes reading the database and just one writing. 1. Do you suggest trusting SQLite's locks, or should I implement my own mutex? 2. I use Delphi 2009. What about AnyDAC for SQLite? Does it make sense to use it? I guess it is not mandatory. Has anybody used it? 3. SQLite3 Framework and SQLite. Can someone compare them? Have you used the SQLite3 Framework before? – marcostT Apr 29 '11 at 07:55
  • @marcosT 1. SQLite is truly ACID and will lock the file as expected - you can be confident in this. 2. AnyDAC is good, but much more than a SQLite wrapper. 3. The SQLite3 framework is an ORM based over SQLite, but you have a wrapper to access the SQLite engine directly without the ORM - it also adds a JSON-based Client-Server architecture to SQLite. – Arnaud Bouchez Apr 29 '11 at 08:03

You don't want to use a full SQL database, and you think that a plain text file is too simple.

Points in between those include:

  1. Something that isn't a full SQL database, but more of a key-value store, would technically not be a flat file, but it does provide a single "key+value" list that is quickly searchable on a single primary key. Such as BSDDB. It has the letters D and B in its name. Does that make it a database, in your view? Because it's not a relational database and doesn't do SQL. It's just a binary key-value (hashtable) blob storage mechanism, using a well-understood binary file format. Personally, I wouldn't start a new project and use anything in this category.

  2. Recommended: Something that uses SQL but isn't as large as a standalone SQL database server. For example, you could use SQLite and a Delphi wrapper. It is well tested, used in lots of C/C++ and Delphi applications, and can be trusted more than anything you could roll yourself. It is a very light embedded database, and is trusted by many.

  3. Roll your own ISAM, or VLIR, which will eventually morph over time into your own in-house DBMS. There are multiple files involved, and there are indexes, so you can look up data fast without loading everything into memory. Not recommended.

  4. The most flat of flat binary fixed-record-length files. You originally mentioned PowerBasic, which has something called Random Access files, in your question, and then deleted that reference. This is probably what you are looking for, especially with append-only writes as the primary operation. Roll your own Turbo Pascal era "file of record". If you use the "FILE OF RECORD" type, you hit the 2 GB limit, and there are problems with Unicode. So use a TStream instead (see the sketch at the end of this answer). Binary file formats have a lot of strikes against them, especially since it is difficult to grow and expand your binary file format over time without breaking your ability to read old files. This is a key reason why I would recommend you start out with what might at first seem like overkill (SQLite) instead of rolling your own binary solution.

    (Update 2: After the question was updated to mention PDFs and what sounds like a reporting-system requirement, I think you really should be using a real database, but perhaps a small and easy-to-use one, like Firebird or InterBase.)
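If you do end up going the flat binary route, a minimal append-only sketch with TStream could look something like this (the record layout and all names are only an illustration, not a recommendation):

```pascal
uses
  Classes, SysUtils;

type
  // Fixed-layout sample; a fixed-width AnsiChar name keeps the record size
  // constant, so the file can later be addressed by record number.
  TTrendSample = packed record
    Name: array[0..31] of AnsiChar;
    Value: Double;
    Flags: Integer;
    SampleTime: TDateTime;
  end;

procedure AppendSample(const FileName: string; const Sample: TTrendSample);
var
  Stream: TFileStream;
begin
  if FileExists(FileName) then
    Stream := TFileStream.Create(FileName, fmOpenReadWrite or fmShareDenyWrite)
  else
    Stream := TFileStream.Create(FileName, fmCreate);
  try
    Stream.Seek(0, soEnd);                       // append-only: always write at the end
    Stream.WriteBuffer(Sample, SizeOf(Sample));  // no 2 GB limit, unlike "file of record"
  finally
    Stream.Free;
  end;
end;
```

Reading a date range back then means either scanning sequentially or, since the records are fixed-size and appended in time order, binary-searching on the record offset.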

Warren P
  • (4) is vastly exaggerated. An RDBMS is for production data manipulation, not an export format. You really want to see it as a database, not as a storage format. But there is a difference. – Marco van de Voort Apr 28 '11 at 15:40
  • I rewrote point 4 after the OP gave some more detail on what they want to do. – Warren P Apr 28 '11 at 20:21
  • About point 2 - "Something that uses SQL but isn't a full SQL database, like SQLite" is not correct. It's a full SQL database, but not a Client-Server SQL database by default - but you can use it in a Client-Server application with no problem, thanks to another communication layer like DataSnap or such. – Arnaud Bouchez Apr 29 '11 at 06:02
  • About Update 2 - I don't understand why a real database is mandatory for reporting - you can perfectly well create reports from pure Delphi code, with no SQL at all, with most report engines. Even SQLite can be used as a SQL source if you want to use such classic band-oriented report engines. – Arnaud Bouchez Apr 29 '11 at 06:16
  • A.Bouchez: What "full SQL database" means to a person is very subjective. Some people think SQLite is an embedded SQL engine, not sufficiently standalone to be called a "full SQL database product", by which they mean one of the big boys (MS SQL Server, Oracle, etc.). This language is subjective and vague, and I don't really think I can fix it in a way that will make me understood identically by everyone. Thus "not a full SQL database" is aimed at the contact language used by the OP, not an attempt to be scholarly. – Warren P Apr 29 '11 at 14:02
  • @Warren I just reacted to this sentence because the SQLite creator D. Richard Hipp would have also reacted to it! The SQL92 coverage of the engine is [almost complete](http://sqlite.org/omitted.html), and there are some nice available enhancements (FTS, RTree, dynamic typing). – Arnaud Bouchez Apr 29 '11 at 14:20
  • SQL92 grammar coverage isn't the only factor people might be thinking about. – Warren P Apr 29 '11 at 16:00
  • How do you deal with versioning (backwards compatibility only) and reading data files created by older versions in the SQLite case? – Marco van de Voort Apr 29 '11 at 16:59
  • The SQLite file format is very stable. The file format changed only between major versions (SQLite 2, then SQLite 3). – Arnaud Bouchez Apr 29 '11 at 17:09

I would suggest using TClientDataSet, and using its SaveToFile() / SaveToStream() methods in the generating program, and the LoadFromFile() / LoadFromStream() methods in the program that will "consume" the data. That way, you can still have indexed records without connecting to any external database, all while keeping the interchange data in a single file.
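A rough sketch of that round trip might look like this (the field names are only an example, and the producer side needs midas.dll or the MidasLib unit):

```pascal
uses
  SysUtils, DB, DBClient, MidasLib;

procedure Demo;
var
  CDS: TClientDataSet;
begin
  CDS := TClientDataSet.Create(nil);
  try
    // Producer side: define the structure once, then append records.
    CDS.FieldDefs.Add('Name', ftString, 32);
    CDS.FieldDefs.Add('Value', ftFloat);
    CDS.FieldDefs.Add('SampleTime', ftDateTime);
    CDS.CreateDataSet;
    CDS.AppendRecord(['temperature', 20.0, Now]);
    CDS.SaveToFile('trend.cds', dfBinary);   // or SaveToStream

    // Consumer side (the reporting tool): just load the same file.
    CDS.Close;
    CDS.LoadFromFile('trend.cds');
  finally
    CDS.Free;
  end;
end;
```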

Cesar Marrero
  • Also, you're going to use over a gigabyte of memory when the data set gets to a gigabyte, as the OP said. A pure in-memory dataset solution (whether CSV or ClientDataSet) is only okay if you're okay with doing a 1-gigabyte read every time you start, and a 1-gigabyte write every time you save. Is that okay? Your call. – Warren P Apr 28 '11 at 17:32
  • @Warren - True, this approach would impose a memory limitation, since the entire file is loaded all at once, but the OP mentioned that the file _could_ reach a gigabyte, and I didn't see anything about the target machine having limited memory. I suggested CDS as a possible way to provide database-like functionality _without_ the need for an external database. However, if memory IS a determining factor, then the OP could also consider breaking the data into smaller chunks ... otherwise, CDS would be the simplest **non-database** solution (outside of writing an entire storage VCL from scratch). – Cesar Marrero Apr 28 '11 at 20:18
  • What is the performance of `TClientDataSet` with a huge number of small records? It was never meant nor designed to handle such a huge database of small records. AFAIK it makes use of the "Array Manipulation Functions" API for disk access, which was never designed for this purpose, but for data marshaling in OLE Automation. I would never recommend this approach. – Arnaud Bouchez Apr 29 '11 at 06:08
  • In-memory solutions tend to be VERY FAST at many things, including searching, but not only do they use a lot of memory, many can be VERY SLOW at saving the modified dataset back to disk. 1+ GB saved all at the same time to your disk can take a while. – Warren P Apr 29 '11 at 17:15
  • @Warren and @A.Bouchez provide very valid arguments. And +1 to A.Bouchez for mentioning _Synopse Big Table_ ... I had totally forgotten about this remarkable open-source **non-database** solution. – Cesar Marrero May 02 '11 at 13:34
  1. Define an API to work with your flat file, so that the API can be implemented by a separate data layer in many ways (a rough interface sketch follows after this list).
  2. Implement the API using a standard embedded SQL database (e.g. SQLite or Firebird).
  3. Only if there is something wrong with the standard solution should you think of rolling your own.
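For example, the API in step 1 could be as small as this (all names are hypothetical); SQLite, Firebird, or a hand-rolled flat file would then live behind the interface:

```pascal
type
  TTrendSample = record
    Name: string;
    Value: Double;
    SampleTime: TDateTime;
  end;
  TTrendSampleArray = array of TTrendSample;

  // The application and the report generator only talk to this interface;
  // the data layer that implements it decides how samples are actually stored.
  ITrendStore = interface
    ['{9C1E2D34-5A6B-4C7D-8E9F-0A1B2C3D4E5F}']
    procedure Append(const Sample: TTrendSample);
    function ReadRange(const Name: string;
      FromTime, ToTime: TDateTime): TTrendSampleArray;
  end;
```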
kludg
  • I will always add data at the end of the file. This is trend data; say one record is: temperature, 20, 12/12/2011 13:44. Every 5 minutes I will add a new record. Then I would need a good VCL component which can extract data between two dates and plot a graph on a nice PDF report. – marcostT Apr 28 '11 at 16:42
  • The point is that the file structure does not matter much; you can use a standard embedded SQL database (I recommend it) or design your own binary format and the code to work with it (which probably does not make sense, since the standard solution is sufficient). – kludg Apr 28 '11 at 17:03
  • In the case of a circular buffer file, with three fields (timestamp+temperature, say, and some integer flags), I would use a simple binary file format, with a header record that shows the version of the file. – Warren P Apr 28 '11 at 20:16

I use kbmMemTable - see http://www.components4developers.com/ - fast, reliable, and it has been around a long time. It supports binary and CSV streaming in and out of files, as well as indexing, filters, and lots of other goodies - TClientDataSet will not do well with large datasets.

Vector