
I want to implement a fast database alternative that only needs to handle binary data. To be specific, I want something close to a database that will be safely stored even in case of a forced termination (task manager) during execution, whilst also being accessed directly from memory in C++. Like a vector of structs that is mirrored to the hard disk. It should be able to handle hundreds of thousands of read accesses and at least 1000 write accesses per second. In case of a forced termination, at most the last command can be lost. It does not need to support multithreading, and the database file will only be accessed by a single instance of the program. It only needs to run on Windows. These are the solutions I've thought of so far:

  1. SQL Databases

    • Advantages
      • Easy to implement, since lots of libraries are available
    • Disadvantages
      • Server runs in a different process, therefore possibly slow inter-process communication
      • Necessity of parsing SQL queries
      • Built for multithreaded environments, so lots of unnecessary synchronization
      • Rows can't be directly accessed using pointers but need to be copied at least twice per change
      • Unnecessary delays on the UPDATE query, since the whole table needs to be searched and the WHERE clause checked
      • These were just a few off the top of my head; there might be a lot more
  2. Memory Mapped Files

    • Advantages
      • Direct memory mapping, so direct pointer access possible
      • Very fast compared to databases
    • Disadvantages
      • Forceful termination could lead to a whole page not being written
      • Lots of code (I don't actually mind that)
      • No forced synchronization possible
      • Increasing file size might take a lot of time
  3. C++ vector*
    • Advantages
      • Direct pointer access possible; however, the caller needs to manually notify of changes
      • Very fast compared to databases
      • Total programming freedom
    • Disadvantages
      • Possibly slow because of many calls to WriteFile
      • Lots of code (I don't actually mind that)
  4. C++ vector with complete write every few seconds
    • Advantages
      • Direct pointer access possible
      • Very fast compared to databases
      • Total programming freedom
    • Disadvantages
      • Lots of unchanged data being rewritten to the file; alternatively, lots of RAM wasted on tracking changes to avoid unnecessary writes
      • Inaccessibility during writes, or lots of RAM wasted on the copy
      • Could lose multiple seconds worth of data
      • Multiple threads and therefore synchronization needed

*Basically, a wrapper class that only exposes per-row read/write functionality of a vector, or alternatively allows direct writes to memory but relies on the caller to notify it of changes. All reads are served from the copy in memory; all writes go to both the in-memory copy and the file itself on a per-command basis.

Also, is it possible to write to different parts of a file without flushing, and then flush all changes at once, with a guarantee that the file will be updated either completely or not at all, even in case of a forced termination during the write? All I can think of is the following workflow:

Duplicate target file on startup, then for every set of data: Write all changes to duplicate -> Flush by replacing original with duplicate

However, I feel like this would be a horrible waste of hard disk space for big files.

Thanks in advance for any input!

Jan Weber
  • Regarding the behavior that occurs on an unexpected power outage, you're on very thin ice unless you are willing to buy specialized hardware. Even if your program flushes the data to the filesystem, the filesystem may or may not actually flush to the disk, and even if the filesystem flushes to the disk, the disk may or may not flush its internal buffers to its persistent storage medium. You might be better off investing in a UPS than trying to find a way to make guarantees about what will or won't happen in the face of sudden power loss :) – Jeremy Friesner Jul 21 '16 at 04:54
  • I guess that's true, I was more concerned about termination by task manager or a crash, actually. I edited the question. – Jan Weber Jul 21 '16 at 05:02
  • Sounds to me like a memory-mapped file (or files) would be a good way to go (plus removing all bugs from the program so that it doesn't crash ;)) This article might be of interest if you haven't seen it already: https://blogs.msdn.microsoft.com/oldnewthing/20100428-00/?p=14223 – Jeremy Friesner Jul 21 '16 at 06:18

0 Answers