I want to implement a fast database alternative that only needs to handle binary data. Specifically, I want something close to a database whose contents are safely persisted even if the process is forcibly terminated (e.g. via Task Manager) during execution, while also being accessible directly from memory in C++, like a vector of structs that is mirrored to the hard disk. It should handle hundreds of thousands of read accesses and at least 1000 write accesses per second. After a forced termination, at most the last command may be lost. It does not need to support multithreading, and the database file will only be accessed by a single instance of the program. It only needs to run on Windows. These are the solutions I've thought of so far:
SQL Databases
- Advantages
  - Easy to implement, since lots of libraries are available
- Disadvantages
  - The server runs in a different process, therefore possibly slow inter-process communication
  - Necessity of parsing SQL queries
  - Built for multithreaded environments, so lots of unnecessary synchronization
  - Rows can't be accessed directly through pointers but need to be copied at least twice per change
  - Unnecessary delays on UPDATE queries, since the whole table needs to be searched and the WHERE clause checked
  - These are just a few off the top of my head; there might be a lot more
Memory Mapped Files
- Advantages
  - Direct memory mapping, so direct pointer access possible
  - Very fast compared to databases
- Disadvantages
  - Forceful termination could lead to a whole page not being written
  - Lots of code (I don't actually mind that)
  - No forced synchronization possible
  - Increasing file size might take a lot of time
C++ vector*
- Advantages
  - Direct pointer access possible, but the caller needs to manually notify the wrapper of changes
  - Very fast compared to databases
  - Total programming freedom
- Disadvantages
  - Possibly slow because of many calls to WriteFile
  - Lots of code (I don't actually mind that)
C++ vector with complete write every few seconds
- Advantages
  - Direct pointer access possible
  - Very fast compared to databases
  - Total programming freedom
- Disadvantages
  - Lots of unchanged data being rewritten to file, alternatively lots of RAM wasted on preventing unnecessary writes
  - Inaccessibility during writes, or lots of RAM wasted on a copy
  - Could lose multiple seconds' worth of data
  - Multiple threads and therefore synchronization needed
*Basically, a wrapper class that either exposes only per-row read/write functionality of a vector, or allows direct writes to memory but relies on the caller to notify it of changes. All reads are served from a copy in memory; all writes go to both the copy in memory and the file itself, on a per-command basis.
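To make the footnote concrete, a minimal sketch of the per-row variant of such a wrapper might look like the following. The name `MirroredVector`, its `read`/`write` methods, and the use of `std::fstream` in place of `WriteFile` are assumptions made purely for illustration. As far as I know, `flush()` only hands the bytes to the operating system, which should be enough to survive a killed process but not a power failure or an OS crash.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical wrapper: keeps every row in RAM and mirrors each write to a file.
template <typename Row>
class MirroredVector {
    static_assert(std::is_trivially_copyable<Row>::value,
                  "rows are written to disk as raw bytes");

public:
    explicit MirroredVector(const std::string& path) {
        // Load whatever a previous run left behind; a torn trailing row is ignored.
        {
            std::ifstream in(path, std::ios::binary);
            Row row;
            while (in.read(reinterpret_cast<char*>(&row), sizeof(Row)))
                rows_.push_back(row);
        }
        // Make sure the file exists without truncating it.
        { std::ofstream create(path, std::ios::binary | std::ios::app); }
        // Reopen for in-place updates.
        file_.open(path, std::ios::binary | std::ios::in | std::ios::out);
    }

    std::size_t size() const { return rows_.size(); }

    // Reads are served from memory and never touch the disk.
    const Row& read(std::size_t i) const {
        assert(i < rows_.size());
        return rows_[i];
    }

    // One command = one row written to RAM and to the file.
    // Passing i == size() appends a new row.
    void write(std::size_t i, const Row& row) {
        assert(i <= rows_.size());
        if (i == rows_.size()) rows_.push_back(row);
        else rows_[i] = row;

        file_.seekp(static_cast<std::streamoff>(i * sizeof(Row)));
        file_.write(reinterpret_cast<const char*>(&row),
                    static_cast<std::streamsize>(sizeof(Row)));
        file_.flush();  // hands the bytes to the OS; not a durability guarantee
    }

private:
    std::vector<Row> rows_;
    std::fstream file_;
};
```

With a trivially copyable row type, `MirroredVector<Row> db("data.bin"); db.write(db.size(), row);` would append a row, and constructing a new `MirroredVector` on the same file afterwards would reload it. A row that spans a page or sector boundary could still be torn by a crash mid-write, so this does not by itself give the all-or-nothing guarantee asked about below.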
Also, is it possible to write to different parts of a file without flushing, and then flush all changes at once, with a guarantee that the changes are written either completely or not at all, even in case of a forced termination during the write? All I can think of is the following workflow:
Duplicate the target file on startup, then for every set of data: write all changes to the duplicate -> flush by replacing the original with the duplicate.
However, I feel like this would be a horrible waste of hard disk space for big files.
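The duplicate-and-replace workflow could be sketched roughly as follows. `Patch`, `commit_patches`, and the `.tmp` suffix are hypothetical names, and for simplicity the sketch copies the file on every commit rather than once at startup. It uses portable `std::filesystem` calls (C++17); whether `std::filesystem::rename` replaces the original atomically on Windows depends on the implementation and the file system, so that part is an assumption I have not verified.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <ios>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical description of one change: overwrite bytes starting at `offset`.
struct Patch {
    std::streamoff offset;
    std::string bytes;
};

// Applies all patches to a duplicate, then swaps the duplicate in for the original.
// Returns false if any step fails; the original is only touched by the final rename.
bool commit_patches(const fs::path& target, const std::vector<Patch>& patches) {
    fs::path duplicate = target;
    duplicate += ".tmp";  // assumed naming scheme for the duplicate

    std::error_code ec;
    fs::copy_file(target, duplicate, fs::copy_options::overwrite_existing, ec);
    if (ec) return false;

    {
        std::fstream out(duplicate, std::ios::binary | std::ios::in | std::ios::out);
        if (!out) return false;
        for (const Patch& p : patches) {
            assert(p.offset >= 0);
            out.seekp(p.offset);
            out.write(p.bytes.data(), static_cast<std::streamsize>(p.bytes.size()));
        }
        out.flush();
        if (!out) return false;
    }  // the stream is closed here, before the swap

    // Replace the original with the fully written duplicate.
    fs::rename(duplicate, target, ec);
    return !ec;
}
```

If the process is killed before the rename, the original file is still intact and only a stale `.tmp` file is left behind. The cost is exactly the one mentioned above: a full extra copy of the file on disk.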
Thanks in advance for any input!