Normally I would just use SQL/SQLite/Mongo for anything database-y but I thought it would be interesting to try and create my own flat file database structure (it's just a learning project). The application itself is a music streamer with a centralised library on the server.
The database:
My application is a client/server one whereby any changes made on the server sync to all clients. The server does insert, edit & delete operations.
Clients are only able to change a client modifier boolean field in a record (the value of which is specific to that client). No other operations are available to the clients, so there are NO client-side changes to sync back to the server.
Write operations on the server are rare after the initial database construction but do happen. The priority here is definitely read operations for the clients.
It needs to be scalable up to 500k+ tracks or a 2GB (2^31 bytes) database file size (whichever comes first).
What is stored:
- A few tables, with some relations. It's only a mockup, but you get the idea:
    +--------+         +--------+         +-------------------+
    | id*    |         | id*    |         | id*               |
    | ARTIST | ------> | ARTIST |         | track name        |
    |        |         | ALBUM  | ------> | ALBUM             |
    |        |         | year   |         | length            |
    |        |         |        |         | filename**        |
    |        |         |        |         | {client modifier} |
    +--------+         +--------+         +-------------------+

    *  unique identifier
    ** not stored in client version of database
    {client modifier} is only on the client version of database
One problem that would have to be overcome is how to handle the relations and searching while keeping I/O operations to a minimum.
- All fields are variable length apart from the id, year & length.
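To make the variable-length record idea concrete, here is a minimal sketch of how one tracks record could be serialised, with fixed-width fields packed first and the variable-length strings stored as length-prefixed UTF-8. I'm using Python purely for illustration; the exact field widths and the presence of an album foreign key are my assumptions, not a fixed part of the design.

    import struct

    # Assumed layout for one "tracks" record (server copy): fixed-width fields
    # first, then 2-byte-length-prefixed UTF-8 strings for the variable fields.
    FIXED = struct.Struct("<III")   # id, album id (foreign key), length in seconds

    def pack_track(track_id, album_id, length, name, filename):
        blob = FIXED.pack(track_id, album_id, length)
        for text in (name, filename):
            data = text.encode("utf-8")
            blob += struct.pack("<H", len(data)) + data   # 2-byte length prefix
        return blob

    def unpack_track(blob):
        track_id, album_id, length = FIXED.unpack_from(blob, 0)
        offset = FIXED.size
        strings = []
        for _ in range(2):                                # track name, filename
            (n,) = struct.unpack_from("<H", blob, offset)
            offset += 2
            strings.append(blob[offset:offset + n].decode("utf-8"))
            offset += n
        return track_id, album_id, length, strings[0], strings[1]

The client copy of the record would simply drop the filename string and append the one-byte {client modifier} flag.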
Required features:
- Server able to sync the database to all clients with minimal operations.
One way to approach this would be to store the date/time each record was last modified and have the client store the date of its last sync. When a client comes online, all changes past that date sync to the client. Another way would be to have a separate table on the server which lists all the operations that have happened and when they happened, and sync in a similar fashion.
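As a rough sketch of the first approach (names are made up; it assumes every record carries a last-modified timestamp and that the server keeps some note of deletions it can query):

    from dataclasses import dataclass

    @dataclass
    class Record:
        id: int
        modified: float      # server timestamp of last insert/edit
        payload: bytes

    def changes_since(records, deletions, last_sync):
        """Work out what a client needs after being offline.

        records   -- iterable of all live server records
        deletions -- (record id, deleted_at) pairs kept by the server
        last_sync -- timestamp the client sent with its request
        """
        upserts = [r for r in records if r.modified > last_sync]
        deletes = [rid for rid, when in deletions if when > last_sync]
        return upserts, deletes

Deletions are the awkward part of the timestamp approach (a last-modified field can't describe a record that no longer exists), which is where the second approach of an operations table has the edge.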
- Fast read operations for clients
Because the artists and albums tables are smaller, a client could keep them entirely in memory, but I am going to assume it won't.
What I was thinking of doing is having a separate file for each table, with the client keeping each file open all the time so it can read as quickly as possible... is this a bad idea?
Some sort of index will have to be stored for each table, recording where each record starts. This could be small enough to load into memory and could be stored in files separate from the actual tables to avoid issues.
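A sketch of the kind of index I mean, assuming one small index file per table that just maps record id to (offset, length) within the table file and is loaded fully into memory at startup (file layout and names are illustrative):

    import struct

    ENTRY = struct.Struct("<IQI")   # record id, byte offset in table file, record length

    def load_index(path):
        """Load a whole index file into a dict: id -> (offset, length)."""
        index = {}
        with open(path, "rb") as f:
            data = f.read()
        for pos in range(0, len(data), ENTRY.size):
            rec_id, offset, length = ENTRY.unpack_from(data, pos)
            index[rec_id] = (offset, length)
        return index

    def read_record(table_file, index, rec_id):
        """Read one record from an already-open table file using the in-memory index."""
        offset, length = index[rec_id]
        table_file.seek(offset)
        return table_file.read(length)

With the table file kept open, each lookup then costs a single seek and read.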
- Minimise I/O operations
The server will store an "index" of the tracks database in memory with the id and file name so read operations are kept to a minimum.
The server will also buffer database write operations so that if it detects that a lot of write operations are going to happen in a short space of time, it will wait and then do a batch write. This is possible because the changes to the file system will still be there if the database crashes, so it could just reload all the changes on restart.
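A sketch of that write buffering, assuming pending operations are queued and flushed either after a quiet period or once the queue grows past some size (the thresholds here are made up):

    import time

    class WriteBuffer:
        """Collect server write operations and flush them to disk as one batch."""

        def __init__(self, flush_fn, max_pending=256, quiet_seconds=5.0):
            self.flush_fn = flush_fn          # callable that writes a batch to the database file
            self.max_pending = max_pending
            self.quiet_seconds = quiet_seconds
            self.pending = []
            self.last_write = 0.0

        def add(self, operation):
            self.pending.append(operation)
            self.last_write = time.monotonic()
            if len(self.pending) >= self.max_pending:
                self.flush()

        def maybe_flush(self):
            """Call periodically: flush once no new writes have arrived for a while."""
            if self.pending and time.monotonic() - self.last_write >= self.quiet_seconds:
                self.flush()

        def flush(self):
            self.flush_fn(self.pending)
            self.pending = []

If the server crashes before a flush, the music files are still on the file system, so the pending changes can be rebuilt by rescanning on restart, as described above.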
- NOT a sparse file, to keep the file size to a minimum.
I will be working at a byte level to reduce the file size. The main problem will be fragmentation when a record is deleted. Because of the variable-length fields, you can't simply write a new record in its place.
I could defragment the file when it gets to a certain fragmentation level (ratio of deleted records to records), but I'd rather avoid this if I can, as it will be an expensive operation for the clients.
I'd rather not use fixed-length fields either (the filename could be huge, for instance), but these seem to be my only options?
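One middle ground (not part of my plan above, just a common technique) would be to mark deleted records with a tombstone flag and only rewrite the file once the proportion of dead bytes crosses a threshold. A sketch of what that compaction could look like:

    def fragmentation(records):
        """Fraction of bytes occupied by deleted records; compact above some threshold.

        records -- iterable of (rec_id, deleted_flag, blob) in file order
        """
        dead = live = 0
        for _, deleted, blob in records:
            if deleted:
                dead += len(blob)
            else:
                live += len(blob)
        total = dead + live
        return dead / total if total else 0.0

    def compact(records):
        """Rewrite a table, dropping tombstoned records and rebuilding the offset index."""
        out = bytearray()
        index = {}
        for rec_id, deleted, blob in records:
            if deleted:
                continue                      # skip tombstoned records
            index[rec_id] = (len(out), len(blob))
            out.extend(blob)
        return bytes(out), index

Compaction might only need to run on the server, with clients simply downloading the freshly compacted file on their next sync, which would sidestep the expensive operation on the clients.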
Comments:
So how do I go about this and maximise performance?
Yes, I am looking at reinventing the wheel, and yes, I know I probably won't come anywhere close to the performance of other databases.