Aerospike is a key-value store database with support for persistence. But can I trust this persistence enough to use it as a database altogether? As I understand it, it writes data to memory first and then persists it. I can live with eventual consistency, but I don't want to be in a state where something was committed but, due to machine failure, never got written to disk and hence can never be retrieved. I looked at the various use cases but was curious about this one in particular. Also, what guarantee does client.put provide as far as saving a new record is concerned?
1 Answer
Aerospike provides a user-configurable replication factor (RF). Most people use 2; if you are really concerned, you can use 3 or even more. Size the cluster accordingly. For RF=3, put returns when 3 nodes have written the data to their write-block in memory, which is asynchronously flushed to the persistent layer. So it depends on what node failure pattern you are trying to protect against. If you are worried about the entire cluster crashing instantly, then you may have a case for 1 second (the default) worth of lost data. The one second can be configured lower as well.

Aerospike also provides a rack-aware configuration, which protects against data loss if an entire rack goes down. The put always goes to nodes in different racks.

Aerospike also provides cross data center replication. It's asynchronous, but it does give you an option to replicate your data across geographies. Of course, going across geographies does have its latency.

Finally, if you are really concerned about an entire cluster shutting down, you can connect to two separate clusters in your application and always push updates to both. Of course, you must now worry about consistency if the application fails between the two writes. I don't know of anyone who had to resort to that.
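The dual-cluster idea in the last paragraph can be sketched roughly as follows. This is only an illustration, not an Aerospike-provided feature: `put_to_both`, `DualWriteError`, and `InMemoryClient` are hypothetical names, and the only assumption about the two client objects is that they expose `put(key, bins)`, as the Aerospike Python client does. The stub stands in for real connections so the sketch runs without a server.

```python
class DualWriteError(Exception):
    """Raised when the primary write succeeded but the secondary did not."""


def put_to_both(primary, secondary, key, bins):
    """Write the same record to two independent clusters (hypothetical helper).

    If the primary write raises, the secondary is never attempted.
    If the secondary write raises, the clusters may now disagree on this key,
    so the caller is told explicitly and can retry or reconcile.
    """
    primary.put(key, bins)
    try:
        secondary.put(key, bins)
    except Exception as exc:
        raise DualWriteError(f"secondary write failed for {key!r}") from exc


class InMemoryClient:
    """Stand-in for a real client; only mimics put(key, bins)."""

    def __init__(self, fail=False):
        self.fail = fail
        self.store = {}

    def put(self, key, bins):
        if self.fail:
            raise RuntimeError("cluster unavailable")
        self.store[key] = bins


if __name__ == "__main__":
    a, b = InMemoryClient(), InMemoryClient()
    # Aerospike keys are (namespace, set, user_key) tuples.
    put_to_both(a, b, ("test", "demo", "user1"), {"name": "Anunay"})
    print(a.store == b.store)
```

Note that this does not remove the consistency problem mentioned above: if the application itself crashes between the two `put` calls, no exception is raised anywhere and the second cluster simply never sees the write.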

- Thanks for the answer. Even if it only ensures that my data is written to two nodes, I can live with that. I did not understand "If you are worried about entire cluster crashing instantly, then you may have a case for 1 second (default) worth of lost data". Which configuration is this? – Anunay Aug 19 '17 at 18:39
- Look at http://www.aerospike.com/docs/reference/configuration and search for the flush-max-ms parameter. It is 1000 ms by default, which is also the recommended value. You must understand your incoming data rate, how defrag works, etc. before mucking with this parameter though! – pgupta Aug 19 '17 at 20:25
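For orientation, here is a rough sketch of where these settings would sit in aerospike.conf. The namespace name and device path are placeholders I made up, and the values shown are just the defaults discussed above; check the configuration reference for your server version before copying anything.

```
namespace test {                  # placeholder namespace name
    replication-factor 2          # number of copies of each record
    storage-engine device {
        device /dev/sdb           # placeholder device path
        flush-max-ms 1000         # default: flush partially filled write blocks at least every 1000 ms
    }
}
```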