I need to make sure that data which comes into a service (a Java app) and needs to go into a database is not lost if the database is down. Is there a standard API I should be using from Java to do this? Should I just save the data to files and reread them later?
-
There is no standard mechanism for this, no. – Mark Rotteveel Apr 18 '16 at 14:14
-
@MarkRotteveel are there any best practices? I'm having trouble finding anything in google searches – Don Rhummy Apr 18 '16 at 14:16
-
Can you send the data to a queue instead of sending it directly to your Java application? In that scenario there would be a common pattern for handling a database outage. – Mark B Apr 18 '16 at 14:22
-
@MarkB But what if the system that has the queue goes down (electricity turned off, crashed, etc)? The queue would be gone. So there needs to be a way to save the data (probably to files, but in what format?) to be recoverable on system failure. – Don Rhummy Apr 18 '16 at 14:31
-
There are ways to set up distributed, highly available queues... Perhaps you need to look into highly available system architectures. – Mark B Apr 18 '16 at 14:33
-
@MarkB Thank you! "distributed highly available queues" looks very promising. this might be exactly what we need – Don Rhummy Apr 18 '16 at 14:41
1 Answer
The best solution is to make sure that the database doesn't go down; e.g. run a hot standby. Typical database systems have ready-made support for this sort of thing. It is simply a matter of standing up more database replicas and configuring the clients to use them.
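As a concrete illustration (assuming PostgreSQL and its JDBC driver here; other databases offer similar options), client-side failover can be as simple as listing the primary and a standby in the connection URL. The host names, database name, and credentials below are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class FailoverConfig {
    // Multi-host URL: the PostgreSQL JDBC driver tries hosts left to right
    // and, with targetServerType=primary, connects to whichever host is
    // currently the writable primary.
    static String failoverUrl(String primary, String standby, String db) {
        return "jdbc:postgresql://" + primary + "," + standby + "/" + db
             + "?targetServerType=primary";
    }

    static Connection connect() throws Exception {
        // Placeholder hosts and credentials. After the standby is promoted,
        // the next connection attempt lands on it with no code changes.
        return DriverManager.getConnection(
                failoverUrl("db-primary:5432", "db-standby:5432", "mydb"),
                "app", "secret");
    }
}
```

The point is that the failover logic lives in the driver and the database's replication setup, not in your application code.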
The problem with saving to a file is that you now have two sets of persistence code to maintain, and a bunch of complicated custom code for detecting that the database is offline, switching to files, detecting that the database is back, and replaying the saved data into the database.
You also have to consider what happens if you lose the stuff saved to files; e.g. because of bugs / inadequate testing of your file-based persistence and recovery code.
The idea of using a queuing system with persistent queues is worth looking at. However, you need to be aware that the queuing system could go down too ... and that adds another potential failure point for your overall system. And besides, persistent queues require local disk space, and if that fills up (because the downstream database is down for a long time) you are back where you started.
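To make those trade-offs concrete, here is a minimal sketch of the "save to files and replay" approach (the class and method names are invented for illustration, not a standard API). Even this toy version has to deal with an append format, a replay step, and the disk-space concern:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.List;
import java.util.function.Consumer;

/** Toy outbox: append records while the database is down, replay later. */
public class FileOutbox {
    private final Path file;
    private final long maxBytes; // crude guard against filling the disk

    public FileOutbox(Path file, long maxBytes) {
        this.file = file;
        this.maxBytes = maxBytes;
    }

    /** Appends one record per line; refuses once the size cap is reached. */
    public void save(String record) throws IOException {
        if (Files.exists(file) && Files.size(file) >= maxBytes) {
            throw new IOException("outbox full: still losing data, just later");
        }
        Files.write(file, List.of(record), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Replays saved records into the (now reachable) database writer. */
    public void replay(Consumer<String> databaseWriter) throws IOException {
        if (!Files.exists(file)) return;
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            databaseWriter.accept(line); // a crash mid-loop causes duplicates
        }
        Files.delete(file); // a crash before this line causes duplicates too
    }
}
```

Note everything this sketch ignores: concurrent writers, half-written lines after a crash, ordering across restarts, and deduplication on replay. That is exactly the "complicated custom code" referred to above, and why a replicated database or a persistent queue product is usually a better bet than rolling your own.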

-
That is *not* the best solution. There could be a million reasons the database is unreachable. Someone could accidentally change the firewall so the DB (and the hot standby) is unreachable. The network tables could be accidentally changed. Etc. There is *always* a reason things could go down or be unreachable. I'm looking for a plan in that case. – Don Rhummy Apr 18 '16 at 14:29
-
Whatever. If you have people making random changes to your network firewalls, then you need to solve >>that<< problem first. – Stephen C Apr 18 '16 at 14:32
-
@DonRhummy if those types of issues are happening in your production environment then you have bigger problems that need to be addressed. – Mark B Apr 18 '16 at 14:33
-
@StephenC Those were simple examples, but I know issues like this have happened at hosting services where something got changed that made services unreachable for a lot of people/servers – Don Rhummy Apr 18 '16 at 14:39