Best retry policy on data creation response timeout

Question

What is the best retry policy in such a scenario:

Database succeeds in creating the data entry, but then the response takes too long to reach Application. So to carry out the work, Application retries the creation, and of course Database returns an "already exists" error. So in the end from Application's perspective, it seems the creation has failed, while in fact it succeeded. And even worse, if this is in the middle of a series of steps, then there's no way for Application to decide whether to trigger a rollback on the previous steps.

Increasing the timeout length on Application is not a acceptable solution because the IP network can never be 100% reliable and there's always a tiny chance where the response could just get lost in the network.

Adding a check of existence of <data> before creating could work. But that's only when concurrency is taken into consideration. In my case there can be multiple clients to Database and I am not certain on the chance of race conditions.

+-------------+                                             +-----------+    
| Application |                                             | Database  |    
+-------------+                                             +-----------+    
       |                                                          |          
       | CREATE <data>                                            |          
       |--------------------------------------------------------->|          
       |                                                          |          
       |                                                          | creating 
       |                                                          |--------- 
       |                                                          |        | 
       |                                                          |<-------- 
       | -------------------------------\                         |          
       |-| timeout waiting for response |                         |          
       | |------------------------------|                         |          
       |                                                          |          
       |                                                  SUCCESS |          
       |<---------------------------------------------------------|          
       | -----------------------------------------------\         |          
       |-| response from a timed out session is ignored |         |          
       | |----------------------------------------------|         |          
       |                                                          |          
       | retry CREATE <data>                                      |          
       |--------------------------------------------------------->|          
       |                                                          |          
       |                             ERROR: <data> ALREADY EXISTS |          
       |<---------------------------------------------------------|          
       | ---------------------------------------------------\     |          
       |-| no idea whether the creation actually took place |     |          
       | |--------------------------------------------------|     |          
       |                                                          |

Doesn't your application get a "network connection timed out" error? — Neville Kuyt, Aug 07 '17 at 07:48

markusk · Answer 1 · 2017-08-07T07:43:59.880

Most modern databases provide some way of writing "upsert" statements that will atomically insert data if it doesn't exist, and update (or do nothing) if it already exists. That way, your application can safely retry and will not get an error if the data already was created, making your data creation idempotent.

Examples for some popular databases:

MySQL:

 -- Do nothing if data exists
 INSERT IGNORE ...
 -- Update if data exists
 INSERT ...  ON DUPLICATE KEY UPDATE ...

PostgreSQL:

-- Do nothing if data exists
INSERT ... ON CONFLICT DO NOTHING
-- Update if data exists
INSERT ... ON CONFLICT ... DO NOTHING

Oracle:

-- Do nothing if data exists
MERGE INTO ... USING ...
WHEN NOT MATCHED THEN INSERT ...
-- Update if data exists
MERGE INTO ... USING ...
WHEN NOT MATCHED THEN INSERT ...
WHEN MATCHED THEN UPDATE ...

If atomic operations or transactions aren't an option, you could write your database operations so that retries aren't harmful, and do each operation in a loop that first checks if the database is already in the desired state, then attempts the operation if it isn't, and retries on failure. In other words, something like (pseudo-code):

max_retries = n
retries = 0
WHILE NOT database_in_desired_state
    IF retries < max_retries THEN
        perform_database_operation
        retries = retries + 1
    ELSE
        fail

You can make retries harmless by making operations conditional (e.g. UPDATE some_table SET field = value, version = version + 1 WHERE version = expected_version or adding unique constraints and the like so that duplicate operations are disallowed. If you provide more detail on exactly what database you're using, I might be able to provide more concrete advice.

If you are performing a long series of operations on multiple remote systems, where the whole operation should be rolled back if a failure happens and there's no way to wrap all the interactions in a single (distributed) transaction, you will need to write compensating transactions that will manually roll back the work done so far on errors. Of course, the compensating transaction could also fail, and you need to consider how that should be handled. One way is to have periodic cleanup tasks that look for failed transactions or inconsistent states.

The database we use is LDAP-based. And most of the times one customer request must be fulfilled with several LDAP operations, mixed with some interactions with another module among the process. — Sah, Aug 04 '17 at 13:22
Please make the LDAP requirement explicit - most people think RDBMS when you say "database" so you will get a lot of bad answers. — Neville Kuyt, Aug 07 '17 at 07:47
I was trying to make the question more generic because the way I see it, it's not a problem that can be solved by the database but more of a choice of policy on the application's side. But I'll take your advice into consideration next time I raise a question. — Sah, Aug 07 '17 at 07:56

score 0 · Answer 2 · answered Aug 07 '17 at 08:13

It all depends on the context.

Yes, network connections can fail - but you have to decide how big of a risk this is. If you are using a professional hosting set-up, with enterprise grade equipment, this will happen - well, almost never. In this case, I wouldn't build a lot of additional logic into the application to handle network problems; you should rely on your database's transaction management features to ensure the data is in a consistent state. Once your application catches the network exception, you can then show an error to the user, and ask them to start again.

If your environment is inherently unreliable - you're connecting across the public internet, for instance - the common architecture pattern is to use a message bus, rather than synchronous operations.

Writing synchronous code to handle unreliable network conditions is not trivial; you would start with the pseudo code @markusk posted, but I would add to that closing and re-opening the database connection.

Best retry policy on data creation response timeout

2 Answers2