What is the best retry policy in the following scenario? Database succeeds in creating the data entry, but the response takes too long to reach Application. To carry out the work, Application retries the creation, and of course Database returns an "already exists" error. So in the end, from Application's perspective, the creation seems to have failed, while in fact it succeeded. Even worse, if this happens in the middle of a series of steps, there is no way for Application to decide whether to trigger a rollback of the previous steps.
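
To make the failure mode concrete, here is a minimal Python sketch of it. Everything in it is hypothetical and only for illustration: `FakeDatabase` is an in-memory stand-in for Database, the lost response is simulated by raising `TimeoutError` after the write has been committed, and `create_with_retry` is a naive retry loop, not my real client code.

```python
class AlreadyExists(Exception):
    """Raised by the stand-in database when the key is already present."""


class FakeDatabase:
    """In-memory stand-in for Database (hypothetical, illustration only)."""

    def __init__(self, lose_first_response=True):
        self.rows = set()
        self.lose_next_response = lose_first_response

    def create(self, key):
        if key in self.rows:
            raise AlreadyExists(key)
        self.rows.add(key)  # the write is committed at this point
        if self.lose_next_response:
            self.lose_next_response = False
            # Simulate the SUCCESS response arriving after the client timeout.
            raise TimeoutError("response arrived too late")


def create_with_retry(db, key, attempts=2):
    """Naive retry loop, as Application would run it."""
    for _ in range(attempts):
        try:
            db.create(key)
            return "created"
        except TimeoutError:
            continue  # cannot tell whether the write happened
        except AlreadyExists:
            # Ambiguous: was it our own earlier attempt, or another client?
            return "already exists"
    return "gave up"


db = FakeDatabase()
outcome = create_with_retry(db, "data-1")
print(outcome, "data-1" in db.rows)
```

The retry loop reports "already exists" even though the entry was created by its own first attempt, which is exactly the ambiguity I am asking about.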
Increasing the timeout on Application is not an acceptable solution, because an IP network can never be 100% reliable and there is always a small chance that the response simply gets lost in the network.
Adding a check for the existence of <data> before creating it could work, but it is only safe if concurrency is accounted for. In my case there can be multiple clients of Database, and I am not sure how likely race conditions are.
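
The race I am worried about can be sketched as follows. This is again a hypothetical in-memory stand-in (`FakeDatabase`, `exists`, `create`, and the client names are all made up), with the two clients interleaved by hand so that both run their existence check before either of them creates the entry.

```python
class AlreadyExists(Exception):
    """Raised by the stand-in database when the key is already present."""


class FakeDatabase:
    """In-memory stand-in for Database (hypothetical, illustration only)."""

    def __init__(self):
        self.rows = {}

    def exists(self, key):
        return key in self.rows

    def create(self, key, owner):
        if key in self.rows:
            raise AlreadyExists(key)
        self.rows[key] = owner


db = FakeDatabase()

# Both clients check first; neither sees the entry yet.
a_saw_entry = db.exists("data-1")
b_saw_entry = db.exists("data-1")

# Client B creates the entry between client A's check and client A's create.
db.create("data-1", owner="client-B")

try:
    db.create("data-1", owner="client-A")
    a_result = "created"
except AlreadyExists:
    a_result = "already exists"

print(a_saw_entry, b_saw_entry, a_result, db.rows["data-1"])
```

Client A's check said the entry did not exist, yet its create still fails, so the check alone does not tell A whose write it is looking at.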
+-------------+                                             +-----------+
| Application |                                             | Database  |
+-------------+                                             +-----------+
       |                                                          |
       | CREATE <data>                                            |
       |--------------------------------------------------------->|
       |                                                          |
       |                                                          | creating
       |                                                          |---------
       |                                                          |        |
       |                                                          |<--------
       | -------------------------------\                         |
       |-| timeout waiting for response |                         |
       | |------------------------------|                         |
       |                                                          |
       |                                                  SUCCESS |
       |<---------------------------------------------------------|
       | -----------------------------------------------\         |
       |-| response from a timed out session is ignored |         |
       | |----------------------------------------------|         |
       |                                                          |
       | retry CREATE <data>                                      |
       |--------------------------------------------------------->|
       |                                                          |
       |                             ERROR: <data> ALREADY EXISTS |
       |<---------------------------------------------------------|
       | ---------------------------------------------------\     |
       |-| no idea whether the creation actually took place |     |
       | |--------------------------------------------------|     |
       |                                                          |