We set up an Event Grid and a Logic App as a subscriber to the Event Grid. One of our core requirements is not to lose requests once the Event Grid has received them. Not losing requests means the system must always come back to the caller with either a success or a failure.
So what we have is:
- Time-to-live in Event Grid: when the Logic App is dead (does not pull the requests from Event Grid), the requests fall out to the "Time-to-live" queue and we notify the caller of "fail".
- Logic App timeout: when some part of the Logic App fails or loops, the timeout occurs and we notify the caller of "fail".
- The Logic App runs smoothly: we notify the caller of "success" (oversimplifying).
Now the question is: what if the Logic App crashes entirely? If the Logic App crashes, its timeout (the second item above) will no longer function, so would we never be able to return to the caller?
Are there solutions at the infrastructure level that do not require building a complex mechanism?
For example, Logic App disaster recovery, with two instances set up in different regions?
Or should we do something like the following?
- Create another timer that exists completely separately from the Logic App, so that the additional timer does not go down together with the Logic App?
- Save the request statuses as the Logic App progresses, then create a separate Function App that looks at the request statuses and, when the Logic App comes back up, picks the requests up based on their statuses and pushes them to the Logic App again?
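To make the last two options concrete, here is a minimal sketch of the kind of external watchdog I have in mind. It is only an illustration under assumptions: the status store is an in-memory dict (in practice it would need to be something durable that lives outside the Logic App), and `RequestRecord`, `sweep`, `resubmit` and `notify_failure` are hypothetical names, not Azure APIs. The idea is that something outside the Logic App, for example a timer-triggered Function App, calls `sweep` periodically.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    RECEIVED = "received"
    IN_PROGRESS = "in_progress"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class RequestRecord:
    request_id: str
    status: Status
    updated_at: float  # time of the last status change, in seconds
    attempts: int = 0  # how many times the watchdog has re-pushed it


def sweep(
    store: Dict[str, RequestRecord],
    now: float,
    deadline_seconds: float,
    max_attempts: int,
    resubmit: Callable[[str], None],        # hypothetical: push back to the Logic App
    notify_failure: Callable[[str], None],  # hypothetical: tell the caller "fail"
) -> List[str]:
    """Find requests stuck past the deadline; retry them or fail them.

    Returns the ids of the requests that were acted on.
    """
    touched = []
    for record in store.values():
        # Finished requests need no attention.
        if record.status in (Status.SUCCEEDED, Status.FAILED):
            continue
        # Still within the allowed time since the last status change.
        if now - record.updated_at < deadline_seconds:
            continue
        if record.attempts < max_attempts:
            record.attempts += 1
            record.status = Status.RECEIVED
            record.updated_at = now
            resubmit(record.request_id)
        else:
            record.status = Status.FAILED
            record.updated_at = now
            notify_failure(record.request_id)
        touched.append(record.request_id)
    return touched
```

Because the deadline check runs outside the Logic App, a request that was in flight when the Logic App crashed would still end in either a retry or a "fail" notification, as long as the watchdog and the status store themselves stay up.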
Thank you kindly
I have looked at the Microsoft Logic Apps technical documentation.