Should i engineer my microservice to handle involuntary disruptions like hardware failure?
Are these disruptions frequent enough to be handled in a service running on AWS managed EKS cluster.
Should i consider some design change in the service to handle the unexpected SIGKILL with methods like persisting the data at each step or will that be considered as over-engineering?
What standard way would you suggest for handling these involuntary disruptions if it is
a) a restful service that responds typically in 1s(follows saga pattern).
b) a service that process a big 1GB file in 1 hour.