In message-passing/distributed systems, checkpoints can be taken based on synchronized clocks: at each checkpoint we store the state of the process.
Now I want to know: how can we do this in practice?
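To make the clock-based idea concrete, here is a minimal sketch built on my own assumptions, not a tested implementation. Each process takes its k-th checkpoint once its synchronized clock passes k * PERIOD. Real clocks are only loosely synchronized, so a message sent just after the sender's k-th checkpoint may reach a receiver that has not yet taken its own k-th checkpoint. A common remedy is to piggyback the sender's checkpoint index on each message and have the receiver checkpoint first if that index is higher. `Process`, `PERIOD`, `tick`, `send` and `on_message` are hypothetical names. The clock reading is passed in as a plain number, and checkpoints are kept in memory rather than on stable storage.

```python
import copy
from dataclasses import dataclass, field

PERIOD = 10.0  # hypothetical checkpoint interval, in synchronized-clock seconds


@dataclass
class Process:
    name: str
    state: dict = field(default_factory=dict)
    index: int = 0                                   # index of the last checkpoint taken
    checkpoints: dict = field(default_factory=dict)  # index -> saved copy of state

    def take_checkpoint(self, index: int) -> None:
        # Deep copy so later updates do not change the saved snapshot.
        # A real system would write this to stable storage.
        self.index = index
        self.checkpoints[index] = copy.deepcopy(self.state)

    def tick(self, now: float) -> None:
        # Take checkpoint k once the synchronized clock has passed k * PERIOD.
        due = int(now // PERIOD)
        if due > self.index:
            self.take_checkpoint(due)

    def send(self, payload):
        # Piggyback the current checkpoint index on every outgoing message.
        return (self.index, payload)

    def on_message(self, msg) -> None:
        sender_index, (key, value) = msg
        # The sender has already taken checkpoint `sender_index`. Checkpoint first,
        # so our checkpoint with that index does not contain a message that the
        # sender's checkpoint with the same index never sent (an orphan message).
        if sender_index > self.index:
            self.take_checkpoint(sender_index)
        self.state[key] = value
```

The forced checkpoint keeps the checkpoints of different processes consistent despite clock skew. Messages that are in transit when a checkpoint is taken would still need to be logged if they have to be replayed after a rollback.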
Say my system is a request/response client-server system. At certain points in the flow I would like to take checkpoints so that I can roll back if a failure occurs.
In such a case, what information do I need to store? I would like to know the practical considerations. I have gone through several articles about rollback recovery and am now trying to build a proof-of-concept (PoC) implementation.
Could anybody who has tried a checkpoint mechanism in their system give me some clues?
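For the request/response case, I assume a checkpoint needs roughly three things. The first is the application state. The second is the sequence number of the last request whose effects are included in that state. The third, which is optional, is the set of responses already sent, so a retried request can be answered without being executed again. The following is a minimal sketch under those assumptions. It writes the checkpoint atomically (write to a temporary file, fsync, then `os.replace`), so a crash during checkpointing leaves the previous checkpoint intact. `Checkpoint`, `save_checkpoint` and `load_checkpoint` are hypothetical names, and JSON stands in for whatever serialization the real system would use.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class Checkpoint:
    last_seq: int   # last request whose effects are included in `state`
    state: dict     # application state at that point (JSON keys must be strings)
    sent_responses: dict = field(default_factory=dict)  # request id -> response already sent


def save_checkpoint(cp: Checkpoint, path: str) -> None:
    # Write to a temp file in the same directory, flush it to disk, then atomically
    # rename it over the old checkpoint, so a crash never leaves a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(asdict(cp), f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # clean up the partial temp file; the old checkpoint is untouched
        raise


def load_checkpoint(path: str) -> Checkpoint:
    with open(path) as f:
        return Checkpoint(**json.load(f))
```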
Edit
I'm trying to implement rollback for non-deterministic events (e.g. requests received by a web service). I have two approaches in mind: one is checkpoint-based, the other is log-based. I chose Apache Axis2 as my web service platform. It already has a logging facility, so logging will be easier in this case.
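For the log-based approach, my understanding is this: the non-deterministic events are the received requests, so each request is appended to stable storage before it is processed (pessimistic logging). After a crash, the server restores its last checkpoint and replays, in order, the logged requests that came after it. The sketch below shows that idea in memory. `RequestLog`, `Server`, `handle` and the `(seq, amount)` request format are hypothetical. A real log would be an fsync'd file or database table rather than a Python list, and whether Axis2's existing logging facility gives that durability is something I still have to check.

```python
import copy


class RequestLog:
    """Stands in for stable storage: its entries are assumed to survive a crash."""

    def __init__(self):
        self.entries = []  # (seq, request) pairs in arrival order

    def append(self, seq, request):
        self.entries.append((seq, request))

    def after(self, seq):
        # Requests logged after the given checkpoint sequence number.
        return [(s, r) for s, r in self.entries if s > seq]


class Server:
    def __init__(self, log):
        self.log = log
        self.state = {"balance": 0}
        self.last_seq = 0
        # In a real system the checkpoint would also live on stable storage.
        self.checkpoint = (0, copy.deepcopy(self.state))

    def handle(self, seq, amount):
        # Pessimistic logging: record the request before it changes any state.
        self.log.append(seq, amount)
        return self._apply(seq, amount)

    def _apply(self, seq, amount):
        self.state["balance"] += amount
        self.last_seq = seq
        return self.state["balance"]

    def take_checkpoint(self):
        self.checkpoint = (self.last_seq, copy.deepcopy(self.state))

    def recover(self):
        # Restore the last checkpoint, then replay the logged requests after it in order.
        self.last_seq, saved = self.checkpoint
        self.state = copy.deepcopy(saved)
        for seq, amount in self.log.after(self.last_seq):
            self._apply(seq, amount)
```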
So, when we use checkpoint-based or log-based recovery, do we need to store all of the data?
Is there any difference in what data is stored in the two cases?
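As I understand it, the difference is that a checkpoint stores the whole process state, while a log stores only the non-deterministic inputs (the requests) since the last checkpoint. The two are usually combined. Taking a checkpoint bounds how much of the log must be replayed, and it lets the log entries the checkpoint already covers be discarded. Below is a small sketch of that comparison. The account-balance state, the request format and the helper names are hypothetical, and JSON size is only a rough proxy for storage cost.

```python
import json


def checkpoint_bytes(state: dict) -> int:
    # A checkpoint serializes the *whole* state, however little of it changed.
    return len(json.dumps(state).encode())


def log_entry_bytes(seq: int, request: dict) -> int:
    # A log entry records only the non-deterministic input (the request itself).
    return len(json.dumps({"seq": seq, "request": request}).encode())


def truncate_log(entries, checkpoint_seq):
    # Entries already reflected in the checkpoint are no longer needed for recovery.
    return [(seq, req) for seq, req in entries if seq > checkpoint_seq]
```

With many accounts and small requests, each checkpoint is far larger than a log entry. That is why checkpoints are taken only occasionally while every request is logged.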
In this type of recovery, we need to roll back both the client and the server. The client can be rolled back based on the information we stored. How can we roll back the server in that case? Is that even necessary? Or have I misunderstood the protocol?
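Here is one arrangement I am considering. It is a sketch under my own assumptions, not something taken from the articles. If every request is logged before the server processes it, then after a server failure only the server rolls back, by restoring its checkpoint and replaying its log, and the client does not roll back at all. Instead, the client resends any request for which it got no response. For that to be safe, the server must recognize a resent request and return the original response instead of executing the request twice. This assumes the server's table of sent responses is itself part of the checkpointed or logged state. `Client`, `Server`, the request ids and the `fail_before_reply` flag are all hypothetical.

```python
import itertools


class Server:
    def __init__(self):
        self.balance = 0
        self.responses = {}  # request id -> response; assumed to be checkpointed/logged

    def handle(self, req_id, amount, fail_before_reply=False):
        if req_id in self.responses:
            # Duplicate of a request that was already executed: return the saved reply.
            return self.responses[req_id]
        self.balance += amount
        self.responses[req_id] = self.balance
        if fail_before_reply:
            # Simulates a failure after executing the request but before the reply arrives.
            raise ConnectionError("reply lost")
        return self.balance


class Client:
    def __init__(self, server):
        self.server = server
        self._ids = itertools.count(1)

    def call(self, amount, failures=0):
        req_id = f"req-{next(self._ids)}"  # the same id is reused on every retry
        while True:
            try:
                return self.server.handle(req_id, amount, fail_before_reply=failures > 0)
            except ConnectionError:
                failures -= 1  # retry with the *same* request id
```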