I am designing a system in which programs run in Nominal/Redundant mode: one instance on one machine and one on another. Should the Nominal program fail (a Failover event), the Redundant one should take over and assume operations as the new Nominal process. This should be transparent to the user.
My question is: when a Failover occurs, should it be triggered only by a hardware failure, or are software errors sufficient cause to trigger a Failover?
More generally, is there an industry standard for deciding what should cause a Failover, or is that up to the system architect/designer?
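
To make the question concrete, here is a rough sketch of the decision I have in mind. Everything in it is hypothetical and only for illustration: the names `Fault`, `FailoverPolicy` and `classify`, the 3-second heartbeat timeout, and in particular the assumption that a lost heartbeat means a hardware failure while an unhealthy report from a still-reachable process means a software error. A real system may not be able to tell the two apart this cleanly.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Fault(Enum):
    HARDWARE = auto()  # assumed: host unreachable, heartbeat lost
    SOFTWARE = auto()  # assumed: process reachable but reports itself unhealthy


@dataclass
class FailoverPolicy:
    # Hypothetical knobs: which fault classes are allowed to trigger a Failover.
    on_hardware: bool = True
    on_software: bool = False

    def should_fail_over(self, fault: Fault) -> bool:
        if fault is Fault.HARDWARE:
            return self.on_hardware
        return self.on_software


def classify(heartbeat_age_s: float, process_healthy: bool,
             timeout_s: float = 3.0) -> Optional[Fault]:
    """Return the detected fault class, or None if the Nominal looks fine."""
    if heartbeat_age_s > timeout_s:
        # Simplifying assumption: a stale heartbeat is treated as hardware.
        return Fault.HARDWARE
    if not process_healthy:
        return Fault.SOFTWARE
    return None


# The Redundant process would run something like this on every monitoring tick.
policy = FailoverPolicy(on_hardware=True, on_software=False)
fault = classify(heartbeat_age_s=1.0, process_healthy=False)
if fault is not None and policy.should_fail_over(fault):
    print("Failover: Redundant becomes the new Nominal")
else:
    print("No Failover")
```

With the policy above, a software error on a reachable Nominal does not cause a Failover; setting `on_software=True` would. My question is essentially which of these settings is considered correct or standard practice.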