
I am designing a system in which programs run in Nominal/Redundant mode: one instance on one machine, one on another. Should the Nominal program fail (a Failover event), the Redundant one should take over and assume operations as the new Nominal process. This should be transparent to the user.

My question is: should a Failover occur only because of a hardware failure, or are software errors sufficient cause to trigger one?

More generally, is there an industry standard for deciding what should cause a Failover, or is that up to the system architect/designer?

NWS

1 Answer


From the cluster's point of view, the kind of error makes no difference. The point is that you cannot rely on any "I am failing" event from a failing node.

The cluster (in your case, the "Redundant" role) just detects that a node has stopped sending heartbeats (or stopped responding to pings). The "Redundant" node then promotes itself to "master" and starts processing incoming requests. That's all, I think.
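To make the heartbeat idea concrete, here is a minimal sketch in Python of the check the "Redundant" side could run. Everything in it is an assumption made for illustration: the class name `HeartbeatMonitor`, the 3-second timeout, and the role strings are hypothetical, and a real system would also need to deliver heartbeats over the network and guard against both nodes becoming master at once (split brain).

```python
import time


class HeartbeatMonitor:
    """Hypothetical helper: runs on the Redundant node and watches the Nominal one."""

    def __init__(self, timeout_s=3.0, clock=time.monotonic):
        # timeout_s is an assumed value; tune it to your heartbeat interval.
        self.timeout_s = timeout_s
        # The clock is injectable so the logic can be tested without sleeping.
        self.clock = clock
        self.last_seen = clock()
        self.role = "redundant"

    def on_heartbeat(self):
        """Call whenever a heartbeat (or ping reply) arrives from the Nominal node."""
        self.last_seen = self.clock()

    def check(self):
        """Promote this node if the Nominal node has been silent for too long."""
        silent_for = self.clock() - self.last_seen
        if self.role == "redundant" and silent_for > self.timeout_s:
            # The cause (hardware or software) is unknown and irrelevant here:
            # silence alone triggers the failover.
            self.role = "nominal"
        return self.role
```

The monitor never asks why the heartbeats stopped. A crashed process, a hung process, and a powered-off machine all look the same from the outside, which is why the hardware/software distinction does not matter at this level.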

mikalai