Fault tolerance refers to a system's ability to isolate, compensate for, and recover from failures with minimal impact on the end user. When using this tag, also include tags for the system and/or technology you are working with (as additional supporting metadata).
Questions tagged [fault-tolerance]
305 questions
3 votes, 1 answer
Google+ Auth fault tolerance (code was already redeemed)
I'm currently implementing Google+ authentication on Android with offline access. This entails requesting a one-time authorization code that can be sent to the server and redeemed for a refresh token. So far so good.
However, imagine that there is an…

Levi Botelho
3 votes, 1 answer
Auto reconnect to RabbitMQ cluster after server restart
I have a master-slave configuration of RabbitMQ running as two Docker containers with dynamic internal IPs (they change on every restart).
Clustering works fine on a clean run, but if one of the servers is restarted, it cannot reconnect to the cluster:
rabbitmqctl…

Igor Artamonov
3 votes, 2 answers
Software Fault Tolerance
Does anyone know how software fault tolerance is implemented in Air Traffic Control Systems?
Some URLs would be very helpful.

Upul Bandara
3 votes, 2 answers
Akka OneForOneStrategy does not work
I have the following code:
class A extends Actor with ActorLogging {
  override val supervisorStrategy = OneForOneStrategy(maxNrOfRetries = 2) {
    case _ => log.info("An actor has been killed"); Restart
  }
  val b =…

src091
3 votes, 0 answers
Memcached Fault-Tolerance and failover property
As mentioned on this page: Memcached for PHP and failover, I am trying to test the failover of Memcached.
Basically, I want to ensure that if one of the servers is marked dead, subsequent sets and gets are redistributed to the servers that…

user2969781
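One common way a client redistributes sets and gets after a server is marked dead is consistent hashing: each server owns several points on a hash ring, so removing a dead server moves only the keys it owned and leaves every other key where it was. The Scala sketch below illustrates that idea in memory; HashRing, markDead, serverFor and the server names are hypothetical, and this is not the PHP Memcached client's own implementation.

```scala
import java.security.MessageDigest
import scala.collection.immutable.TreeMap

// Hypothetical consistent-hash ring: each server is placed at `replicas`
// virtual points; a key maps to the first point at or after its hash.
final class HashRing(servers: Seq[String], replicas: Int = 100) {
  // Ring position: first 4 bytes of the MD5 digest as an unsigned 32-bit value.
  private def hash(s: String): Long = {
    val d = MessageDigest.getInstance("MD5").digest(s.getBytes("UTF-8"))
    ((d(0) & 0xffL) << 24) | ((d(1) & 0xffL) << 16) | ((d(2) & 0xffL) << 8) | (d(3) & 0xffL)
  }

  private var ring: TreeMap[Long, String] = TreeMap.empty
  servers.foreach(add)

  def add(server: String): Unit =
    (0 until replicas).foreach(i => ring = ring.updated(hash(s"$server#$i"), server))

  // Removing a dead server's points only re-homes the keys it owned.
  def markDead(server: String): Unit =
    ring = ring.filterNot(_._2 == server)

  def serverFor(key: String): String = {
    require(ring.nonEmpty, "no live servers")
    // Walk clockwise; wrap around to the smallest point if none is larger.
    ring.rangeFrom(hash(key)).headOption.getOrElse(ring.head)._2
  }
}
```

The number of virtual points per server (replicas) is an assumption; more points spread keys more evenly at the cost of a larger ring.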
3 votes, 2 answers
OpenMPI custom fault tolerance for loosely coupled parallel processes
I do computations on the Amazon EC2 platform, using multiple machines connected through OpenMPI. To reduce the cost of the computation, spot instances are used, which are automatically shut down when the cost of a machine goes above a…

vkubicki
3 votes, 1 answer
Ruby library for distributed computing?
I'm developing an algorithm for a real-time data analysis task in Ruby. The CPU is the bottleneck because the dataset is quite large. So to reach the required performance, I have to use more cores in parallel, probably on different machines.
My…

der_flo
3 votes, 3 answers
Does the child actor know that he is being resumed?
Imagine a straightforward supervision hierarchy. The child dies. The father decides to Restart the child. When it is restarted, postRestart and friends are called, but what if the father had decided to Resume the child? Does the child actor know that…

agilesteel
3 votes, 1 answer
Implementing fault tolerance in distributed message queues
Suppose, in the picture below, that the middle message queue fails. Senders can still send messages through the other message queues.
But what happens if the message queue dies after receiving the message? How does the sender know whether the message was…

user782220
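One common answer to "how does the sender know?" is at-least-once delivery with deduplication: the sender retries until it receives an acknowledgement, and the queue ignores any message id it has already accepted, so a retry after a lost ack cannot create a duplicate. The in-memory Scala sketch below is hypothetical (FlakyQueue, sendWithRetry and the message ids are illustrative) and simulates a queue that stores a message but crashes before acknowledging it.

```scala
import scala.collection.mutable

// Hypothetical queue node: it may store a message and then lose the
// connection before the sender sees the acknowledgement.
final class FlakyQueue(var failAfterStore: Boolean) {
  val stored = mutable.ArrayBuffer.empty[(String, String)] // (messageId, payload)
  private val seen = mutable.Set.empty[String]             // ids already accepted

  // Returns true as the ack; throws once to simulate a lost ack.
  def send(id: String, payload: String): Boolean = {
    if (seen.add(id)) stored += (id -> payload) // store only on first delivery
    if (failAfterStore) {
      failAfterStore = false
      throw new RuntimeException("connection lost before ack")
    }
    true
  }
}

// Sender side: reuse the same id on every attempt and retry until acked.
def sendWithRetry(q: FlakyQueue, id: String, payload: String, maxAttempts: Int = 3): Boolean = {
  var attempt = 0
  while (attempt < maxAttempts) {
    attempt += 1
    try { if (q.send(id, payload)) return true }
    catch { case _: RuntimeException => () } // no ack received: try again
  }
  false
}
```

Deduplication by sender-assigned id is what turns "at least once" into "effectively once" here; a real broker would also need to persist the seen ids so they survive its own restart.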
3 votes, 0 answers
In Akka 2.0, is it possible to have a clustered supervisor?
I know that proper clustering in Akka is the focus of Akka 2.1, but I need to build something with what's available now.
I have a multi-node Akka setup and want to gracefully handle remote actors dying. However, as the system is symmetric and…

SoftMemes
2 votes, 4 answers
Robust fault tolerant MySQL replication
Is there any way to get fault-tolerant MySQL replication? I am in an environment that has many networking issues. Replication appears to hit an error and simply stop. I need it to keep working and recover from these faults. There is some…

Joshua
2 votes, 2 answers
How can the last thrown exception contain previously thrown exceptions?
How does the last exception thrown contain all previously thrown exceptions in Java?
I read Fail Safe Exception Handling and I do not understand this point: "One way to do so is to make sure that the last exception…

Govind Gupta
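One way to make the last exception carry the earlier ones is the JDK's Throwable.addSuppressed (Java 7+), the same mechanism try-with-resources uses to attach secondary exceptions. The Scala sketch below calls that JDK method; runAll is a hypothetical helper that runs every step even if earlier ones fail, then rethrows the last failure with the earlier failures attached, oldest first. It is an illustration, not necessarily the approach the quoted article describes.

```scala
// Runs every step; if any fail, throws the LAST failure with all earlier
// failures attached to it via Throwable.addSuppressed (oldest first).
def runAll(steps: Seq[() => Unit]): Unit = {
  var failures = List.empty[Throwable] // most recent failure at the head
  steps.foreach { step =>
    try step()
    catch { case e: Exception => failures ::= e }
  }
  failures match {
    case last :: earlier =>
      earlier.reverse.foreach(e => last.addSuppressed(e))
      throw last
    case Nil => () // nothing failed
  }
}
```

A caller can then inspect getSuppressed on the caught exception to see every earlier failure instead of losing them.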
2 votes, 3 answers
Fault-tolerant file_get_contents
I have a website with the following architecture:
End user (web browser) ---> Server A (PHP) ---[file_get_contents]---> Server B (ASP.NET & Database)
Server A is a simple…

Heinzi
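A common pattern for making a remote fetch like Server A's call to Server B more tolerant of transient failures is bounded retry with exponential backoff. The Scala sketch below is language-neutral in spirit and not a port of PHP's file_get_contents; retry and the fetch function passed to it are hypothetical, and the attempt count and base delay are assumptions to tune.

```scala
import scala.util.{Failure, Success, Try}

// Calls `fetch` up to maxAttempts times; after each failure it sleeps
// baseDelayMs, then 2 * baseDelayMs, 4 * baseDelayMs, ... before retrying.
// Only non-fatal exceptions are caught (that is how Try behaves).
def retry[A](maxAttempts: Int, baseDelayMs: Long)(fetch: () => A): Try[A] = {
  require(maxAttempts >= 1, "need at least one attempt")

  @scala.annotation.tailrec
  def loop(attempt: Int): Try[A] = Try(fetch()) match {
    case s @ Success(_) => s
    case f @ Failure(_) if attempt >= maxAttempts => f // give up, keep last error
    case Failure(_) =>
      Thread.sleep(baseDelayMs << (attempt - 1))
      loop(attempt + 1)
  }

  loop(1)
}
```

Returning a Try lets the caller decide what to show the end user (for example a cached copy or an error page) once all attempts fail.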
2 votes, 1 answer
Killing Supervised process in Phoenix Framework causes the entire application to shutdown
I have a Phoenix application that creates the following supervision tree (taken from the Erlang observer):
The restart strategy of the supervisor is :one_for_one. The expectation is that if I kill any of the supervised processes, that individual…

otboss
2 votes, 1 answer
Setting up nginx for fault tolerance
I have two backend servers. I need to configure nginx so that when one server goes down, it switches to the second backend server.
Unfortunately, I have only found information about load distribution across several backend servers.
I didn't work with nginx…

Kate
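For a primary/backup setup, nginx's upstream module supports a backup parameter on a server entry: that server receives requests only when the primary servers are unavailable. A minimal sketch, assuming the placeholder hostnames backend1.example.com and backend2.example.com and that these blocks sit inside the http context:

```nginx
upstream backend {
    # Primary: after 3 failed attempts within 30s it is marked
    # unavailable for 30s.
    server backend1.example.com max_fails=3 fail_timeout=30s;
    # Used only while the primary is unavailable.
    server backend2.example.com backup;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        # Which failures make nginx retry the request on the next server.
        proxy_next_upstream error timeout http_502 http_503 http_504;
    }
}
```

The max_fails and fail_timeout values are assumptions; without them nginx uses max_fails=1 and fail_timeout=10s.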