Questions tagged [fault-tolerance]

Fault tolerance refers to a system's capability to isolate, compensate for, and recover from failure with minimal impact on the end user. When using this tag, also include tags for the system and/or technology you are working with (as additional metadata).

305 questions
3 votes · 1 answer

Google+ Auth fault tolerance (code was already redeemed)

I'm currently implementing Google+ authentication on Android with offline access. This entails requesting a one-time authorization code that can be sent to the server and redeemed for a refresh token. So far, so good. However, imagine that there is an…
Levi Botelho
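One common way to make the server tolerate a retried request is to make redemption idempotent. A retry happens, for example, when the client never received the first response and sends the same one-time code again. The server remembers the result for every code it has already exchanged and returns that result instead of redeeming the code a second time. Below is a minimal sketch under stated assumptions: `exchange_code` is a hypothetical stand-in for the real call to Google's token endpoint, and the in-memory dict is an assumption (a real deployment would need shared, expiring storage).

```python
import threading
from typing import Callable, Dict


class IdempotentRedeemer:
    """Redeem each one-time auth code at most once; replay the stored result on retries."""

    def __init__(self, exchange_code: Callable[[str], str]) -> None:
        # exchange_code is a hypothetical callable that redeems a code for a refresh token.
        self._exchange_code = exchange_code
        self._results: Dict[str, str] = {}
        self._lock = threading.Lock()

    def redeem(self, code: str) -> str:
        # A single lock keeps the sketch simple: two concurrent retries of the same
        # code cannot both redeem it (at the cost of serializing all exchanges).
        with self._lock:
            if code in self._results:
                return self._results[code]
            token = self._exchange_code(code)  # if this raises, nothing is cached
            self._results[code] = token
            return token
```

Because a failed exchange caches nothing, a later retry still calls the token endpoint. Only successful results are replayed.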
3 votes · 1 answer

Auto reconnect to RabbitMQ cluster after server restart

I have a master-slave RabbitMQ configuration running as two Docker containers with dynamic internal IPs (they change on every restart). Clustering works fine on a clean run, but if one of the servers is restarted, it cannot reconnect to the cluster: rabbitmqctl…
Igor Artamonov
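This sketch does not touch RabbitMQ's own cluster configuration. It only shows a generic retry loop with exponential backoff that a client, or a container startup script, could wrap around its reconnect step while a restarted node comes back. `attempt` is a hypothetical zero-argument callable standing in for whatever connect or join step is failing. The delays are arbitrary assumptions.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    attempt: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `attempt` until it succeeds, doubling the wait after each failure."""
    last_error: Optional[Exception] = None
    for n in range(max_attempts):
        try:
            return attempt()
        except Exception as exc:  # a real client should catch only its connection errors
            last_error = exc
            if n == max_attempts - 1:
                break
            delay = min(max_delay, base_delay * (2 ** n))
            # Up to 10% random jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

The `sleep` parameter is injectable so the loop can be tested without actually waiting.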
3 votes · 2 answers

Software Fault Tolerance

Does anyone know how software fault tolerance is implemented in air traffic control systems? Some URLs would be very helpful.
Upul Bandara
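One classic software fault-tolerance technique for safety-critical systems is N-version programming. Several independently developed implementations compute the same result, and a voter accepts the majority answer, which masks a fault in any single version. Whether a particular air traffic control system uses it is not stated here. Below is a minimal sketch of the voter only; the "versions" are hypothetical callables.

```python
from collections import Counter
from typing import Any, Callable, Sequence


def majority_vote(versions: Sequence[Callable[[], Any]]) -> Any:
    """Run every version and return the result a strict majority of them agrees on."""
    results = []
    for version in versions:
        try:
            results.append(version())
        except Exception:
            # A crashed version simply casts no vote.
            continue
    if not results:
        raise RuntimeError("all versions failed")
    value, count = Counter(results).most_common(1)[0]
    # Require agreement from more than half of ALL versions, not just the survivors.
    if count * 2 <= len(versions):
        raise RuntimeError("no majority; fault cannot be masked")
    return value
```

With three versions, any single wrong or crashing version is outvoted. Two disagreeing failures leave no majority, and the voter reports that instead of guessing.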
3 votes · 2 answers

Akka OneForOneStrategy does not work

I have the following code: class A extends Actor with ActorLogging { override val supervisorStrategy = OneForOneStrategy(maxNrOfRetries = 2) { case _ => log.info("An actor has been killed"); Restart } val b =…
src091
3 votes · 0 answers

Memcached Fault-Tolerance and failover property

As mentioned on the page "Memcached for PHP and failover", I am trying to test Memcached failover. Basically, I want to ensure that if one of the servers is marked dead, subsequent sets and gets are redistributed to the servers that…
user2969781
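Failover that redistributes keys like this is usually built on consistent hashing. Each server owns many points on a hash ring, and a key goes to the first server point clockwise from its own hash. Removing a dead server therefore moves only the keys that server owned. This Python sketch illustrates the mechanism on its own. It is not the PHP extension's code, whose behavior depends on its distribution and failover options, and the server names are made up.

```python
import bisect
import hashlib
from typing import Dict, List


class HashRing:
    """Consistent-hash ring: each server owns `replicas` points on a circle of hash values."""

    def __init__(self, servers: List[str], replicas: int = 100) -> None:
        self._replicas = replicas
        self._points: List[int] = []      # sorted hash positions
        self._owner: Dict[int, str] = {}  # hash position -> server
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add(self, server: str) -> None:
        for i in range(self._replicas):
            point = self._hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server: str) -> None:
        # Called when a server is marked dead: drop only that server's points.
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def server_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("no live servers")
        # First server point at or after the key's hash, wrapping around the ring.
        idx = bisect.bisect_left(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

Keys whose nearest point belonged to a surviving server keep their mapping after `remove`, so their cached values stay reachable.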
3 votes · 2 answers

OpenMPI custom fault tolerance for loosely coupled parallel processes

I run computations on the Amazon EC2 platform, using multiple machines connected through OpenMPI. To reduce the cost of the computation, I use spot instances, which are automatically shut down when the cost of a machine goes above a…
vkubicki
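Spot instances can be terminated at any time. A common approach that works independently of MPI is application-level checkpointing: each process periodically saves its progress and, when restarted, resumes from the last checkpoint instead of from scratch. This is a minimal sketch under stated assumptions. The checkpoint path, the JSON format and the "sum of squares" workload are all illustrative, and in practice the file would have to live on storage that outlives the instance.

```python
import json
import os
from typing import Optional


def run_with_checkpoints(path: str, total_steps: int, every: int = 10,
                         stop_after: Optional[int] = None) -> Optional[int]:
    """Sum i*i for i in [0, total_steps), checkpointing progress to `path`.

    `stop_after` simulates the instance being killed after that many steps
    in this run; the function then returns None without finishing.
    """
    step, acc = 0, 0
    if os.path.exists(path):  # resume from the last saved state, if any
        with open(path) as f:
            state = json.load(f)
        step, acc = state["step"], state["acc"]

    done_this_run = 0
    while step < total_steps:
        if stop_after is not None and done_this_run == stop_after:
            return None  # simulated termination: work since the last checkpoint is lost
        acc += step * step
        step += 1
        done_this_run += 1
        if step % every == 0 or step == total_steps:
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump({"step": step, "acc": acc}, f)
            # Rename over the old file (atomic on POSIX), so a crash mid-write
            # never leaves a half-written checkpoint behind.
            os.replace(tmp, path)
    return acc
```

The checkpoint interval trades lost work against I/O. A terminated process loses at most `every` steps.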
3 votes · 1 answer

Ruby library for distributed computing?

I'm developing an algorithm for a real-time data analysis task in Ruby. The bottleneck is the CPU, because the dataset is quite large. To reach the required performance, I have to use more cores in parallel, probably on different machines. My…
der_flo
3 votes · 3 answers

Does the child actor know that he is being resumed?

Imagine a straightforward supervision hierarchy. The child dies. The parent decides to restart the child. When the child is restarted, postRestart and related hooks are called, but what if the parent had decided to resume the child? Does the child actor know that…
agilesteel
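Akka's documented supervision model draws this distinction. Resume keeps the same actor instance and its accumulated state. Restart replaces the instance, calling preRestart on the old one and postRestart on the new one. No hook runs on Resume, so the child is not told. The Python toy below models only that difference. `Child`, `Supervisor` and `Directive` are hypothetical names, not Akka's API.

```python
from enum import Enum, auto
from typing import List


class Directive(Enum):
    RESUME = auto()
    RESTART = auto()


class Child:
    """Toy actor: counts messages and logs which lifecycle hooks ran."""

    def __init__(self, log: List[str]) -> None:
        self.count = 0
        self.log = log

    def receive(self, msg: str) -> None:
        if msg == "boom":
            raise ValueError("failure while processing")
        self.count += 1

    def pre_restart(self) -> None:
        self.log.append("pre_restart")

    def post_restart(self) -> None:
        self.log.append("post_restart")


class Supervisor:
    def __init__(self, directive: Directive) -> None:
        self.directive = directive
        self.log: List[str] = []
        self.child = Child(self.log)

    def deliver(self, msg: str) -> None:
        try:
            self.child.receive(msg)
        except Exception:
            if self.directive is Directive.RESTART:
                # Old instance gets pre_restart; a fresh instance (fresh state) gets post_restart.
                self.child.pre_restart()
                self.child = Child(self.log)
                self.child.post_restart()
            # RESUME: same instance, same state, no hook runs, so the child is not notified.
```

After a failure under Resume the counter keeps its value. Under Restart it starts again from zero.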
3 votes · 1 answer

Implementing fault tolerance in distributed message queues

Suppose that the middle one of several message queues fails. Senders can still send messages by using the other message queues. But what happens if the message queue dies after receiving the message? How does the sender know whether the message was…
user782220
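A standard answer to "how does the sender know?" is acknowledgement plus retry. The sender considers a message delivered only once a queue acknowledges it, and otherwise resends it, possibly via another queue. Receivers discard duplicates by message ID. This yields at-least-once delivery with idempotent processing. Below is an in-memory sketch in which each queue is modeled as a hypothetical callable that returns `True` when it acknowledges.

```python
import itertools
from typing import Callable, List, Set


class Receiver:
    """Processes each message ID at most once, so redelivered copies are harmless."""

    def __init__(self) -> None:
        self.seen: Set[int] = set()
        self.processed: List[str] = []

    def handle(self, msg_id: int, body: str) -> None:
        if msg_id in self.seen:
            return  # duplicate caused by a resend; ignore it
        self.seen.add(msg_id)
        self.processed.append(body)


class Sender:
    """Resends a message until some queue acknowledges it."""

    def __init__(self, queues: List[Callable[[int, str], bool]], max_tries: int = 5) -> None:
        # Each queue returns True when it acknowledges the message; False stands for
        # a timeout or a queue that died before its ack reached the sender.
        self.queues = queues
        self.max_tries = max_tries
        self._ids = itertools.count(1)

    def send(self, body: str) -> int:
        msg_id = next(self._ids)  # the same ID is reused for every resend
        # Rotate through the queues so one dead queue does not block delivery.
        for attempt in range(self.max_tries):
            queue = self.queues[attempt % len(self.queues)]
            if queue(msg_id, body):
                return msg_id
        raise RuntimeError(f"message {msg_id} not acknowledged")
```

The case from the question is a queue that forwards the message and then dies before acknowledging. It causes a resend, and the receiver's ID check turns the resulting duplicate into a no-op.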
3 votes · 0 answers

In Akka 2.0, is it possible to have a clustered supervisor?

I know that proper clustering is the focus of Akka 2.1; however, I need to build something with what's available now. I have a multi-node Akka setup and want to gracefully handle remote actors dying. However, as the system is symmetric and…
SoftMemes
2 votes · 4 answers

Robust fault-tolerant MySQL replication

Is there any way to get fault-tolerant MySQL replication? I am in an environment that has many networking issues. It appears that replication hits an error and just stops. I need it to keep working and recover from these faults. There is some…
Joshua
2 votes · 2 answers

How can the last thrown exception contain previously thrown exceptions?

How does the last exception thrown contain all previously thrown exceptions in Java? I read the article Fail Safe Exception Handling and I do not understand this point: "One way to do so is to make sure that the last exception…
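The pattern the quoted passage describes addresses a specific Java problem. In a plain finally block, if cleanup code throws, the exception from the try block is discarded. The remedy is to keep a reference to the first exception and attach it to the one finally thrown, which is what Java 7's Throwable.addSuppressed (used by try-with-resources) does. This Python sketch mirrors that manual pattern. `ChainedError` with its `suppressed` list and `FakeResource` are hypothetical names chosen for illustration.

```python
from typing import List, Optional


class ChainedError(Exception):
    """Exception that carries earlier failures, similar to Java's addSuppressed()."""

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.suppressed: List[BaseException] = []


class FakeResource:
    """Hypothetical resource whose read() and close() can both fail."""

    def __init__(self, fail_read: bool, fail_close: bool) -> None:
        self.fail_read = fail_read
        self.fail_close = fail_close

    def read(self) -> str:
        if self.fail_read:
            raise OSError("read failed")
        return "data"

    def close(self) -> None:
        if self.fail_close:
            raise OSError("close failed")


def process(resource: FakeResource) -> str:
    primary: Optional[BaseException] = None
    try:
        return resource.read()
    except Exception as exc:
        primary = exc  # remember the first failure before cleanup runs
        raise
    finally:
        try:
            resource.close()
        except Exception as close_exc:
            if primary is None:
                raise  # only cleanup failed: report that alone
            # Both failed: the last exception thrown still carries the earlier one.
            err = ChainedError("read and close both failed")
            err.suppressed.extend([primary, close_exc])
            raise err
```

Without the `primary` bookkeeping, the exception from `close()` would be the only one the caller sees.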
2 votes · 3 answers

Fault-tolerant file_get_contents

I have a website with the following architecture: End user --(web browser)--> Server A (PHP) --(file_get_contents)--> Server B (ASP.NET & Database). Server A is a simple…
Heinzi
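One way for the front server to degrade gracefully when the backend is slow or down has three steps. Call the backend with a timeout, retry a small number of times, and otherwise serve the last good response for that URL. The result is stale, but the page still renders. A minimal sketch under stated assumptions: `fetch` is a hypothetical callable standing in for the HTTP call (file_get_contents in the PHP setup) that enforces its own timeout and raises on error. The retry count and in-memory cache are also assumptions.

```python
from typing import Callable, Dict


class FallbackFetcher:
    """Try the backend a few times; on failure, fall back to the last good response."""

    def __init__(self, fetch: Callable[[str], str], tries: int = 2) -> None:
        # fetch is a hypothetical callable that performs the request (with its own
        # timeout) and raises on any error.
        self._fetch = fetch
        self._tries = tries
        self._last_good: Dict[str, str] = {}

    def get(self, url: str) -> str:
        error: Exception = RuntimeError("no attempts made")
        for _ in range(self._tries):
            try:
                body = self._fetch(url)
                self._last_good[url] = body  # remember it for future outages
                return body
            except Exception as exc:
                error = exc
        if url in self._last_good:
            return self._last_good[url]  # stale, but better than an error page
        raise RuntimeError(f"backend unavailable and nothing cached for {url}") from error
```

Whether stale data is acceptable is an application decision. For pages where it is not, the final `raise` is the only path.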
2 votes · 1 answer

Killing a supervised process in Phoenix Framework causes the entire application to shut down

I have a Phoenix application that creates a supervision tree, which I inspected with the Erlang observer. The restart strategy of the supervisor is :one_for_one. I expect that if I kill any of the supervised processes, that individual…
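One OTP rule worth checking in situations like this is restart intensity. A supervisor tolerates at most `max_restarts` restarts within `max_seconds` (Elixir's Supervisor defaults are 3 and 5). If that limit is exceeded, the supervisor terminates its children and then itself, regardless of its restart strategy. The failure then escalates to its parent, and at the top of the tree that can stop the application. The Python sketch below models only that counting rule, with a sliding window. It is not Elixir's implementation, and the exact window-boundary handling is an approximation.

```python
from collections import deque
from typing import Deque


class RestartIntensity:
    """Tracks restarts in the spirit of an OTP supervisor's max_restarts/max_seconds rule."""

    def __init__(self, max_restarts: int = 3, max_seconds: float = 5.0) -> None:
        self.max_restarts = max_restarts
        self.max_seconds = max_seconds
        self._times: Deque[float] = deque()

    def child_crashed(self, now: float) -> bool:
        """Record a restart at time `now`; return False if the supervisor must give up."""
        self._times.append(now)
        # Forget restarts that fell out of the sliding window.
        while self._times and now - self._times[0] > self.max_seconds:
            self._times.popleft()
        # More than max_restarts inside the window: the supervisor shuts down
        # and the failure escalates to its own supervisor.
        return len(self._times) <= self.max_restarts
```

With the defaults, killing a child four times within five seconds is enough to take the supervisor down, even though each individual kill would have been handled by a restart.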
2 votes · 1 answer

Setting up nginx for fault tolerance

I have two backend servers. I need to configure nginx so that when one server goes down, it switches to the second backend server. Unfortunately, I have only found information about distributing load between several backend servers. I haven't worked with nginx…
Kate