Fault tolerance refers to a system's capability to isolate, compensate for, and recover from failures with minimal impact on the end user. When using this tag, also include tags for the system and/or technology you are working with, as supporting metadata.
Questions tagged [fault-tolerance]
305 questions
1
vote
1 answer
Hystrix: HystrixBadRequestException for failed validations
I am trying to understand how Hystrix works with non-fault errors and the HystrixBadRequestException, particularly in the area of validation. I use JSR-303 bean validation (Hibernate validator) for all my beans:
public class User {
@Min(1L)
…

IAmYourFaja
1
vote
3 answers
What alternatives do I have if I want a distributed multi-master database?
I will build a system where I want to reduce single points of failure, and I need a database. Are there any (free) relational database systems that handle multi-master setups well (i.e. where it is easy to add and remove nodes), or is it better to…

Jonas
1
vote
1 answer
Is MongoDB v2.6's WriteConcern "Broken"?
Edit - Possible Duplicate: To what extent are 'lost data' criticisms still valid of MongoDB? - If I had just punched something into Google differently, I more-or-less would've had this question answered. Sorry for the semi-dupe everyone.
I hate…

KSwift87
1
vote
1 answer
Single fault-tolerant machine with Amazon AWS
For a particular service, I need to run a single EC2 instance in a fault-tolerant way.
Only in case of errors, I want the "primary" machine to be terminated and the traffic redirected to a "secondary" machine within a few seconds and…

allergique
1
vote
1 answer
How to implement fault tolerance of REST server in client side?
I am working on a system that has a RESTful web service that can manipulate it (the service allows all CRUD operations), and a web client that displays the system's data (most of the client was written in jQuery). In the standard operation scenario…

user2579277
1
vote
1 answer
How to make a fault tolerant system which can immediately handle the situation when a server goes down
Before XYZ.com went down, I noticed that my request was being routed to IP address 192.33.31.xxx, and when it came back up, I noticed that my request was routed to IP 50.17.196.xxx. Is it some sort of server switching? Isn't Dynamic server switching in case…

nirprat
1
vote
1 answer
Storm fault tolerance: Nimbus reassigns worker to a different machine?
How do I make Storm's Nimbus restart a worker on the same machine?
To test the fault tolerance, I do a kill -9 on a worker process expecting the worker to be restarted on the same machine, but on one of the machines, nimbus launches the worker on…

Behzad Pirvali
1
vote
2 answers
If a node in the host file goes down, how can an MPI program keep working with the remaining nodes of the cluster?
If a node in the host file goes down, how can I keep working with the remaining nodes using MPI?

user2254219
1
vote
2 answers
Fault Tolerance based Approaches to avoid java.lang.OutOfMemoryError
Many a carefully crafted piece of Java code has been laid to waste by java.lang.OutOfMemoryError. There seems to be no relief from it, even production class code gets downed by it.
The question I wish to ask is: are there good…

user1172468
1
vote
0 answers
In Akka 2.0, can I restart an actor on a new remote node from a supervisor?
Assuming a supervisor that is supervising a remote actor. If the remote actor dies because its entire Akka node has been terminated, is it possible to resurrect the actor on a new Akka node, keeping all existing ActorRefs to it alive?

SoftMemes
0
votes
1 answer
How to achieve fault tolerance in cloud?
I am working on a project that aims to achieve a fault-tolerant cloud through elastic IP addressing and load balancing. Initially, I opted for Windows Azure, but it provides automatic fault handling through its portal and the user cannot control the…

rohan
0
votes
1 answer
How is the HDFS replication factor decided?
The replication factor in HDFS must be at least 3. Given that the main purpose of choosing 3 is fault tolerance, and that the possibility of a rack failure is far less than the possibility of a node failure, is there another reason…

user1052958
0
votes
1 answer
Why are some Fault Tolerance settings in Azure Data Factory disabled?
I am creating an Azure Data factory to copy binary files from the Google Cloud Storage bucket to the Azure Blob Storage container. The files need to be copied without any compression.
I want to specify fault tolerance settings to skip files with…

Shalaka Deshpande
0
votes
0 answers
SmallRye Fault Tolerance: @Retry - Get the number of retry attempts before it succeeds
Is there any way to get how many retries the @Retry did before the method succeeded?
For example, if you have maxRetries = 3 and the method succeeded on the 2nd retry, I want to get that value and save it in my database. Like this:
if(retry == success)…
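The bookkeeping this question asks about can be sketched in plain Java, without any SmallRye or MicroProfile API. This is only an illustration under stated assumptions: `RetryCounter`, `Outcome`, and `callWithRetry` are hypothetical names introduced here, and the loop is a hand-written stand-in for what `@Retry` does, not how the annotation is implemented.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of SmallRye): a manual retry loop that
// reports how many retries were needed before the call succeeded.
final class RetryCounter {

    // Holds the successful result together with the retry count.
    static final class Outcome<T> {
        final T value;
        final int retries; // 0 means the first attempt succeeded

        Outcome(T value, int retries) {
            this.value = value;
            this.retries = retries;
        }
    }

    // Runs the action once, then retries up to maxRetries more times.
    // Rethrows the last exception if every attempt fails.
    static <T> Outcome<T> callWithRetry(Callable<T> action, int maxRetries) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return new Outcome<>(action.call(), attempt);
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

With `maxRetries = 3` and an action that fails twice before succeeding, `retries` on the returned `Outcome` would be 2, which is the value the asker wants to persist.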

Eve
0
votes
0 answers
How to handle completion of multiple asynchronous messages and ensure exactly-once semantics in Flink Statefun
I'm trying to add more functionality to the example of flink-statefun-playground/java/shopping-cart and have two questions:
1. How to implement the functionality of waiting for completion of multiple asynchronous messages.
For example, if…

zack