Fault tolerance refers to a system's ability to isolate, compensate for, and recover from failures with minimal impact on the end user. When using this tag, also include tags for the system or technology you are working with, as supporting metadata.
Questions tagged [fault-tolerance]
305 questions
0
votes
1 answer
Returning NACKed requests in RabbitMQ work queues
I'm trying to implement a work queue architecture using RabbitMQ. I have a single sender application and multiple consumers.
I use manual ack on the consumers, so in case of failure in handling a request, it will be re-queued for another consumer to…

mich8bsp
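The requeue-on-failure behaviour described in the question above can be illustrated with a small in-memory simulation. This is only a sketch: it does not use RabbitMQ or any client library, and `WorkQueue`, `deliver`, `ack`, `nack` and `run` are hypothetical stand-ins for a broker's manual-acknowledgement flow.

```python
from collections import deque


class WorkQueue:
    """Toy in-memory stand-in for a broker queue with manual acks (hypothetical)."""

    def __init__(self, messages):
        self._ready = deque(messages)  # messages waiting for a consumer
        self._unacked = {}             # delivery tag -> message in flight
        self._next_tag = 0

    def deliver(self):
        """Hand the next ready message to a consumer, or return None if empty."""
        if not self._ready:
            return None
        self._next_tag += 1
        msg = self._ready.popleft()
        self._unacked[self._next_tag] = msg
        return self._next_tag, msg

    def ack(self, tag):
        """The consumer finished the work, so the message is forgotten."""
        del self._unacked[tag]

    def nack(self, tag, requeue=True):
        """The consumer failed; optionally put the message back on the queue."""
        msg = self._unacked.pop(tag)
        if requeue:
            self._ready.append(msg)


def run(queue, handlers):
    """Round-robin messages over handlers; a failed message is requeued."""
    done = []
    i = 0
    while (delivery := queue.deliver()) is not None:
        tag, msg = delivery
        handler = handlers[i % len(handlers)]
        i += 1
        try:
            done.append(handler(msg))
            queue.ack(tag)
        except Exception:
            queue.nack(tag, requeue=True)
    return done
```

Note that a message which fails on every consumer would circulate forever in this sketch; a real design would normally cap redeliveries or divert such messages elsewhere.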
0
votes
1 answer
How to automatically restart a failed node in Spark Streaming?
I'm using Spark on a cluster in Standalone mode.
I'm currently working on a Spark Streaming application. I've added checkpoints for the system in order to deal with the master process suddenly failing and I see that it's working well.
My question…

Gideon
0
votes
1 answer
Need to set up Azure Web App with load balancer and fault tolerance
I have a website built with ASP.NET, C# and Azure SQL, hosted on an Azure Web App.
I have a requirement to set up a load balancer for the website with fault tolerance.
I have set up a Traffic Manager where there are two replicas…

Naresh Ravlani
0
votes
2 answers
Exception handling in a real time, SQL-Server driven system
I have developed a report viewer in .NET Winforms (it just runs queries and displays results).
This works against a reporting database. However, the above is a small subset of a much larger application, which gets data from another database. It…

GurdeepS
0
votes
0 answers
How to add fault tolerance support to an existing MPI based system such that the system continues even after a machine goes down?
I am trying to modify an MPI based system to add fault tolerance (process should continue if machines go down).
I was thinking of using Apache Zookeeper to handle the machine failure case. Is it the best way to proceed further? Also, what happens…

JhnElaine
0
votes
1 answer
Use case for Akka PoisonPill
According to the Akka docs for PoisonPill:
You can also send an actor the akka.actor.PoisonPill message, which will stop the actor when the message is processed. PoisonPill is enqueued as ordinary messages and will be handled after messages that…

smeeb
0
votes
1 answer
Groovy Closures for Failover
I have the following class:
class WidgetClient {
    List getAllWidgets() {
        _actuallyGetAllWidgets()
    }
    void saveWidget(Widget w) {
        _actuallySaveWidget(w)
    }
    void deleteWidget(Widget w) {
…

smeeb
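The title of the question above suggests wrapping each client call in a closure that falls back to an alternative when the primary call fails. A minimal sketch of that idea, written in Python rather than Groovy; `with_failover` and its arguments are hypothetical and are not taken from the truncated class in the excerpt.

```python
def with_failover(primary, fallback, exceptions=(Exception,)):
    """Return a callable that tries primary and, if it raises, tries fallback.

    primary and fallback are any callables that accept the same arguments.
    Only the exception types listed in exceptions trigger the fallback.
    """
    def call(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except exceptions:
            # The primary failed, so delegate the same call to the fallback.
            return fallback(*args, **kwargs)
    return call
```

A wrapped call such as `with_failover(primary_client.save, backup_client.save)` (both names assumed) keeps the failover policy in one place instead of repeating try/catch blocks in every method.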
0
votes
1 answer
Error while configuring EMS with Database in Fault Tolerant mode
I am trying to set up my EMS in FT mode. I have configured all the parameters in the 2 EMS config files.
But I'm getting the warning:
Unable to initialize fault tolerant connection, remote server returned 'invalid user name'
Servername and password…

Hakan Kiyar
0
votes
1 answer
How can I implement persistent/fault-tolerant replication using PouchDB?
PouchDB's replicate() functions are not fault-tolerant and will stop replicating if you lose your internet connection or encounter network disruptions. This is quite frustrating when you need your app to replicate data whenever an internet…

redgeoff
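One common way to make an operation survive transient network failures, as the question above asks, is to retry it with exponential backoff. The sketch below is generic Python and does not use PouchDB's API; `retry_with_backoff`, its parameters, and the choice of `ConnectionError` as the retryable error are assumptions made for illustration.

```python
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The wait doubles after each failed attempt, capped at max_delay.
    The sleep function is injectable so the logic can be exercised without waiting.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

In practice `operation` would be a function that performs one replication pass (hypothetical here), and the caller decides what to do once the final attempt fails.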
0
votes
1 answer
Why does one need to write fault tolerant applications when building on cloud infrastructure?
I got this interview question today 'Why do you need to write fault tolerant applications when building on cloud infrastructure?'
I answered: They are hard to debug and hard to fix, so they better be very well tested and robust. Data in database can…

Matas Vaitkevicius
0
votes
2 answers
How to handle load balancing in Solr?
I have 8 Solr shards running along with 3 ZooKeeper nodes. Sometimes, if any of the servers fails, I get the following stack trace; I can handle that with shards.tolerant=true in the query.
My question is how to make this fault tolerant by default in…

Amey Jadiye
0
votes
3 answers
Tibco EMS Client Fault Tolerance
I am aware that Tibco EMS provides fault tolerance in a hot backup configuration on the server side, as detailed in the User's Guide, this answer and here.
But on the client side, does Tibco EMS provide an out-of-the-box solution for fault-tolerant…

aateeque
0
votes
0 answers
How to reliably recover/resend data through sockets when servers go down?
Let's say I have an internal system of 20+ nodes that pass data back and forth to each other through sockets where low latency is a high, high priority. How do I design it so that if a random server(s) goes down, I can recover/resend the data that…

Albert Lim
0
votes
2 answers
AWS: Instances and Reliability
Short of creating ginormous instances, is there any way to either force instances to run on separate physical machines or detect how many physical machines are being used by multiple instances of the same image on Amazon Web Services (AWS)?
I'm…

JackLThornton
0
votes
3 answers
How to get tolerance in the forward direction, not backward
I know the question may be unclear. I will attempt to explain.
I have a scenario where I need to verify a sequence of values 5, 10, 15, 20, ..., but the system that produces the sequence is not very accurate, so it can sometimes miss values…

user2927392