Questions tagged [fault-tolerance]

Fault tolerance refers to a system's capability to isolate, compensate for, and recover from failure with minimal impact on the end user. When using this tag, include tags indicating the system and/or technology you are working with (as additional supporting metadata).

305 questions
0
votes
1 answer

Returning NACKed requests in RabbitMQ work queues

I'm trying to implement a work queue architecture using RabbitMQ. I have a single sender application and multiple consumers. I use manual ack on the consumers, so in case of failure in handling a request, it will be re-queued for another consumer to…
mich8bsp
0
votes
1 answer

How to automatically restart a failed node in Spark Streaming?

I'm using Spark on a cluster in Standalone mode. I'm currently working on a Spark Streaming application. I've added checkpoints for the system in order to deal with the master process suddenly failing and I see that it's working well. My question…
0
votes
1 answer

Need to set up Azure Web App with load balancer and fault tolerance

I have a website created using ASP.NET, C#, and Azure SQL, hosted on an Azure Web App. I have a requirement to set up a load balancer for the website with fault tolerance. I have set up a Traffic Manager where there are two replicas…
0
votes
2 answers

Exception handling in a real time, SQL-Server driven system

I have developed a report viewer in .NET Winforms (it just runs queries and displays results). This works against a reporting database. However, the above is a small subset of a much larger application, which gets data from another database. It…
GurdeepS
0
votes
0 answers

How to add fault tolerance support to an existing MPI based system such that the system continues even after a machine goes down?

I am trying to modify an MPI based system to add fault tolerance (process should continue if machines go down). I was thinking of using Apache Zookeeper to handle the machine failure case. Is it the best way to proceed further? Also, what happens…
0
votes
1 answer

Use case for Akka PoisonPill

According to the Akka docs for PoisonPill: You can also send an actor the akka.actor.PoisonPill message, which will stop the actor when the message is processed. PoisonPill is enqueued as ordinary messages and will be handled after messages that…
smeeb
0
votes
1 answer

Groovy Closures for Failover

I have the following class: class WidgetClient { List getAllWidgets() { _actuallyGetAllWidgets() } void saveWidget(Widget w) { _actuallySaveWidget(w) } void deleteWidget(Widget w) { …
smeeb
0
votes
1 answer

Error while configuring EMS with Database in Fault Tolerant mode

I am trying to set up my EMS in FT mode. I have configured all the parameters in the two EMS config files, but I'm getting the warning: Unable to initialize fault tolerant connection, remote server returned 'invalid user name' Servername and password…
Hakan Kiyar
0
votes
1 answer

How can I implement persistent/fault-tolerant replication using PouchDB?

PouchDB's replicate() functions are not fault-tolerant and will stop replicating if you lose your internet connection or encounter network disruptions. This is quite frustrating when you need your app to replicate data whenever an internet…
redgeoff
0
votes
1 answer

Why does one need to write fault tolerant applications when building on cloud infrastructure?

I got this interview question today: 'Why do you need to write fault tolerant applications when building on cloud infrastructure?' I answered: They are hard to debug and hard to fix, so they had better be very well tested and robust. Data in database can…
Matas Vaitkevicius
0
votes
2 answers

How to handle load balancing in Solr?

I have 8 Solr shards running along with 3 ZooKeeper instances. Sometimes, if any of the servers fails, it gives me the following stack trace; I can handle that with shards.tolerant=true in the query. My question is how to make this fault tolerant by default in…
Amey Jadiye
0
votes
3 answers

Tibco EMS Client Fault Tolerance

I am aware that Tibco EMS provides fault tolerance in a hot backup configuration on the server side, as detailed in the User's Guide, this answer and here. But on the client side, does Tibco EMS provide an out-of-the-box solution for fault-tolerant…
aateeque
0
votes
0 answers

How to reliably recover/resend data through sockets when servers go down?

Let's say I have an internal system of 20+ nodes that pass data back and forth to each other through sockets, where low latency is a very high priority. How do I design it so that if one or more random servers go down, I can recover/resend the data that…
Albert Lim
0
votes
2 answers

AWS: Instances and Reliability

Short of creating ginormous instances, is there any way to either force instances to run on separate physical machines or detect how many physical machines are being used by multiple instances of the same image on Amazon Web Services (AWS)? I'm…
JackLThornton
0
votes
3 answers

How to get tolerance in the forward direction, not backward

I know the question may be unclear, so I will attempt to explain. I have a scenario where I need to verify a sequence of values 5,10,15,20...., except that the system that produces the sequence is not very accurate, so it can sometimes miss the values…
user2927392