Fault tolerance refers to a system's ability to isolate, compensate for, and recover from failures with minimal impact on the end user. When using this tag, also include tags for the system and/or technology you are working with (as additional supporting metadata).
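As a rough illustration of "isolate, compensate for, and recover", here is a minimal Python sketch. It assumes each replica of a service can be modeled as a zero-argument callable; `call_with_failover` and its parameters are hypothetical names, not a library API. Each failure is caught and contained, the call is retried a few times, and the caller then fails over to the next replica.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(
    replicas: Sequence[Callable[[], T]],
    attempts_per_replica: int = 2,
    delay_s: float = 0.0,
) -> T:
    """Try each replica in order, retrying failures, and return the first success.

    Hypothetical helper for illustration only; replicas are plain callables.
    """
    last_error: Optional[Exception] = None
    for replica in replicas:
        for _ in range(attempts_per_replica):
            try:
                return replica()  # success: earlier failures never reach the caller
            except Exception as exc:  # isolate: contain the failure to this attempt
                last_error = exc
                time.sleep(delay_s)  # back off before retrying
    # Every replica exhausted its attempts: surface the last error.
    raise RuntimeError("all replicas failed") from last_error


if __name__ == "__main__":
    def primary() -> str:
        raise ConnectionError("primary is down")  # simulated crash

    def backup() -> str:
        return "served by backup"

    print(call_with_failover([primary, backup]))  # prints "served by backup"
```

Retrying only makes sense for transient faults; a real system would also distinguish errors worth retrying from ones that should fail fast.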
Questions tagged [fault-tolerance]
305 questions
0 votes, 1 answer
What Linux tools are available to monitor/configure deployed code?
I'm writing some telecommunications software, and must devise a way to monitor and configure the software after it has been deployed on a server.
The company I work for currently has an in-house solution, but we're exploring other options.
What…

Dylan Klomparens
0 votes, 1 answer
InnoDB log flush to prevent data loss
I need a high level of fault tolerance for my InnoDB engine, and I cannot change the hardware configuration.
Would you suggest calling
FLUSH ENGINE LOGS;
after very important operations?
Can it help prevent data loss from a power-off or process…

Tobia
0 votes, 0 answers
Is it true to say that Hadoop can't handle Byzantine failures?
I have been reading some papers on Hadoop and map-reduce.
It seems that the current design enables Hadoop to tolerate failures like worker crashes, but doesn't provide much support for handling arbitrary faults (non-fail-silent ones). Just wondering…

awesomeIT
0 votes, 1 answer
OpenMPI fault tolerance
I have an assignment to implement simple fault-tolerance in an OpenMPI application. The problem we are having is that, despite setting the MPI error handling to MPI_ERRORS_RETURN, when one of our nodes is unplugged from the cluster we get the…

iondune
0 votes, 1 answer
Detecting crashes of Azure instances
I want to detect the fact that an instance of my Azure role has crashed. Detection in my case means that another instance of my role is notified about the crash. Please review my idea explained below or propose another solution.
The idea I came up…

SergeyS
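The asker's own idea is cut off above, so this is not a review of it; it is a minimal sketch of one common alternative, heartbeat-based failure detection. It assumes each instance periodically records a heartbeat somewhere every instance can read (in Azure that might be shared storage; that choice is an assumption here). The `HeartbeatMonitor` class and the instance names are hypothetical. A missed heartbeat only makes an instance suspected: a slow or network-partitioned instance looks the same as a crashed one.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Suspects an instance has crashed when its last heartbeat is older than timeout_s.

    Hypothetical sketch: in a real deployment the heartbeat timestamps would live in
    shared storage that every instance can read, not in one process's memory.
    """

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the demo can fake time
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, instance_id: str) -> None:
        # Each live instance calls this periodically, well inside timeout_s.
        self.last_seen[instance_id] = self.clock()

    def suspected(self) -> List[str]:
        # Instances whose heartbeat is overdue: crashed, or merely slow/partitioned.
        now = self.clock()
        return sorted(i for i, t in self.last_seen.items() if now - t > self.timeout_s)


if __name__ == "__main__":
    fake_now = [0.0]
    monitor = HeartbeatMonitor(timeout_s=30.0, clock=lambda: fake_now[0])
    monitor.heartbeat("instance-0")
    monitor.heartbeat("instance-1")
    fake_now[0] = 20.0
    monitor.heartbeat("instance-1")  # instance-0 has stopped sending heartbeats
    fake_now[0] = 40.0
    print(monitor.suspected())  # prints ['instance-0']
```

The timeout is the main design knob: shorter values detect crashes sooner but raise more false alarms for instances that are only slow.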
0 votes, 1 answer
Is it better to catch exceptions or avoid exceptions at any cost?
What's the best practice when you're dealing with exceptions?
I usually write code to avoid exceptions at any cost: my code usually has a lot of conditions, and if I'm dealing with normalized databases, I usually write a bunch of queries that double…

ILikeTacos
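The two styles contrasted in this question are often called "look before you leap" (LBYL) and "easier to ask forgiveness than permission" (EAFP). A minimal Python sketch, using file reading instead of the asker's database queries (the file-based example and function names are hypothetical): the LBYL version still has a race, because the file can disappear between the check and the open, whereas the EAFP version handles that failure where it actually happens.

```python
import os
from typing import Optional


def read_lbyl(path: str) -> Optional[str]:
    # Look before you leap: check first. The file can still vanish between
    # os.path.exists() and open(), so the check does not remove the need for error handling.
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    return None


def read_eafp(path: str) -> Optional[str]:
    # Easier to ask forgiveness: attempt the operation and catch only the specific,
    # expected failure; anything unexpected still propagates to the caller.
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None
```

From a fault-tolerance point of view, the choice of style arguably matters less than catching only the failures you can actually handle; a blanket handler that swallows everything hides faults rather than tolerating them.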
0 votes, 1 answer
What happens in Erlang if a return receipt never arrives?
I just happened to read Joe Armstrong's thesis and don't have much prior knowledge of Erlang. I wonder what happens if a delivery receipt for some message never arrives. What does the sending actor do? Does it send the message again? This…

OlliP
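For context: in Erlang, sending with `!` is asynchronous and does not by itself produce a delivery receipt. When a sender needs confirmation, the usual pattern is for the receiver to reply explicitly and for the sender to wait with `receive ... after Timeout`, then decide whether to resend. Resending makes delivery at-least-once, so the receiver should tolerate duplicates. The Python sketch below (not Erlang, and not how the Erlang runtime works internally) models that with a hypothetical `deliver` callable that returns the ack, or `None` when the message or its ack is lost.

```python
from typing import Callable, List, Optional, Set


class IdempotentReceiver:
    """Processes each message id at most once, but acknowledges every copy it sees."""

    def __init__(self) -> None:
        self.seen: Set[int] = set()
        self.processed: List[str] = []

    def handle(self, msg_id: int, payload: str) -> int:
        if msg_id not in self.seen:  # duplicates caused by resends are ignored
            self.seen.add(msg_id)
            self.processed.append(payload)
        return msg_id  # the ack echoes the message id


def send_with_retry(
    deliver: Callable[[int, str], Optional[int]],
    msg_id: int,
    payload: str,
    max_attempts: int = 3,
) -> bool:
    """Resend until a matching ack comes back; None models a timeout (lost message or ack)."""
    for _ in range(max_attempts):
        if deliver(msg_id, payload) == msg_id:
            return True
    return False  # give up and let the caller (or a supervisor) decide what to do


if __name__ == "__main__":
    receiver = IdempotentReceiver()
    attempts = [0]

    def lossy_deliver(msg_id: int, payload: str) -> Optional[int]:
        attempts[0] += 1
        ack = receiver.handle(msg_id, payload)
        return None if attempts[0] == 1 else ack  # the first ack is "lost"

    print(send_with_retry(lossy_deliver, 1, "hello"), receiver.processed, attempts[0])
    # prints: True ['hello'] 2
```

The message is delivered twice but processed once, which is the property that makes resending safe.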
0 votes, 3 answers
Long query for testing
I'm testing fault tolerance in an application. Does anyone know of a long-running query that uses the default MySQL tables?
The idea is to run that query, crash mysqld, and see whether my app detects the error and tries to connect to another…

User Conscious
0 votes, 2 answers
How is Google App Engine's infrastructure fault tolerant?
I am currently implementing a web application on Google App Engine. So far this has taken me a lot of time, redesigning the database and the application to fit GAE requirements and best practices.
My question is this: how can I be sure that…

myss_sy
0 votes, 1 answer
LDAP fault-tolerance configuration (e.g. SunOne)
Does anybody know how to configure fault tolerance for LDAP, e.g. SunOne LDAP?
I searched via Google without any useful results.
Thanks

ShawnLee
0 votes, 1 answer
Benefits of Erlang for a collaborative real-time application
I am looking into creating a real-time document editing and chat application. I have been wanting to learn Erlang for a while, and I was wondering whether this might be a good project to try it out on.
Specifically, at what point would I start to…

Andrew
0 votes, 1 answer
Fault-tolerant system design
There is a DB used as the data store, plus y (>5) other machines. Machine A holds data that is updated every x minutes. Every x minutes, the y machines get the data from machine A and update it in the database. Every machine doing the same is for some…

Sam
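If several of the y machines may write the same snapshot from machine A, one way to make the duplicate writes harmless is an idempotent, version-guarded upsert. This is a minimal sketch using Python's built-in `sqlite3` as a stand-in for the asker's DB; the `data` table, its columns, and the assumption that each snapshot carries a version number are all hypothetical, and `ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or later. A repeated write changes nothing, and a stale write never overwrites newer data.

```python
import sqlite3
from typing import Iterable, Tuple

# Insert a row, or update it only if the incoming version is newer than the stored one.
UPSERT = """
INSERT INTO data (key, value, version) VALUES (?, ?, ?)
ON CONFLICT(key) DO UPDATE SET value = excluded.value, version = excluded.version
WHERE excluded.version > data.version
"""


def apply_snapshot(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str, int]]) -> None:
    # Safe to run from every worker: duplicate or older versions are no-ops.
    conn.executemany(UPSERT, rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (key TEXT PRIMARY KEY, value TEXT, version INTEGER)")
    apply_snapshot(conn, [("a", "v1", 1)])  # worker 1 writes snapshot 1
    apply_snapshot(conn, [("a", "v1", 1)])  # worker 2 writes the same snapshot: no change
    apply_snapshot(conn, [("a", "v0", 0)])  # a delayed, older snapshot: ignored
    print(conn.execute("SELECT key, value, version FROM data").fetchall())
    # prints [('a', 'v1', 1)]
```

With writes made idempotent like this, redundant workers can all run without coordinating, so losing some of them does not stop updates.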
0 votes, 1 answer
Java implementation of fault tolerance for a P2P application
I have a P2P application coded by LiteSoft.org. I am looking to implement a leader election system within this application. Before I can even start that, I need a fault-tolerance system that consists of sending requests to a peer,…

Jean-François Beaulieu
0 votes, 1 answer
Creating a fault-tolerant system: use a data file to reload saved data?
EDIT: The implementation language is Java.
I want to make a simple fault-tolerant system.
Object A: this object contains the decision logic for the system.
Object B: this object will be used to control the fault tolerance.
My initial ideas are to…

Mike Howard
0 votes, 3 answers
Advantages of using an Erlang web server for a web application
Note: This question is heavily shaped by the main requirements for the web application I am building: high availability and fault tolerance. All other requirements (such as scalability and number of users) are not in question here.
I have got and…

skanatek