Questions tagged [fault-tolerance]

Fault tolerance is a system's ability to isolate, compensate for, and recover from failures with minimal impact on the end user. When using this tag, also include tags for the system or technology you are working with, as supporting metadata.

305 questions
6
votes
2 answers

Spring batch: Fault Tolerant

I have the following Step: return stepBuilderFactory.get("billStep") .allowStartIfComplete(true) .chunk(20000) .reader(billReader) .processor(billProcessor) .faultTolerant() …
sophie
  • 991
  • 2
  • 15
  • 34
6
votes
1 answer

Microservices styles and tradeoffs - Akka cluster vs Kubernetes vs

So, here's the thing. I really like the idea of microservices and want to set it up and test it before deciding if I want to use it in production. And then if I do want to use it I want to slowly chip away pieces of my old rails app and move logic…
Matjaz Muhic
  • 5,328
  • 2
  • 16
  • 34
6
votes
2 answers

Apache Storm: Track tuples by unique ID from Source Spout to Final Bolt

I want a method of uniquely identifying tuples throughout a whole Storm topology, so that each tuple can be tracked from Spout to the final Bolt. The way I understand it is when passing a unique message id with an emit from a spout for example:…
perkss
  • 1,037
  • 1
  • 11
  • 37
6
votes
1 answer

How is the detection of terminated nodes in Erlang working? How is net_ticktime influencing the control of node liveness in Erlang?

I set the net_ticktime value to 600 seconds: net_kernel:set_net_ticktime(600). In the Erlang documentation for net_ticktime = TickTime: Specifies the net_kernel tick time. TickTime is given in seconds. Once every TickTime/4 second, all connected nodes are…
Zuzana
  • 132
  • 6
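
As a rough illustration of the arithmetic in the excerpt above: the Erlang kernel documentation states that nodes are ticked once every TickTime/4 seconds and that an unresponsive node is detected within a window from TickTime - TickTime/4 to TickTime + TickTime/4. The sketch below only evaluates those formulas; the function name `tick_window` is hypothetical and is not part of any Erlang API.

```python
def tick_window(net_ticktime):
    """Return (tick_interval, min_detect, max_detect) in seconds.

    Assumes the documented relationships:
      tick interval = TickTime / 4
      MinT = TickTime - TickTime / 4
      MaxT = TickTime + TickTime / 4
    """
    tick_interval = net_ticktime / 4
    return (tick_interval,
            net_ticktime - tick_interval,
            net_ticktime + tick_interval)


if __name__ == "__main__":
    interval, min_t, max_t = tick_window(600)
    print(f"tick every {interval} s, node down detected between {min_t} s and {max_t} s")
```

With the 600-second setting from the question, this gives a tick every 150 seconds and a detection window of roughly 450 to 750 seconds.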
6
votes
1 answer

Service Stack Redis reconnect after Redis server reboot

We are using Service Stack's RedisClient's BlockingDequeue to persist some data until it can be processed. The calling code looks like using (var client = ClientPool.GetClient()) return…
swestner
  • 1,881
  • 15
  • 19
6
votes
1 answer

Store and forward HTTP requests with retries?

Twilio and other HTTP-driven web services have the concept of a fallback URL, where the web service sends a GET or POST to a URL of your choice if the main URL times out or otherwise fails. In the case of Twilio, they will not retry the request if…
Matt J
  • 43,589
  • 7
  • 49
  • 57
6
votes
2 answers

More graceful error handling in C++ library - jsoncpp

I'm not sure if this will be a specific thing with jsoncpp or a general paradigm with how to make a C++ library behave better. Basically I'm getting this trace: imagegeneratormanager.tsk: src/lib_json/json_value.cpp:1176: const Json::Value& …
djechlin
  • 59,258
  • 35
  • 162
  • 290
5
votes
3 answers

Disable tolerance (or enable strictness) in Firefox when rendering HTML

Firefox has a certain tolerance when rendering bad HTML. This means even if a closing tag is left out, the HTML will be displayed as if everything was fine. This tolerance aspect is particularly relevant when one is using JavaScript to manipulate or…
unode
  • 9,321
  • 4
  • 33
  • 44
5
votes
1 answer

Handle Akka actor bounded mailbox MessageQueueAppendFailedException

To avoid OOM, I'm bounding the mailbox size of some of my Akka 1.1.3 actors with a shared custom dispatcher. For example: object Static { val dispatcher = Dispatchers.newExecutorBasedEventDrivenWorkStealingDispatcher( …
Bluu
  • 5,226
  • 4
  • 29
  • 34
5
votes
5 answers

Fail fast finally clause in Java

Is there a way to detect, from within the finally clause, that an exception is in the process of being thrown? See the example below: try { // code that may or may not throw an exception } finally { SomeCleanupFunctionThatThrows(); //…
Greg Rogers
  • 35,641
  • 17
  • 67
  • 94
5
votes
2 answers

Misunderstanding of spark RDD fault tolerant

Many say: Spark does not replicate data in HDFS. Spark arranges the operations in a DAG. Spark builds RDD lineage. If an RDD is lost, it can be rebuilt with the help of the lineage graph. So there is no need for data replication, as the RDDs can be…
5
votes
3 answers

How do I isolate untrusted native code in Java?

I have a C library that I don't trust (in the sense that it might crash frequently). I am calling it from a Java process. To prevent a crash in the C library from bringing the whole Java app down, I figured it would be best if I spawn a…
Enno Shioji
  • 26,542
  • 13
  • 70
  • 109
5
votes
1 answer

How to robustly, but minimally, distribute items across a peer-to-peer system

If one has a peer-to-peer system that can be queried, one would like to: reduce the total number of queries across the network (by distributing "popular" items widely and "similar" items together); avoid excess storage at each node; assure good…
5
votes
3 answers

Why does simple 3-way majority voting not solve Byzantine faults?

I have been reading many papers recently about Byzantine fault tolerance. There is a common proof that 3m+1 computers are needed to handle m Byzantine faults. The general proof goes something like this: There are three "generals": A, B, and C.…
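
The 3m+1 bound mentioned in the excerpt above can be checked with a few lines of arithmetic. This is only a sketch of the bound itself, not of the proof the question goes on to discuss; the function names are hypothetical.

```python
def min_nodes_for_faults(m):
    """Smallest number of nodes satisfying the n >= 3m + 1 bound."""
    if m < 0:
        raise ValueError("m must be non-negative")
    return 3 * m + 1


def max_faults_tolerated(n):
    """Largest m such that 3m + 1 <= n (0 if no such positive m exists)."""
    return max(0, (n - 1) // 3)


if __name__ == "__main__":
    for n in (3, 4, 7):
        print(n, "nodes tolerate", max_faults_tolerated(n), "Byzantine fault(s)")
```

Under this bound, three nodes tolerate zero Byzantine faults, which is why a simple 3-way vote falls short, while four nodes tolerate one.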
5
votes
2 answers

Error monitoring/handling on webservers

We have a web server that we're about to launch a number of applications onto. They will all share database and memcached servers, but each application has its own MySQL database, and all memcached keys are prefixed per application. Possible…
Industrial
  • 41,400
  • 69
  • 194
  • 289