Questions tagged [fault-tolerance]

Fault tolerance refers to a system's capability to isolate, compensate for, and recover from failure with minimal impact on the end user. When using this tag, include tags indicating the system and/or technology you are working with (as additional supporting metadata).

305 questions
2
votes
2 answers

How to override an interface fault tolerance annotation in application.properties

I am trying to override an interface's Timeout fault tolerance annotation in application.properties. However, I am not sure if it is possible to override annotation parameters via a configuration file using property…
2
votes
2 answers

Spark checkpointing behaviour

Does Spark use checkpoints when we start a new job? Let's say we used a checkpoint to write an RDD to disk. Will that RDD be recomputed or loaded from disk during a new job?
BAS
2
votes
0 answers

Why is writing into an existing file on Windows so susceptible to data loss on power failure, even after file close?

I use robocopy to mirror a directory across a network to a machine. The destination machine is susceptible to power loss. Several times I've found that after a successful robocopy, some of the files are the right size/date, but are empty (all NUL…
aggieNick02
2
votes
1 answer

What is the benefit of a non-fault-tolerant blockchain network

I'm learning about the different Hyperledger-based blockchain frameworks and currently I'm reading about Sawtooth, even though the question is not particularly related to Sawtooth. Given that PoET is as good a consensus algorithm as any, what I…
Leron
2
votes
1 answer

Corda BFT notary cluster halts after one replica goes down

TL;DR - BFT cluster with 4-5 notary nodes grinds to a halt when one replica is killed. I ran the notary demo and the Raft cluster (with 3 notary nodes) behaved as expected - when I kill the leader, there's an election and the notary cluster…
qlfu_qlfu
2
votes
2 answers

Kafka Streams stateStores fault tolerance exactly once?

We're trying to build a deduplication service using Kafka Streams. The big picture is that it will use its RocksDB state store to check existing keys during processing. Please correct me if I'm wrong, but to make those stateStores fault…
Yannick
2
votes
2 answers

WLPs MicroProfile (FaultTolerance) Timeout Implementation does not interrupt threads?

I'm testing WebSphere Liberty's fault tolerance (MicroProfile) implementation. To do so, I made a simple REST service with a resource which sleeps for 5 seconds: @Path("client") public class Client { @GET @Path("timeout") …
2
votes
1 answer

Is there any standard for supporting lockstep processors?

I want to ask about supporting lockstep (lock-step) processors at the software level. As far as I know, in AUTOSAR-ASILD, a lockstep processor is used for fault-tolerant systems as in the scenario below. The input signals for a processor are copied to another…
2
votes
4 answers

design patterns for transactional services with checkpoints and recovery

I have a multistep process where each step does some network IO (web service call) and then persists some data. I want to design it in a fault-tolerant way so that if the service fails, either because of a system crash or because one of the steps fails, I…
user308808
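
For readers wondering what "checkpoints and recovery" might look like for a multistep process like the one described above, here is a minimal sketch, not taken from the question or its answers. It assumes each step is a plain function that takes a dict and returns a new one, and that progress is stored in a local JSON file; the names `run_process`, `save_checkpoint`, and `load_checkpoint` are hypothetical. A real service would more likely persist progress in its database.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Persist state by writing a temp file, fsyncing it, then renaming it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # os.replace swaps the file in one operation, so a reader sees the old or the new state.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh one if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"next_step": 0, "data": {}}
    with open(path) as handle:
        return json.load(handle)


def run_process(steps, path):
    """Run steps in order, checkpointing after each one; a rerun resumes at the first unfinished step."""
    state = load_checkpoint(path)
    for index in range(state["next_step"], len(steps)):
        # If the step raises, nothing is saved and the same step runs again next time.
        state["data"] = steps[index](state["data"])
        state["next_step"] = index + 1
        save_checkpoint(path, state)
    return state["data"]
```

Calling `run_process(steps, "process.json")` again after a crash skips the steps already recorded in the checkpoint. Note that a step which crashes after its side effect but before the checkpoint is saved will run again, so this sketch only gives at-least-once execution and the steps would need to be idempotent.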
2
votes
1 answer

The type Prevayler is not generic; it cannot be parameterized with arguments

I have been reading about checkpointing techniques for fault tolerance, so I am working with the Prevayler Java library for checkpointing. Now I get the error "The type Prevayler is not generic; it cannot be parameterized with arguments". Can…
Miss Saung
2
votes
1 answer

Recovering from HBase server failure using Async HBase client

I'm currently trying to find a way to deal with unexpected HBase failures in my application. More specifically, what I'm trying to solve is a case where my application inserts data into HBase and then HBase fails and restarts. In order to check how…
Gideon
2
votes
1 answer

Detect stopped server process via rpyc.Connection

Suppose I have a Service: import rpyc class MyService(rpyc.Service): my_dict = {} def exposed_put(self, key, val): MyService.my_dict[key] = val def exposed_get(self, key): return MyService.my_dict[key] def…
Mack
2
votes
1 answer

A subcase of byzantine agreement (4 processes, the commander is a traitor)

In case you don't know the problem, see this. Brief introduction: Processes communicate by reliable and timely messages. Traitors lie and also cheat when forwarding messages; they try to confuse the loyal processes. Loyal processes try to agree on non-trivial actions…
JACK M
2
votes
1 answer

Akka and Backup/Fallback Actors

I am coming to Akka after spending quite a bit of time over in Hystrix-land where, like Akka, failure is a first-class citizen. In Hystrix, I might have a SaveFizzToDbCmd that attempts to save a Fizz instance to an RDB (MySQL, whatever), and a…
smeeb
2
votes
1 answer

Designing Akka Supervisor Hierarchy

Please note: I am a Java developer with no working knowledge of Scala (sadly). I would ask that any code examples provided in the answer would be using Akka's Java API. I am brand-spanking-new to Akka and actors, and am trying to set up a fairly…
smeeb