Questions tagged [fault-tolerance]

Fault tolerance refers to a system's capability to isolate, compensate for, and recover from failure with minimal impact on the end user. When using this tag, also include tags indicating the system and/or technology you are working with (as additional supporting metadata).

305 questions
0
votes
0 answers

How does Linux resist single event upsets?

I am collecting information about single event upsets (SEUs) for my university report. I found plenty of helpful articles on my topic, but I got stuck on operating systems' resistance to SEUs. I chose Linux as one that's being used in…
0
votes
1 answer

Fault Tolerance and Kubernetes StatefulSet

As I understand it, most databases enable the use of replicas that can take over from a leader in case the leader is unavailable. I'm wondering about the necessity of having these replicas in a Kubernetes environment when using, say, a StatefulSet. Once…
vmayer • 985 • 2 • 9 • 18
0
votes
0 answers

Scala Akka OneForOneStrategy java.lang.NullPointerException: null

I am trying to implement fault tolerance within my actor system for my Scala project, to identify errors and handle them. I am using Classic actors. Each supervisor actor has 5 child actors; if one of these child actors fails, I want to restart that…
0
votes
0 answers

Scala Akka Fault Tolerance/Supervision not working

I am trying to implement fault tolerance within my actor system for my Scala project, to identify errors and handle them. I am using Classic actors. Each supervisor actor has 5 child actors; if one of these child actors fails, I want to restart that…
0
votes
1 answer

Making Redis durable with a slave Redis queue

Maybe I am missing something; this seems too simple. Is it possible to make Redis durable by having a master Redis node duplicate data to a slave Redis node? My situation: I have a REST endpoint which, upon receiving a request from a client, sticks the…
friartuck • 2,954 • 4 • 33 • 67
0
votes
1 answer

K8s Control Plane Fault-tolerance | Minimum nodes and Leader Election

On a stacked (etcd + master on the same node) control plane setup, we need a minimum of 3 nodes to achieve quorum, but what is the requirement for a setup where we have external etcd nodes? etcd needs a minimum of 3, but what is the minimum number of…
Leo Lazarus • 107 • 1 • 7
0
votes
1 answer

How to determine optimal fault tolerant memory for FileX?

In the FileX example file, the fx_fault_tolerant_enable function is given a RAM buffer of size FX_FAULT_TOLERANT_MAXIMUM_LOG_FILE_SIZE, which is 3K. I would like to reduce this define and the RAM buffer as much as I can. What are the parameters I need to…
jack • 1 • 2
0
votes
4 answers

What assumptions can I make about global time on Azure?

I want my Azure role to reprocess data in case of sudden failures. I am considering the following option. For every block of data to process I have a database table row, and I could add a column meaning "time of last ping from a processing node". So when a…
sharptooth • 167,383 • 100 • 513 • 979
0
votes
0 answers

gRPC, Golang: how to connect to an alternate path to the same server on network error

I have 2 servers, each with at least 2 interfaces. When one of the interfaces is down, I want gRPC to try the next interface in the same RPC call, for example: N1 ip-eth0 <-- Path1 ---> N2 ip-eth0, N1 ip-eth1 <-- Path2 ---> N2 ip-eth1. In this case, while…
tushars • 53 • 6
0
votes
1 answer

Return a reject table from a stored procedure

I want my stored procedure to fill a 'reject' table with the lines from my staging table that cannot be inserted into my target table (for example, a line without a description, which is NOT NULL in my target). I have no idea about the structure of…
nada • 21 • 1 • 7
0
votes
1 answer

Autowire doesn't work after enabling Hystrix

I'm encountering an issue where autowiring doesn't work when I add Hystrix (EnableHystrix) to my microservice. Controller class: @RestController @RequestMapping("/login") @Slf4j public class LoginController { @Autowired ILoginService…
0
votes
0 answers

Polly IHttpClientFactory and handling 401's (Unauthorised)

I'm using Polly and IHttpClientFactory to handle my fault tolerance, which is working fine for retry and circuit breaker in .NET Core 3.1. I'm not trying to handle any 401's, which I can do within the controller, but I can't find a way to refresh the…
Coppermill • 6,676 • 14 • 67 • 92
0
votes
1 answer

Resilience4j with Amazon SNS

Is it a good idea to build fault tolerance over an Amazon SNS call? As Amazon services are built to be resilient, should we add one more layer of fault tolerance or trust Amazon to handle that part?
RoDev • 61 • 8
0
votes
1 answer

Fallback for DynamoDB with SQS

We have a synchronous REST endpoint that, apart from other processing, saves an item to a DynamoDB database for later use. The requirement is to not error out if the database save fails due to any type of exception. How do we…
0
votes
1 answer

Can the snapshot mechanism consume more and more memory in Apache Flink?

I'm learning how the snapshot mechanism works in Flink. As I understand it, the JobManager inserts barriers into each data source at a fixed interval, and each operator takes a snapshot once it receives the nth barrier from all of its data…
Yves • 11,597 • 17 • 83 • 180