Questions tagged [fault-tolerance]

Fault tolerance refers to a system's capability to isolate, compensate for, and recover from failure with minimal impact to the end user. When using this tag, include tags indicating the system and/or technology you are working with (as additional supporting metadata).

305 questions
0
votes
1 answer

Modify U-Boot to rely on addresses in MMC instead of a filesystem

For context, I'm trying to make everything in flash as fault-tolerant as possible. Ideally, I'd like to store the kernel image and initrd file as raw BLOBs on MMC. As I understand it, U-Boot looks for an extlinux.conf or boot.scr file, but as…
0
votes
1 answer

How to correctly understand checkpoints in Flink

I know that Flink uses a checkpoint mechanism to guarantee exactly-once semantics, but I want to know more details. If I'm right, each operator has its own checkpoint. I cannot understand how these checkpoints work together. Say that I have two source tasks…
Yves
0
votes
1 answer

Apache Flink Stateful Functions forwarding the same message to N functions

I'm trying to send incoming messages to multiple stateful functions, but I couldn't fully understand how to do it. For the sake of clarity, let's say one of my stateful functions receives some integers and sends them to a couple of remote…
0
votes
1 answer

Detecting duplicates in a data integration system

I am looking for ways to avoid the transfer of duplicate files when transferring through HTTP and SFTP. My system stores the state of the transfer each time a transfer is performed into an external cache. Before each transfer, I look up the…
Vijay Muvva
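
One possible way to sketch the lookup described in this question is to key the cache on a content hash rather than on a file name. This is only an illustration under stated assumptions: the `TransferLedger` class and its method names are hypothetical, and an in-memory set stands in for the external cache mentioned in the excerpt.

```python
import hashlib


def content_fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the file content."""
    return hashlib.sha256(data).hexdigest()


class TransferLedger:
    """Hypothetical stand-in for the external cache of transfer state."""

    def __init__(self) -> None:
        self._transferred = set()  # fingerprints of completed transfers

    def is_duplicate(self, data: bytes) -> bool:
        # Look up the fingerprint before starting a transfer.
        return content_fingerprint(data) in self._transferred

    def mark_transferred(self, data: bytes) -> None:
        # Record the fingerprint only after the transfer has succeeded,
        # so a failed transfer can be retried later.
        self._transferred.add(content_fingerprint(data))


ledger = TransferLedger()
payload = b"example file content"
if not ledger.is_duplicate(payload):
    # ... perform the HTTP or SFTP transfer here ...
    ledger.mark_transferred(payload)
```

Recording the fingerprint after the transfer, not before, means a crash mid-transfer leads to a retry instead of a silently skipped file.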
0
votes
1 answer

In Azure Data Factory copy activity, fault tolerance is not applicable to check constraints

I want to copy data from one table to another in Azure Data Factory using the copy activity. In the source there are no constraints, but on the sink I have a check constraint on the age column: check(age>=18). What I have observed is, if while the copy activity runs…
0
votes
2 answers

K8s fault tolerance

I was going through the differences between Swarm and K8s. One of the cons of Swarm is that it has limited fault tolerance functionality. How does K8s achieve fault tolerance? Is it via K8s multi-master? Please share your inputs.
Zaks
0
votes
0 answers

Why is the Practical Byzantine Fault Tolerance algorithm said to be asynchronous?

In the PBFT paper the authors say that "The algorithm does not rely on synchrony to provide safety. Therefore, it must rely on synchrony to provide liveness [...] We guarantee liveness, i.e., clients eventually receive replies to their requests,…
nino
0
votes
1 answer

golang's hystrix library "circuit open" without "timeout" error

We use hystrix for our Go application. We are getting a "hystrix: circuit open" error even though there is no "hystrix: timeout" error. hystrix configuration: hystrix.ConfigureCommand(cmd, hystrix.CommandConfig{ Timeout: timeout, …
Sharath BJ
0
votes
0 answers

Automating Namecheap's browser DNS update with Java: can anyone suggest proof-of-concept code?

The domain name registrar Namecheap offers a service where customers can point their domains to their IP addresses simply using the browser. How do I use a browser to dynamically update the host's IP? The Dynamic DNS feature is available only for…
0
votes
1 answer

Can we set tolerance level on regex annotator in Ruta?

I am annotating Borrower Name: "Borrower Name" -> BorrowerNameKeyword ( "label" = "Borrower Name"); But I get this text after OCR analysis. At times I might get Borrower Name as B0rr0wer Nane. Is it possible to set a tolerance limit so that this text…
0
votes
2 answers

PBFT consensus algorithm and double spending

I am trying to figure out how the PBFT consensus algorithm deals with the problem of double spending. I've read lots of literature but cannot seem to find an answer.
0
votes
2 answers

Stream Processing: How often should a checkpoint be initiated?

I am setting up an analytics pipeline using Apache Flink to process a stream of IoT data. While attempting to configure the system, I cannot seem to find any sources for how often checkpointing should be initiated. Are there any recommendations or…
0
votes
1 answer

How can a typical cluster of five servers tolerate the failure of any two servers?

I am reading the extended Raft paper, and the above statement appears there. I also found a statement on the web saying that failures of f servers can be tolerated if there are 2*f+1 servers. It's obvious to have another two servers where f=1. Is there an inductive…
Amila Senadheera
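
The arithmetic behind the 2*f+1 statement in this question can be checked with a short sketch. It assumes majority quorums, which is what Raft uses; the function names are illustrative and not taken from the paper.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of servers that forms a majority of n."""
    return n // 2 + 1


def max_tolerated_failures(n: int) -> int:
    """Servers that may fail while a majority can still be formed."""
    return n - majority(n)


def majorities_always_overlap(n: int) -> bool:
    """Brute-force check that any two minimal majorities share a server."""
    q = majority(n)
    groups = list(combinations(range(n), q))
    return all(set(a) & set(b) for a in groups for b in groups)


print(majority(5), max_tolerated_failures(5))  # prints "3 2"
print(majorities_always_overlap(5))            # prints "True"
```

With n = 2*f+1 servers a majority is f+1, so f servers can fail and the remaining f+1 still form a majority. Because any two majorities share at least one server, a value accepted by one majority is always visible to the next one.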
0
votes
2 answers

Configure settings of .NET Framework 4.5's System.IO.FileSystemWatcher to be Communicative about Errors, Fault-tolerant, Robust, Intelligent, etc.

At my office, we are using .NET Framework 4.5's System.IO.FileSystemWatcher ( https://learn.microsoft.com/en-us/dotnet/api/system.io.filesystemwatcher?view=netframework-4.5 ). We have application modules running in a distributed network…
0
votes
1 answer

AWS Multi Region Service Availability and Operations

Some AWS services provide the ability to replicate between Regions, e.g. S3 (CRR), RDS (Read Replica), etc. In S3 CRR, what happens if the destination Region goes down? Does the replication catch up automatically once the Region is back up?…
Divs