Questions tagged [fault-tolerance]

Fault tolerance refers to a system's capability to isolate, compensate for, and recover from failure with minimal impact on the end user. When using this tag, also include tags indicating the system and/or technology you are working with (as additional supporting metadata).

305 questions
0 votes, 2 answers

What is the default tolerance value for binary (as well as other types of) variables in CPLEX?

Elaborating on the question, say the optimal solution has a few binary variables which are assigned 0s and 1s; in reality, when comparing these variables, we will often get values slightly higher or lower than what they should be…
0 votes, 0 answers

Does a private blockchain have to follow the client-server model when considering BFT?

I'm a newbie, currently interested in data security & integrity. I'm quite new to blockchain and distributed systems theory, and I have some doubts/questions about fault-tolerant consensus. May I ask for your kind advice on my dull…
J.L.
0 votes, 0 answers

Delete a remote object from outside the distributed system

I have been playing around with Pyro, particularly Pyro5, and wanted to simulate some fault tolerance algorithms by creating a distributed system consisting of remote processes and RPCs using Pyro5 in Python. Does anybody know how to delete / crash…
0 votes, 2 answers

Logging MicroProfile fault tolerance events

I am working on a Quarkus app that uses the SmallRye MicroProfile Fault Tolerance implementation. We have configured fault tolerance on the client definitions via the annotations API (@Retry, @Bulkhead, etc.) and it seems to work, but we don't get any…
0 votes, 0 answers

Simulating node failure for testing purposes

I am developing fault tolerance mechanisms for a distributed application in Rust. I need to simulate the failure of one node (and eventually more). The kind of failure to simulate is a node crash. I want the application to completely exit with an error in…
javier
0 votes, 1 answer

Fault-tolerant solution for in-memory aggregation before writing to database

I'm designing a high-performance system whose main function is to update a product inventory. Each product has a unique product ID, and we can add/subtract a number of items of that product in the inventory. To improve performance, I don't want…
Ast15
0 votes, 0 answers

Compare 2 data frames with a tolerance

I have 2 data frames, each containing 4 columns and hundreds of rows. Although the columns are the same, the rows may appear in any order. I need to reconcile these data frames and ensure that everything in DF1 is also in DF2 and vice versa, and…
0 votes, 1 answer

How to Control Size of Flink Checkpoints

I am running a simple Flink aggregation job which consumes from Kafka, applies multiple windows (1 hr, 2 hr, ... up to 24 hours) with a specific sliding interval, and aggregates over those windows. Sometimes the job restarts and we lose the data as it…
0 votes, 0 answers

Converting Hystrix to Resilience4j - Advanced Configurations

I'm involved with a project to convert our Spring Boot codebase from Hystrix to Resilience4j. Some of the conversions have been straightforward, but there are some more complicated ones that I'm not sure how to convert. We have classes that have…
0 votes, 0 answers

Infrastructure-independent availability/fault tolerance guarantees

I've been thinking about defining some infrastructure-independent metrics for an SLA requested by a customer. The developed software is deployed on-premises within the customer's DC and managed by the customer's technical staff, therefore I cannot give…
0 votes, 1 answer

SBFT Multiple Leader Ordering Service

How can a Smart-BFT Multiple Leader Ordering Service be implemented? What implementation changes need to be incorporated (refer to some source code)? I was looking at…
0 votes, 1 answer

Quarkus resilience best practice

I have a use case like the following: one Quarkus microservice is responsible for talking with several other fixed APIs (e.g. ArgoCD REST API, Standard Corporate Driven API) to bring the whole system into the desired state. The whole request needs to…
Peter C. Glade
0 votes, 0 answers

Reading, combining, and writing multiple CSV files with a restore point

I have to read many existing CSV files on an external drive, combine them in sequence (sequencing is critical) with a restore point, and write to output.csv on the same external drive in a different path. Example: A.CSV, B.CSV and so on to Output.csv. I am…
TechiRA
0 votes, 1 answer

Kafka connect-distributed mode fault tolerance not working

I have created a Kafka Connect cluster with 3 EC2 machines and started 3 connectors (Debezium Postgres source) on each machine, each reading a different set of tables from the Postgres source. On one of the machines, I started the S3 sink connector as well. So…
0 votes, 1 answer

Flex AMF offline mode?

I am currently using Flex (Flash Builder 4) and making web service connections to an Apache PHP Zend AMF server to retrieve data. This works great, but I am wondering what options are available for fault tolerance. I know I can probably set up a…
Scott Szretter