Highest Voted 'fault-tolerance' Questions

1

vote

1 answer

Spark 2.4.0 Structured Streaming Fault Tolerance from Kafka

I have some questions about fault tolerance in Spark Structured Streaming when reading from Kafka. This is from the Structured Streaming Programming Guide: In case of a failure or intentional shutdown, you can recover the previous progress…

asked Mar 07 '19 at 09:19

Panagiotis Fytas

426
2
7
12

1

vote

1 answer

How to avoid loss of internal state of a master during fail-over to new master during a network partition

I was trying to implement a simple system with a single master node and multiple backup nodes to learn about distributed and fault-tolerant architecture. Currently this is what my system looks like: N different nodes, each one identical. 1 master node…

distributed-computing distributed-system fault-tolerance

asked Nov 01 '18 at 10:26

Vikrant Biswas

123
3

1

vote

4 answers

Temporarily suspend: Azure Service Bus Message Queue

We are using an Azure Service Bus message queue to process some actions that are performed on a third-party API. The issue we are having is that the third-party API is down; what we want to do is suspend the queue temporarily so we can hold messages till the third…

message-queue azureservicebus fault-tolerance

asked Oct 23 '18 at 13:58

Renu Saini

19
1
3

1

vote

1 answer

WLP's MicroProfile Fault Tolerance bulkhead implementation not kicking in

Trying to test MicroProfile Fault Tolerance in WebSphere Liberty (WebSphere Application Server 18.0.0.3/wlp-1.0.22.cl180320180905-2337) on Java HotSpot(TM) 64-Bit Server VM, version 1.8.0_161-b12 (en_US), but I cannot get the bulkhead logic to…

java websphere-liberty fault-tolerance microprofile

asked Oct 09 '18 at 14:58

user2299548

85
5

1

vote

0 answers

Hyperledger Fabric - crash restore strategies

Yesterday I faced a nice problem: nothing happens when a chaincode container crashes or someone manually stops it. Sample network (using v1.2.0 images): 2 ORGs, 2 CAs, 2 peers in ORG1 (using LevelDB as storage), 2 peers in ORG2 (using LevelDB as…

hyperledger-fabric hyperledger fault-tolerance

asked Aug 31 '18 at 13:27

rusbro

56
7

1

vote

1 answer

How does each backup node get 2f replies in PBFT?

In Practical Byzantine Fault Tolerance (PBFT), the reason 3f+1 nodes are needed, as I understand it, is to allow for the worst-case scenario where: 1. f+1 nodes are normal 2. f nodes are unresponsive 3. f nodes are faulty. So in the PREPARE phase,…

hyperledger-fabric blockchain fault-tolerance

asked May 04 '18 at 08:20

Bosen

941
2
12
26
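The node-count arithmetic in the PBFT question above can be checked with a short sketch. This is only an illustration of the counting argument, not an implementation of the protocol; the function name `pbft_counts` and the dictionary keys are hypothetical names chosen for this example.

```python
def pbft_counts(f: int) -> dict:
    """Counting argument behind n = 3f + 1 (illustrative only).

    f is the number of faults to tolerate. The names here are
    hypothetical and do not come from any PBFT library.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    n = 3 * f + 1                  # total replicas
    responsive = n - f             # f replicas may never answer
    from_others = responsive - 1   # replies a node sees besides its own
    honest_among_responsive = responsive - f  # f responders may be faulty
    return {
        "n": n,
        "responsive": responsive,                # 2f + 1
        "from_others": from_others,              # 2f
        "honest_among_responsive": honest_among_responsive,  # f + 1
    }


if __name__ == "__main__":
    for f in range(1, 4):
        print(f, pbft_counts(f))
```

With f unresponsive replicas, a node can wait for at most 2f + 1 responses including its own, which is where the figure of 2f messages from other replicas comes from; even if f of the responders are faulty, f + 1 of them are still honest.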

1

vote

0 answers

Integration testing with TomEE embedded and Microprofile fault tolerance

I need to test some components in a Java EE environment which use annotations from the MicroProfile project, i.e. @Asynchronous and @Timeout from the fault tolerance part of the project. The implementation library for fault tolerance is Apache Safeguard. In…

java integration-testing apache-tomee fault-tolerance microprofile

asked Apr 28 '18 at 06:56

Znas Me

190
1
15

1

vote

2 answers

How does Elasticsearch recover from a quorum that is not unanimous

When using replication with a quorum, Elasticsearch allows writes to fail for some (a small number of) replica shards. Writing to a replica might fail only because it is temporarily unavailable (because of a temporary network partition, for…

elasticsearch replication recovery fault-tolerance

asked Feb 16 '18 at 14:25

Raedwald

46,613
43
151
237

1

vote

1 answer

How is PBFT applied in blockchain?

I am trying to understand how PBFT (Practical Byzantine Fault Tolerance) is applied in blockchain. After reading the paper, I found that the process for PBFT to reach consensus is as follows: A client sends a request to invoke a service operation to the…

blockchain fault-tolerance

asked Feb 08 '18 at 22:56

Frank Kong

1,010
1
20
32

1

vote

3 answers

Is there a way to have a block of code executed atomically? (language does not matter)

I'm reading some papers on distributed systems. The authors claim to be able to have a sequence of operations executed atomically (either all operations are executed successfully or none is executed, even when system failures occur). I wonder how…

atomic distributed-system fault-tolerance

asked Aug 01 '17 at 03:51

Burgess Chen

21
2

1

vote

1 answer

VMware Fault Tolerance possible Tests

I have been thinking about how I can test my Fault Tolerance machines, but I can't seem to come up with a proper test. How can I calculate the time it took for VMware to switch from the primary virtual machine to the secondary one?

vmware fault-tolerance

asked Jun 16 '17 at 12:06

Youssef Sakuragi

136
10

1

vote

2 answers

High Availability (HA) vs Fault Tolerance

I have read a couple of articles on Google like this but am still not clear about the difference between them. The purpose of both seems to be to provide the service when one component fails (be it hardware or software): a backup/secondary component takes over…

high-availability fault-tolerance

asked Jun 16 '17 at 11:48

scott miles

1,511
2
21
36

1

vote

2 answers

Why does Apache Spark not resubmit failed tasks?

I want to simulate fault-tolerance behavior. I wrote a "hard" function that fails from time to time. For example: def myMap(v: String) = { // print task info and return "Ok" or throw exception val context = TaskContext.get() val r =…

scala apache-spark fault-tolerance

asked Apr 10 '17 at 13:17

Adel Chepkunov

79
1
9

1

vote

1 answer

VMware FT Logging

I am a newbie to VMware. While working on the standard switch I came across FT Logging. I did not find any good source. Can someone please explain where we use FT Logging and under which conditions we need to use FT Logging? What is the use of…

vmware esxi fault-tolerance

asked Mar 21 '17 at 07:03

ashok

11
6

1

vote

1 answer

The impact of correlated failures on cluster performance

In several presentations (e.g., 1, 2, 3) on cluster management, one of the scheduler's objectives is to reduce coordinated failures by distributing tasks of a single job across computing nodes that are less likely to fail together. Why are correlated…

cluster-computing distributed-computing job-scheduling fault-tolerance

asked Feb 03 '17 at 00:52

max

49,282
56
208
355
