How does fault tolerance work in a distributed system?

Question

I haven't had the opportunity to take a course on distributed systems. I am reading up on the subject and have come across replication and related techniques.

Which strategy is the most popular or most widely used for handling fault tolerance, or does it depend on the use case? If it does, which one would be the simplest to understand?

I have a sample problem:

Suppose I have 3 servers and the degree of replication is 2.

So Server A has files: x y

Server B: y z

Server C: z x

Now, each server can receive a request from the user and needs to know which server has which file. I know the general techniques for deciding which server holds which file, such as order of appearance, hashing by key value, or using the actual value.
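To make the "which server has which file" lookup concrete, here is a minimal sketch of the layout above as an explicit table. The names `PLACEMENT` and `servers_for` are hypothetical, chosen only for this illustration, and the table is hard-coded rather than kept in sync across servers.

```python
# Layout from the example above: each server holds two of the three files.
PLACEMENT = {
    "A": {"x", "y"},
    "B": {"y", "z"},
    "C": {"z", "x"},
}


def servers_for(filename, placement=PLACEMENT):
    """Return the sorted list of servers that hold a copy of `filename`."""
    return sorted(server for server, files in placement.items() if filename in files)
```

With this table, a request for file x arriving at any server can be routed to A or C, so x stays available if one of those two servers fails.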

So suppose we are using hashing.

Do we need to store the hash table/lookup on each server, or can we get away with storing only the hash function itself?
Hashing gives us the ID of the first server where the file will be stored. But what about the second server? Do we use a separate hash function to choose the replica?
If we do need a hash table, does it have to be stored on every server? How do we ensure that, when we store a file, all 3 servers' hash tables are updated and stay consistent?
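To illustrate what "storing only the hash function" could look like, here is a minimal sketch. It assumes one possible scheme, not necessarily the standard one: the hash picks the first server, and the replica goes to the next server in a fixed, shared ordering, so no second hash function is needed. The names `SERVERS`, `REPLICATION`, and `replicas_for` are hypothetical.

```python
import hashlib

SERVERS = ["A", "B", "C"]   # fixed ordering that every server agrees on
REPLICATION = 2             # degree of replication from the example


def replicas_for(key, servers=SERVERS, replication=REPLICATION):
    """Return the servers that should hold `key`: the hashed primary first,
    followed by its successors in the server list."""
    # hashlib gives the same digest on every machine; Python's built-in
    # hash() is randomized per process for strings, so servers would disagree.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    primary = int.from_bytes(digest[:8], "big") % len(servers)
    return [servers[(primary + i) % len(servers)] for i in range(replication)]
```

Under these assumptions every server computes the same answer from the key alone, so there is no per-file table to keep consistent; only the server list has to be shared. One known drawback of plain modulo placement is that changing the number of servers remaps most keys, which is the problem consistent hashing is designed to reduce.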

On a final note, can you suggest a video resource, such as YouTube videos or a Coursera course on distributed systems, or a good book? I want to learn basic concepts like these.

0 Answers