Questions tagged [distributed-system]

A distributed system consists of a collection of autonomous computers, connected through a network and distribution middleware, which enables computers to coordinate their activities and to share the resources of the system, so that users perceive the system as a single, integrated computing facility.

A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages.

1253 questions
20
votes
2 answers

What's the difference between ZooKeeper and any distributed Key-Value stores?

I am new to ZooKeeper and distributed systems, and am learning them on my own. From what I understand so far, it seems that ZooKeeper is simply a key-value store whose keys are paths and values are strings, which is no different from, say, Redis.…
OneZero
  • 11,556
  • 15
  • 55
  • 92
18
votes
4 answers

How to solve the famous `unhandled cuda error, NCCL version 2.7.8` error?

I've seen multiple issues about the error: RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1614378083779/work/torch/lib/c10d/ProcessGroupNCCL.cpp:825, unhandled cuda error, NCCL version 2.7.8 ncclUnhandledCudaError: Call to CUDA function…
Charlie Parker
  • 5,884
  • 57
  • 198
  • 323
18
votes
3 answers

NoSQL and eventual consistency - real world examples

I'm looking for good examples of NoSQL apps that portray how to work with lack of transactionality as we know it in relational databases. I'm mostly interested in write-intensive code, as for mostly read-only code this is a much easier task. I've…
julx
  • 8,694
  • 6
  • 47
  • 86
18
votes
2 answers

Distributed application - is load balancer single point of failure?

In general, I want to understand in a distributed application - is the load balancer a single point of failure? I am not sure, but this can be an Apache load balancer or on top of that a device/hardware load balancer as provisioned from F5 Network,…
lowLatency
  • 5,534
  • 12
  • 44
  • 70
17
votes
5 answers

Differences between Strict Serializable and External Consistency

I follow this great blog. In this blog, the author has drawn a complete picture of all types of isolation and consistency and the relationship between them. But based on Google's blog, there is another type of consistency named External…
Trần Kim Dự
  • 5,872
  • 12
  • 55
  • 107
17
votes
3 answers

How can a CA distributed system exist according to the CAP theorem?

How can a distributed system be consistent and available (CA)? Because I would argue when a network partition occurs, CA cannot be possible in a way where every node of the network, even the partitioned nodes that users are connected to, continue to…
pvjhs
  • 549
  • 1
  • 9
  • 24
17
votes
4 answers

How does Raft guarantee consistency when a network partition occurs?

Suppose a network partition occurs and the leader A is in the minority. Raft will elect a new leader B, but A thinks it's still the leader for some time. And we have two clients. Client 1 writes a key/value pair to B, then Client 2 reads the key from A…
16
votes
5 answers

Replication vs Redundancy

I am currently reading about Distributed Systems and I am facing two different terms which are described in a similar manner: Replication and Redundancy. Can anyone explain each term in part?
Dina Bogdan
  • 4,345
  • 5
  • 27
  • 56
15
votes
3 answers

Amazon S3 architecture

While the post @ http://highscalability.com/amazon-architecture explains Amazon's architecture in general, I am interested in knowing how Amazon S3 is implemented. Some of my guesses are A distributed file system like…
Sukumar
  • 3,502
  • 3
  • 26
  • 29
15
votes
5 answers

Programming languages for distributed system

I've been doing socket programming for a while in C++, and kind of got tired of having to write the same code to handle errors, serialize / deserialize data, etc. Are there programming languages out there that have first-class support for…
sivabudh
  • 31,807
  • 63
  • 162
  • 228
15
votes
1 answer

Sequential Consistency in Distributed Systems

I am learning Sequential Consistency in Distributed Systems but just could not understand the terms explained. I would appreciate it if someone could shed some light, in layman's terms, on why (a) and (c) below are sequentially consistent and (b) is…
user23
  • 415
  • 1
  • 8
  • 22
15
votes
1 answer

How are distributed queues architected?

What are architectural patterns/solutions that make distributed queues tick? Please share for both ordered and non-ordered types.
Bohdan
  • 16,531
  • 16
  • 74
  • 68
15
votes
2 answers

ZooKeeper and RabbitMQ/Qpid together - overkill or a good combination?

Greetings, I'm evaluating some components for a multi-data center distributed system. We're going to be using message queues (via either RabbitMQ or Qpid) so agents can make asynchronous requests to other agents without worrying about addressing,…
15
votes
1 answer

Why is merging Python system classes with custom classes less desirable than hooking the import mechanism?

I am working on a project that aims to augment the Python socket messages with partial ordering information. The library I'm building is written in Python, and needs to be interposed on an existing system's messages sent through the socket…
jspacek
  • 1,895
  • 3
  • 13
  • 16
12
votes
3 answers

How to design task distribution with ZooKeeper

I am planning to write an application which will have distributed Worker processes. One of them will be the Leader, which will assign tasks to other processes. Designing the Leader election process is quite simple: each process tries to create a…
Sabya
  • 11,534
  • 17
  • 67
  • 94