I am reading about distributed systems and am confused about what the term really means.

At a high level, I understand it to mean a set of different machines that work together to achieve a single goal.

But this definition seems too broad and loose. Here are some examples that explain my confusion:

  1. I see a lot of people referring to microservices as a distributed system, where functionality such as Order, Payment, etc. is split across different services. Others use the term for multiple instances of the Order service that serve customers and possibly use a consensus algorithm to agree on shared state (e.g. the current inventory level).

  2. When talking about distributed databases, a lot of people describe different nodes that each store and serve part of the data, e.g. records with primary keys 'A-C' on the first node, 'D-F' on the second node, etc. At a high level this looks like sharding.

  3. When talking about distributed rate limiting, some refer to multiple application nodes (so-called distributed application nodes) using a single rate limiter, while others mean that the rate limiter itself has multiple nodes with a shared cache (like Redis).

It feels like people use "distributed systems" to refer to microservices architecture, horizontal scaling, partitioning (sharding), and anything in between.

Kumar
  • Seems like you pretty much got the point. There is no real question here, so I'll just leave you with a great resource on the subject: [system-design-primer](https://github.com/donnemartin/system-design-primer) – madmax Jun 29 '22 at 07:23
  • Are you not satisfied with the [Wikipedia entry](https://en.wikipedia.org/wiki/Distributed_computing)? – Reinhard Männer Jun 29 '22 at 14:42
  • It's a broad definition, so you can't expect every use of it to be consistent. – AlexApps99 Jun 30 '22 at 01:15
  • @ReinhardMänner: My confusion is: is a distributed database just sharding? Similarly, for distributed rate limiting, does it mean multiple nodes of the rate limiter, or multiple application nodes using the same rate limiter? – Kumar Jun 30 '22 at 06:39
  • @AlexApps99: Thanks for your response. My question is: when someone says "we have a distributed DB", how should I interpret it? Does it mean that the DB is sharded? Similarly, for a distributed rate limiter? – Kumar Jun 30 '22 at 06:45
  • It's a vague word, so you can't really glean any specific technical meaning from it other than the fact that the thing is spread across multiple devices. The underlying technical implementation could be anything that satisfies the meaning of the word, so that's all you can really go off. – AlexApps99 Jun 30 '22 at 10:42
  • Sharding is one database concept that could be considered distributed, but so are redundancy, load balancing, blockchain, and other miscellaneous things. "Distributed" does not implicitly mean any of these terms; it's a blanket word for things which are distributed. – AlexApps99 Jun 30 '22 at 10:46
  • If you think about it, a distributed system can be considered simply the opposite of a centralised system. And the term really is designed to be generic enough to refer to many things, imo. – johncitizen Jul 03 '22 at 23:41
  • Kumar, if your focus is "distributed DB", then it has a more specific context than simply "What is meant by Distributed System?". A distributed system is more of a concept or architecture than a specific domain or topic (e.g., the sharding, redundancy, etc. mentioned by @AlexApps99 when talking about a distributed DB). – victor6510 Jul 06 '22 at 04:27

2 Answers

> I am reading about distributed systems and am confused about what the term really means.

As commented by @ReinhardMänner, a good general definition of a distributed system (DS) can be found at https://en.wikipedia.org/wiki/Distributed_computing:

> A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another from any system. The components interact with one another in order to achieve a common goal.

Anything that fits the above definition can be referred to as a DS. All the examples mentioned, such as microservices, distributed databases, etc., are specific applications of the concept, or implementation details.

The statement "X is a distributed system" does not inherently imply any such details; they must be stated explicitly for each DS. For example, a distributed database does not necessarily use sharding.
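
To make that last point concrete, here is a minimal sketch (not any particular database's implementation) of two different ways a key-value store could be "distributed" across three nodes. The node names, the function names, and the open-ended 'G' range are assumptions made for illustration; the 'A-C' and 'D-F' ranges come from the question.

```python
from bisect import bisect_right

# Hypothetical layout: node-1 owns keys A-C, node-2 owns D-F,
# node-3 owns everything from G onward (an assumption for this sketch).
SHARD_STARTS = ["A", "D", "G"]
SHARD_NODES = ["node-1", "node-2", "node-3"]
REPLICA_NODES = ["node-1", "node-2", "node-3"]


def sharded_nodes_for(key: str) -> list[str]:
    """Range sharding: exactly one node owns each key."""
    first = key[0].upper()
    index = bisect_right(SHARD_STARTS, first) - 1
    if index < 0:
        raise ValueError(f"no shard covers key {key!r}")
    return [SHARD_NODES[index]]


def replicated_nodes_for(key: str) -> list[str]:
    """Full replication: every node holds a copy of every key."""
    return list(REPLICA_NODES)
```

Both layouts spread the database over several networked machines, so both fit the definition, yet only the first one is sharded. Hearing "distributed database" alone does not tell you which of these (or which combination) is in use.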

Radoslav Bodó

I'll also draw from Wikipedia, but I think that the second part of the quote is more important:

> A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another from any system. The components interact with one another in order to achieve a common goal. Three significant challenges of distributed systems are: maintaining concurrency of components, overcoming the lack of a global clock, and managing the independent failure of components. When a component of one system fails, the entire system does not fail.

A system that constantly has to overcome these problems, even if all services are on the same node, or if they communicate via pipes/streams/files, is effectively a distributed system.

Now, trying to clear up your confusion:

  1. Horizontal scaling existed with monoliths, before microservices. It is basically achieved by dividing compute resources.
    Division of compute requires dealing with synchronization, node failure, and multiple clocks. But that is still cheaper than scaling vertically. That's where you might turn to consensus, either by implementing it in the application, by using a dedicated service (e.g. ZooKeeper), or by abusing a DB table for that purpose.
    Monoliths present two problems that microservices solve: address-space dependency (i.e. someone else's component may crash the whole process, and thus your component) and long startup times.
    While microservices solve these problems, these problems aren't what makes them a "distributed system". It doesn't matter whether the different processes/nodes run the same software (monolith) or not (microservices); what matters is that they are different processes that can't easily communicate directly (e.g. via function calls that promise not to fail).

  2. In databases, scaling horizontally is also cheaper than scaling vertically. The two components of horizontal DB scaling are division of compute (effectively, a distributed system) and division of storage (sharding, as you mentioned, e.g. A-C, D-F, etc.).
    Sharding of storage does not define distributed systems: a single compute node can handle multiple storage nodes. It's just that it's much more useful for a database that divides compute to also shard its storage, so you often see them together.

  3. Distributed rate limiting falls under "maintaining concurrency of components". If every node does its own rate limiting and the nodes don't communicate, then a system-wide rate limit cannot be enforced. If they wait for each other to coordinate enforcement, they aren't concurrent.
    Usually the solution is "approximate" rate limiting, where components synchronize "occasionally".
    If your components can't easily (= with no latency) agree on a global rate limit, that's usually because they can't easily agree on a global anything. In that case, you're effectively dealing with a distributed system, even if all components are just threads in the same process.
    (That could happen e.g. if you plan to scale out but haven't done so yet, so you don't allow your threads to communicate directly.)
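
The "approximate" rate limiting from point 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not a production design: `SharedCounter` is a hypothetical in-memory stand-in for a shared store such as Redis, the class and parameter names are made up for this example, and window expiry (resetting the counters every second or minute) is left out.

```python
class SharedCounter:
    """Hypothetical stand-in for a shared store (e.g. a counter in Redis)."""

    def __init__(self) -> None:
        self.total = 0

    def add_and_get(self, delta: int) -> int:
        self.total += delta
        return self.total


class ApproximateLimiter:
    """Per-node limiter that talks to the shared counter only occasionally."""

    def __init__(self, shared: SharedCounter, limit: int, sync_every: int) -> None:
        self.shared = shared
        self.limit = limit
        self.sync_every = sync_every
        self.unsynced = 0      # requests admitted locally since the last sync
        self.known_global = 0  # global total as seen at the last sync (may be stale)

    def allow(self) -> bool:
        # Decide using possibly stale knowledge; no network round trip here.
        if self.known_global + self.unsynced >= self.limit:
            return False
        self.unsynced += 1
        if self.unsynced >= self.sync_every:
            # Occasional synchronization: publish local admissions, learn the total.
            self.known_global = self.shared.add_and_get(self.unsynced)
            self.unsynced = 0
        return True


def simulate(requests_per_node: int = 20, limit: int = 10, sync_every: int = 3) -> int:
    """Send requests to two nodes in alternation; return how many were admitted."""
    shared = SharedCounter()
    nodes = [ApproximateLimiter(shared, limit, sync_every) for _ in range(2)]
    admitted = 0
    for _ in range(requests_per_node):
        for node in nodes:
            if node.allow():
                admitted += 1
    return admitted
```

Because each node decides based on a stale view of the global count, the nodes together can admit somewhat more than `limit`, and the overshoot tends to grow with `sync_every` and with the number of nodes. That is the trade-off: less coordination per request in exchange for a limit that is only approximately enforced.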

root