
Let's say I have Hello-Service. In Lagom, this service can run across multiple nodes of a single cluster.

So within Cluster 1, we can have multiple "copies" of Hello-Service:

Cluster1: Hello-Service-1, Hello-Service-2, Hello-Service-3

But is it possible to run service Hello-Service across multiple clusters?

Like this:

Cluster1: Hello-Service-1, Hello-Service-2, Hello-Service-3,
Cluster2: Hello-Service-4, Hello-Service-5, Hello-Service-6

What I want to achieve is better scalability of the read-side processors and event consumers:

In Lagom, we need to set the number of shards for a given event tag up front within the cluster.
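For reference, a sharded tag is declared roughly like this in the Lagom Scala API (the HelloEvent type and the shard count here are illustrative, not from my actual code):

    import com.lightbend.lagom.scaladsl.persistence.{AggregateEvent, AggregateEventShards, AggregateEventTag}

    sealed trait HelloEvent extends AggregateEvent[HelloEvent] {
      // Each event advertises the sharded tag; the concrete shard is
      // chosen by hashing the persistent entity's ID.
      override def aggregateTag: AggregateEventShards[HelloEvent] = HelloEvent.Tag
    }

    object HelloEvent {
      // Fixed up front; must not change once events have been tagged.
      val NumShards = 10
      val Tag: AggregateEventShards[HelloEvent] =
        AggregateEventTag.sharded[HelloEvent](NumShards)
    }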

So I wonder if I can just add another cluster to distribute the load across them.

And, of course, I'd like to shard persistent entities by some key.

(Let's say I'm building a multi-tenant application. I would shard entities by organization id, so all entities of one set of organizations would go into Cluster 1 and entities of another set of organizations would go into Cluster 2. Each cluster could then have its own sharded read side processors that handle only the subset of events/entities within that cluster, for better scalability. A sketch of that partitioning follows below.)
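Here is a rough sketch of the partitioning I have in mind; the names (entityId, orgGroup) and the group size are hypothetical, not Lagom API:

    // Hypothetical: embed the organisation in the entity ID so every
    // event of that entity can be attributed to an organisation group.
    def entityId(orgId: Int, helloId: String): String = s"$orgId:$helloId"

    // Hypothetical: orgs 1-1000 -> group 1, orgs 1001-2000 -> group 2, ...
    def orgGroup(orgId: Int, orgsPerGroup: Int = 1000): Int =
      (orgId - 1) / orgsPerGroup + 1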

With a single cluster approach, as a system grows, a sharded processor within a single cluster may become slower and slower because it needs to handle more and more events.

So as the system grows, I would just add a new cluster (let's say Cluster 2, then Cluster 3), each handling its own subset of events/entities.

Teimuraz

1 Answer


If you are using sharded read sides, Lagom will distribute the processing of the shards across all the nodes in the cluster. So, if you have 10 shards and 6 nodes in 1 cluster, then each node will process 1-2 shards. If you try to deploy two clusters of 3 nodes each, then you'll end up with each node processing 3-4 shards, but every event will be processed twice, once in each cluster. That's not helping scalability, that's doing twice as much work as needs to be done. So I don't see why you would want two clusters; just have one cluster, and Lagom will distribute the shards evenly across it.

If you are not using sharded read sides, then it doesn't matter how many nodes you have in your cluster, all events will be processed by one node. If you deploy a second cluster, it won't share the load; it will also process the same events, so each event will be processed twice, once per cluster, which is not what you want.

So, just use sharded read sides, and let Lagom distribute the work across your single cluster for you; that's what it's designed to do.
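As a sketch of what that looks like in the Lagom Scala API (assuming the Cassandra read side and the sharded HelloEvent.Tag from the question above; event handlers elided):

    import com.lightbend.lagom.scaladsl.persistence.{AggregateEventTag, ReadSideProcessor}
    import com.lightbend.lagom.scaladsl.persistence.cassandra.CassandraReadSide

    class HelloEventProcessor(readSide: CassandraReadSide) extends ReadSideProcessor[HelloEvent] {

      // Subscribe to every shard of the tag; Lagom then distributes
      // the shards across all the nodes of the (single) cluster.
      override def aggregateTags: Set[AggregateEventTag[HelloEvent]] =
        HelloEvent.Tag.allTags

      override def buildHandler(): ReadSideProcessor.ReadSideHandler[HelloEvent] =
        readSide.builder[HelloEvent]("helloEventOffset")
          // .setEventHandler[...](...) calls omitted for brevity
          .build()
    }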

James Roper
  • What I want to achieve is 2 clusters processing their own subset of events (no event processed twice). Cluster 1 would have events of organizations from 1 to 1000; Cluster 2, events of organizations from 1001 to 2000. Cluster 1 would have a sharded read side processor with numShards=20 which only processes events from Cluster 1, and Cluster 2 would have its own sharded read side processor which only handles events from Cluster 2. Thus I can achieve better scalability, because with a single cluster, as the system grows, a sharded processor within that cluster may become slower and slower. – Teimuraz Jun 03 '19 at 12:18
  • So as the system grows, I would just add a new cluster (Let's say, Cluster3 with events of organizations from org2001 to org3000) – Teimuraz Jun 03 '19 at 12:33
  • You can achieve what you're talking about with cluster roles (https://doc.akka.io/docs/akka/current/cluster-usage.html#node-roles), by configuring the read side processors for each "shard" of organisations to only run on a single role. I think it can be achieved with Lagom, but would probably be easier using Akka persistence directly (see the sketch below). – James Roper Jun 13 '19 at 04:31
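A minimal sketch of the roles approach that comment describes (the role name and the wiring are hypothetical; Cluster(system).selfRoles is the standard Akka classic cluster API):

    import akka.actor.ActorSystem
    import akka.cluster.Cluster

    // In application.conf on the nodes dedicated to organisation group 1:
    //   akka.cluster.roles = ["org-group-1"]

    // Hypothetical wiring: only start the read side processor for an
    // organisation group on nodes that carry the matching cluster role.
    def maybeStartOrgGroup1Processor(system: ActorSystem): Unit =
      if (Cluster(system).selfRoles.contains("org-group-1")) {
        // start the processor/projection for orgs 1-1000 here
      }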