How to scale a NodeJS stateful application

Question

I am currently working on a web-based MMORPG and would like to set up an auto-scaling strategy based on Docker and DigitalOcean droplets.

However, I am wondering how I could manage to do so:

My game server would have to be split across several Docker containers, BUT together the instances should act as one gigantic game server. That means every modification happening in one instance (a character moving, for example) should also be mirrored in every other instance.

I am trying to get this to work (at least conceptually) but can't find a way to synchronize all my instances properly. Should I use a master only broadcasting events or is there an alternative?

I was wondering the same thing about my MySQL database: since every game server would have to read/write from/to the db, how would I make it scale properly as the game gets bigger and bigger? The best solution I could think of was to keep the database on a single server which would be very powerful.

I understand that this would be easy if the game servers didn't have to "share" their state, but the design is primarily intended to let me scale quickly in case of a sudden spike of activity.

(There will be different "global" game servers like A, B, C... but each of those global game servers should be, behind the scenes, composed of 1-X docker containers running the "real" game server so that the "global" game server is only a concept)

This seems like a general scaling question. The question would be the same whether or not it was in Docker, correct? — Andy Shinn, May 15 '17 at 16:51
Right, but I think docker is almost an out-of-the-box solution when it comes to deploying easily. That's why I will probably use it to host my game servers. — Telokis, May 15 '17 at 17:39
Running another Docker container is akin to just running another process. It won't handle state or memory replication for your application. The application still needs to implement sharing that state between multiple processes over some sort of bus or protocol that would support multiple hosts. — Andy Shinn, May 15 '17 at 17:56
Sure, sure, Docker is just mentioned so that people can suggest Docker-specific solutions if any exist. — Telokis, May 15 '17 at 17:56
this seems a bit too broad. and potentially not programming related. — Kevin B, May 17 '17 at 20:24
What would it be related to, then? Where can I ask questions and get answers without having them closed? — Telokis, May 17 '17 at 21:01
depends on the question. this one, as is, probably a consultant. — Kevin B, May 18 '17 at 02:08
Maybe I could ease it a bit by asking "What strategy to use to synchronize stateful nodejs instances?". Is it better? — Telokis, May 18 '17 at 09:23
Problems like this are why WoW has regions, and Eve has systems segmented by grids. Odds are, you will need to fracture your play areas. — Stephan, May 24 '17 at 18:04
Supporting the comment from @Stephan above, in current MMOs each "Realm" or "Server" consists of multiple (auto-) scaling servers able to serve one or multiple smaller areas, usually for player position/movement and combat. — dualed, May 26 '17 at 10:37
Looks like you first need to choose between a stateful and a stateless approach. Each has a different horizontal scaling technique. — jazst21, Feb 10 '19 at 03:01

score 6 · Accepted Answer · answered May 19 '17 at 14:39

The problem you state is too generic, so it's difficult to give a concrete response. However, let me be reckless and give you some general-purpose scaling advice:

Remove counters from databases. Instead of auto-incremented primary keys, assign random UUIDs.
Replace data that must be validated against a central point with data that is self-contained. For example, for authentication, instead of looking up user credentials in a DB on every request, use JSON Web Tokens, which any host can verify.
Use techniques such as Consistent Hashing to balance the load without needing load balancers. Of course, use hash functions that distribute well, so that no single server receives a disproportionate share of the keys.
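The first two points can be sketched in Node.js with nothing but the built-in `crypto` module. This is only an illustration: the secret, the claim names and the helpers `newCharacterId`, `signToken` and `verifyToken` are made up for the example, and a real service would more likely rely on a maintained JWT library and also check an expiry claim.

```javascript
const crypto = require('node:crypto');

// 1. Random UUID instead of an auto-incremented counter: any instance can
//    create an ID without asking a central database for the next value.
function newCharacterId() {
  return crypto.randomUUID();
}

// 2. Self-contained token in the JWT (HS256) layout.
// SECRET is a placeholder; every game server would share the same key.
const SECRET = 'replace-me-with-a-real-secret';

const b64url = (value) => Buffer.from(value).toString('base64url');

function hmac(data) {
  return crypto.createHmac('sha256', SECRET).update(data).digest('base64url');
}

function signToken(claims) {
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const payload = b64url(JSON.stringify(claims));
  return `${header}.${payload}.${hmac(`${header}.${payload}`)}`;
}

// Returns the claims if the signature is valid, otherwise null.
function verifyToken(token) {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;
  const expected = Buffer.from(hmac(`${header}.${payload}`));
  const actual = Buffer.from(signature);
  // timingSafeEqual requires equal lengths, so check that first.
  if (expected.length !== actual.length) return null;
  if (!crypto.timingSafeEqual(expected, actual)) return null;
  return JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
}

const token = signToken({ sub: newCharacterId(), name: 'player-one' });
console.log(verifyToken(token));       // the claims, with no DB lookup
console.log(verifyToken(token + 'x')); // null, the signature no longer matches
```

Any host that knows the key can run `verifyToken`, so no request has to go back to a central session store.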

The advice above is basically about changing the design to move from stateful to stateless in as many aspects as you can. If you still need stateful parts, try to guess which entities are most likely to share stateful data and place them on the same (or a nearby) server. For example, if there are cities in your game, try to place users who are in the same city on the same server, since they are more likely to interact with each other (and share stateful data) than users in different cities.

Of course, if a city is too big and very crowded, you will probably need to partition it across several servers to avoid overloading any one of them.
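One way to combine the consistent hashing mentioned earlier with the per-city placement is a hash ring keyed by city name. The sketch below is a minimal version under assumptions: the `HashRing` class, the server names, the city names and the choice of 100 virtual nodes per server are all invented for the example.

```javascript
const crypto = require('node:crypto');

// Map a string to a 32-bit position on the ring.
function ringPosition(key) {
  return crypto.createHash('sha256').update(key).digest().readUInt32BE(0);
}

class HashRing {
  constructor(virtualNodes = 100) {
    this.virtualNodes = virtualNodes; // more points per server = smoother spread
    this.points = [];                 // sorted array of { position, server }
  }

  addServer(server) {
    for (let i = 0; i < this.virtualNodes; i++) {
      this.points.push({ position: ringPosition(`${server}#${i}`), server });
    }
    this.points.sort((a, b) => a.position - b.position);
  }

  removeServer(server) {
    this.points = this.points.filter((p) => p.server !== server);
  }

  // The first point at or after the key's position owns the key
  // (wrapping around to the start of the ring if necessary).
  serverFor(key) {
    if (this.points.length === 0) throw new Error('ring is empty');
    const position = ringPosition(key);
    let lo = 0;
    let hi = this.points.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.points[mid].position < position) lo = mid + 1;
      else hi = mid;
    }
    return this.points[lo % this.points.length].server;
  }
}

const ring = new HashRing();
['game-1', 'game-2', 'game-3'].forEach((s) => ring.addServer(s));

const cities = Array.from({ length: 20 }, (_, i) => `city-${i}`);
const before = new Map(cities.map((c) => [c, ring.serverFor(c)]));

// Scaling up: only cities claimed by the new server change owner.
ring.addServer('game-4');
const moved = cities.filter((c) => ring.serverFor(c) !== before.get(c));
console.log(`${moved.length} of ${cities.length} cities moved to game-4`);
```

All players in a given city land on the same server, and adding a server relocates only the cities that the new server takes over, rather than reshuffling everything.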

score 4 · Answer 2 · answered May 18 '17 at 08:49

Your question is too broad and a general scaling problem as others have mentioned. It'd have been helpful if you'd stated more clearly what your system requirements are.

If it has to be real-time, then you can choose Redis as your main DB, but then you'd need replicas (for replication) and you would not be able to scale automatically as you go*, since Redis doesn't support that. I assume that's not a good option when you're working with games (sudden spikes are probable).

*There seem to be some managed solutions; you need to check them out.

If it can be near real-time, using Apache Kafka can prove to be useful.

There's also a highly scalable DB which has everything you need called CockroachDB (I'm a contributor, yay!) but you need to run tests to see if it meets your latency requirements.

Overall, going with a very powerful server is a bad choice, since there's a ceiling and it'd cost you more to scale vertically.

answered by d9ngle

What I had in mind is having nodes for my game server instances. And they would synchronize themselves using one or multiple redis instances so that the state is contained in redis and I can scale my workers more easily – Telokis May 18 '17 at 09:25
You can accomplish that using pub/sub & keys in Redis, but once your req/sec grows too large, you'll hit your DB bottleneck. Not only do you need to scale your node.js instances (to serve requests), you also need to scale your backend, which is your DB! – d9ngle May 18 '17 at 09:33
Yes, that's the issue. At least there is redis cluster which seems pretty nice even though I haven't played with it. Otherwise, I could scale redis vertically while scaling my nodejs, socket.io instances horizontally. – Telokis May 18 '17 at 10:21
Redis Cluster uses hash slots so when you add a new node to the cluster, you need to re-shard, therefore it might not be the optimal solution for scaling up automatically. (in case of a spike) Also you can't perform `MULTI` on keys living on different nodes. If you need strong consistency & transactions, cockroachdb is a no-brainer! – d9ngle May 18 '17 at 11:25
Yup, this seems quite fast but how does it compare to Redis in terms of speed? Since Redis lives in RAM, it's more like a cache. Maybe I could even combine both? – Telokis May 18 '17 at 12:15
I planned on using MySQL as a persistent storage and Redis as a cache/temporary state storage which would help me synchronize all my nodes. Do you suggest using Cockroach in place of Redis or MySQL, given this setup? – Telokis May 18 '17 at 12:20
The standard answer would be CockroachDB/Redis, but all combinations make sense as long as you're aware of the trade-offs. You can even use Redis as your sole DB (if you can afford to lose occasional data)! CockroachDB is new, but very simple to use/persistent/distributed/highly scalable. I'm sure you can automate a lot using CockroachDB and some orchestration such as Kubernetes. – d9ngle May 18 '17 at 13:16
I may give CockroachDB a try in combination with Redis for caching. Never tried Kubernetes, though. – Telokis May 18 '17 at 13:35
You can also use Pipelining to reduce load on your Redis instances, depending on your game's logic. – d9ngle May 18 '17 at 13:37
I guess I can go with that in mind. Since the project isn't even in production yet, we will probably have to iterate a bit on our infrastructure. Thanks for all the details even though nobody likes my question! :( – Telokis May 18 '17 at 14:13

score 2 · Answer 3 · answered May 24 '17 at 19:24

There's a great benefit in scaling such an application horizontally. I'll try to write down some ideas.

Option 1 (stateful):

When planning stateful applications you need to take care of synchronising the state (via PubSub, network broadcasting or something else) and be aware that every synchronisation takes time to occur (unless each operation blocks until it has propagated). If this is OK for you, let's go ahead.

Let's say you have 80k operations per second on your whole cluster. That means that every process needs to synchronise 80k state changes per second, no matter how many processes you add. This will be your bottleneck. Handling 80k changes per second is quite a big challenge for a Node.js application (its JavaScript runs on a single thread, so CPU-bound work blocks the event loop).
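The bottleneck can be made visible with a small simulation. In this sketch an in-process `EventEmitter` stands in for the real message bus (Redis pub/sub, a broker, or similar); the `GameInstance` class, the event name and the numbers are assumptions chosen for the example.

```javascript
const { EventEmitter } = require('node:events');

// Stand-in for a real message bus shared by all processes.
const bus = new EventEmitter();

class GameInstance {
  constructor(name) {
    this.name = name;
    this.positions = new Map(); // full copy of the world state
    this.applied = 0;           // state changes this process had to handle
    bus.on('move', (event) => this.apply(event));
  }

  // A player connected to this instance moves: publish to everyone.
  move(playerId, x, y) {
    bus.emit('move', { playerId, x, y, origin: this.name });
  }

  // Every instance applies every change, including its own.
  apply({ playerId, x, y }) {
    this.positions.set(playerId, { x, y });
    this.applied += 1;
  }
}

const instances = ['a', 'b', 'c', 'd'].map((n) => new GameInstance(n));

// 1000 moves spread evenly over the four instances (250 each).
for (let i = 0; i < 1000; i++) {
  instances[i % instances.length].move(`player-${i % 50}`, i, i);
}

for (const inst of instances) {
  console.log(inst.name, 'applied', inst.applied, 'changes');
}
```

Each instance only originates 250 moves but reports 1000 applied changes: adding instances spreads the client connections, not the synchronisation work.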

In the end you'll need to decide precisely the maximum number of changes you want to be able to sync and run some tests with different programming languages. The synchronisation overhead comes on top of the application's general workload. It could be beneficial to use a language with good multithreading support, like C, Java/Scala or Go.

Option 2 (stateful with routing):

In some cases it's feasible to implement a different kind of scaling. When, for example, your application can be broken down into areas of a map, you could start with one application replica that holds the full map and, when it scales up, splits the map proportionally among the replicas. You'll need to implement some routing between the application servers, for example: to change the state in city A of world B => call server xyz. This could be done automatically, but downscaling will be a challenge.
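A routing table for this option could look roughly like the following. It is a sketch under assumptions: the `RegionRouter` class, the `world:city` key format and the server names are hypothetical, and "busiest" is simplified to "owns the most regions" where a real system would measure load.

```javascript
class RegionRouter {
  constructor(regions, firstServer) {
    // region key ("world:city") -> name of the server that owns it
    this.owner = new Map(regions.map((r) => [r, firstServer]));
  }

  // "To change the state in city A of world B => call server xyz."
  route(world, city) {
    const server = this.owner.get(`${world}:${city}`);
    if (!server) throw new Error(`unknown region ${world}:${city}`);
    return server;
  }

  // Scale up: hand half of the regions of the server that owns the most
  // regions to a new server.
  scaleUp(newServer) {
    const byServer = new Map();
    for (const [region, server] of this.owner) {
      if (!byServer.has(server)) byServer.set(server, []);
      byServer.get(server).push(region);
    }
    const [, regions] = [...byServer.entries()]
      .sort((a, b) => b[1].length - a[1].length)[0];
    const moved = regions.slice(Math.floor(regions.length / 2));
    for (const region of moved) this.owner.set(region, newServer);
    return moved; // the caller would migrate the state of these regions
  }
}

const router = new RegionRouter(
  ['world-A:city-A', 'world-A:city-B', 'world-B:city-A', 'world-B:city-B'],
  'server-1',
);

console.log(router.route('world-B', 'city-A')); // server-1 holds the full map
const moved = router.scaleUp('server-2');
console.log(moved);                             // the regions handed over
console.log(router.route('world-B', 'city-A')); // now owned by server-2
```

Downscaling would be the reverse operation (merging a server's regions back into another), which is where migrating live state becomes the hard part.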

This solution requires more care and knowledge about the application and is not as fault tolerant as option 1 but it could scale endlessly.

Option 3 (stateless):

Move the state to some other application (like Redis, etcd, ...) and solve the problem there.
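In code, this option means the workers keep nothing between requests. The sketch below uses an in-memory `StateStore` as a stand-in for the external store; its async `get`/`set` interface, the `pos:` key prefix and the `createWorker` helper are assumptions made for the example, not the API of any particular client library.

```javascript
// In-memory stand-in for an external store such as Redis or etcd.
// The async interface is an assumption, chosen to resemble a network client.
class StateStore {
  constructor() {
    this.data = new Map();
  }
  async get(key) {
    return this.data.get(key);
  }
  async set(key, value) {
    this.data.set(key, value);
  }
}

// A worker holds no game state of its own; every request reads from and
// writes to the shared store.
function createWorker(name, store) {
  return {
    name,
    async movePlayer(playerId, dx, dy) {
      const pos = (await store.get(`pos:${playerId}`)) ?? { x: 0, y: 0 };
      const next = { x: pos.x + dx, y: pos.y + dy };
      await store.set(`pos:${playerId}`, next);
      return next;
    },
  };
}

async function main() {
  const store = new StateStore();
  const w1 = createWorker('w1', store);
  const w2 = createWorker('w2', store);

  await w1.movePlayer('p1', 1, 0);     // handled by one worker...
  return w2.movePlayer('p1', 0, 2);    // ...continued by another
}

main().then((pos) => console.log(pos));
```

Because any worker can serve any request, workers can be added or removed freely. Against a real networked store, the read-modify-write in `movePlayer` would need an atomic operation or a transaction, and the store itself becomes the component that has to scale.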
