
I have an API server running Node.js that was using its cluster module, and testing looked pretty good. Now our IT department wants to move to Docker containers, which I am happy about, but I've never actually used Docker beyond just playing around. It occurred to me that since the Node.js app runs within a single Docker container (a single process), the cluster module might not be the best fit: that single container could become a bottleneck before a request is ever split up among the workers inside it by the cluster module.

So really, running a cluster of Docker containers that can be started and stopped on the fly is more important than using Node.js' cluster module, correct?

If I have a cluster of containers, would using Node.js' cluster module get me anything? The API endpoints take less than 0.5 sec to return (usually quite a bit less).

I'm using MySQL (I believe it's a single server, nothing more currently), so there shouldn't be any reason to need a data integrity solution here.
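
For context, the current setup is roughly the following (a minimal sketch, not the actual app; the port and handler are placeholders): the primary process forks one worker per CPU core, and incoming connections are distributed among the workers.

const cluster = require('cluster');
const http = require('http');
const numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork one worker per logical CPU core.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  // Each worker runs its own copy of the HTTP server.
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(3000);
}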

Mitchell Simoens
  • I'm starting down this road and am curious how this worked out for you – I'm also curious if you're using pm2 as the commenter below mentioned, or just forever – maehue Apr 22 '16 at 17:41
  • I'm using AWS ElasticBeanstalk, so if a node command fails and quits, the Docker container stops and EB then automatically starts another due to my scaling configs. Not using the cluster module has been great thus far and our instances run at about 10%-15% CPU utilization. – Mitchell Simoens Jul 20 '17 at 15:27

3 Answers


What I've seen as the best solution when using Docker is to keep as few processes per container as possible, since containers are lightweight; you don't want processes trying to use more than one CPU. So running a cluster inside the container won't add any value and might worsen latency.

Here https://medium.com/@CodeAndBiscuits/understanding-nodejs-clustering-in-docker-land-64ce2306afef#.9x6j3b8vw Chad Robinson explains the idea in general terms.

Kubernetes, Rancher, Mesos and other container management layers handle the load-balancing. They provide "scheduling" (moving those Docker container slices around different CPUs and machines to get a good usage across the cluster) and "networking" (load balancing inbound requests to those containers) layers internally.
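
As a minimal sketch of what a one-process-per-container service might look like under this approach (the port handling and the SIGTERM shutdown below are assumptions, not something from the linked article): the process does no forking of its own and leaves replication and load balancing to the orchestrator.

const http = require('http');

// Single process, no cluster module: replicas and load balancing are
// handled by the container orchestrator (Kubernetes, Rancher, Mesos, ...).
const port = process.env.PORT || 3000;

const server = http.createServer((req, res) => {
  res.end('ok\n');
});
server.listen(port);

// Orchestrators typically stop containers with SIGTERM; closing the server
// lets in-flight requests finish before the replica is replaced.
process.on('SIGTERM', () => {
  server.close(() => process.exit(0));
});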

Update

I think it's worth adding the link Why it is recommended to run only one process in a container?, where people share their ideas and experiences; in particular, Jon makes some interesting points:

Provided that you give a single responsibility (single process, function or concern) to a container: good idea, Docker names this 'concern' ;)

  • Scaling containers horizontally is easier.
  • It can be re-used in different projects.
  • Identifying issues and troubleshooting is a breeze compared to doing it in an entire application environment. Also, logging and reporting can be more accurate and detailed.
  • Upgrades/downgrades can be gradual and fully controlled.
  • Security can be applied to specific resources and at different levels.
tiomno
  • I'm not sure this really works at scale when combined with k8s. For example, we have a nodejs app that needs 18 processes to handle load spikes comfortably. With 1 process per container, that's 18 docker containers with their own overhead. If the 1 process hits 100% cpu usage and requests start queuing, the container will be detected as "down" via basic healthchecks when it's just cpu bound. The container will be restarted which will cause more issues. Allowing for 2 or 3 processes will make the healthchecks less likely to fail from one long-running process. – Aaron Butacov Aug 14 '19 at 08:51
  • I think the ideal is not so much one process as one concern in a nodejs web app, because multiple child processes control the number of _requests_ you can handle at once. Mixing 2 different apps would not be good practice, but multiple child processes make your app more stable and allow for one process to fail and be restarted without the entire container needing to be rescheduled. – Aaron Butacov Aug 14 '19 at 08:55
  • Hey @AaronHarun, that makes sense. I haven't gone to k8s yet; I've only tested Docker on AWS ECS and AWS Fargate. Here's another discussion with the same trade-off mentioned. There isn't usually a perfect solution that suits all problems, and depending on your stack, team and environment you need to make up your mind, after testing if possible. ;) You should add another answer here with your considerations. I'll give it a thumbs up. – tiomno Aug 15 '19 at 03:59

You'll have to measure to be sure, but my hunch is that running with Node's cluster module would be worthwhile. It would get you more CPU utilization with the least amount of extra overhead, and there are no extra containers to manage (start, stop, monitor). Plus the cluster workers have an efficient communication mechanism. The most reasonable evolution (don't skip steps) seems to me to be the following; a sketch of step 2 follows the list:

  1. 1 container, 1 node process
  2. 1 container, several clustered node workers
  3. several containers, each with several node workers
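
A minimal sketch of what step 2 could look like (details such as the port and the restart behavior are assumptions rather than anything prescribed above): the primary forks one worker per core and re-forks whenever a worker dies, so one crashed process doesn't require restarting the whole container.

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // One worker per core visible to the container.
  os.cpus().forEach(() => cluster.fork());

  // Replace a worker that dies instead of letting capacity degrade.
  cluster.on('exit', (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} exited (${signal || code}), forking a replacement`);
    cluster.fork();
  });
} else {
  http.createServer((req, res) => res.end('ok\n')).listen(process.env.PORT || 3000);
}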
Peter Lyons
  • [PM2](https://github.com/Unitech/pm2) is quite popular. It uses clustering and there are already several Docker images for PM2 (a sketch of a PM2 cluster-mode config follows these comments). – tsturzl Feb 16 '15 at 18:55
  • So am I right to say that the container will have access to multi-cores of the CPU? – Mitchell Simoens Feb 16 '15 at 19:58
  • Yes. If Docker were not multi-core compatible, that would be a huge shortcoming. – Peter Lyons Feb 16 '15 at 20:06
  • awesome, fantastic. exactly what I wanted to know/confirm! Thanks for your time. – Mitchell Simoens Feb 16 '15 at 20:37
  • I wonder what everyone thinks of this article below which says the following in the context of Docker "Processes that manage and coordinate their own resources are no longer as valuable. Instead, management stacks like Kubernetes, Mesos, and Cattle have popularized the concept that these resources should be managed infrastructure-wide." and "In this type of environment, a process that attempts to use too many CPU cores can become a trouble-maker." https://medium.com/@CodeAndBiscuits/understanding-nodejs-clustering-in-docker-land-64ce2306afef – Harindaka Apr 04 '17 at 15:25
  • I've done a project that removed the Node.js cluster module and just scaled at the container level, and it was fine. I think the concept of "attempts to use too many CPU cores" could be misleading. Even with the cluster module enabled, node will spin up a worker process per CPU core and even then the kernel assigns the CPU load. Generally I'd agree with "probably no clusters in Docker", but armed with actual measurements in a specific environment, clustering could be beneficial. – Peter Lyons Apr 04 '17 at 16:02
  • Personally I would go single process, many containers; that way you also push the management outside the code and scale easily. Yet in environments where you don't have Docker or Kubernetes, it is very cool to have this option built into Node.js. – Kat Lim Ruiz Apr 02 '19 at 18:54
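
Regarding the PM2 suggestion in the comments above, its cluster mode is usually enabled through an ecosystem file; a minimal sketch follows (the file name, app name and script path are illustrative assumptions). PM2 also ships a container-oriented launcher, pm2-runtime, intended to run as the container's main process.

// ecosystem.config.js (illustrative values only)
module.exports = {
  apps: [{
    name: 'api',
    script: './server.js',
    exec_mode: 'cluster', // PM2 forks workers via Node's cluster module
    instances: 'max'      // or a fixed number matching the container's CPU share
  }]
};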

I have a system with 4 logical cores, and I ran the following lines both directly on my machine and inside a Docker container on the same machine.

// Count the logical CPU cores visible to this process.
const numCPUs = require('os').cpus().length;
console.log(numCPUs);

These lines print 4 on my machine and 2 inside the Docker container, which means that if we use clustering in the Docker container, only 2 instances would be running. So the Docker container doesn't see the cores the same way the actual machine does. Also, running 5 Docker containers with clustering enabled gives 10 instances, which are ultimately managed by the OS kernel over 4 logical cores.

So I think the best approach is to use multiple Docker container instances in swarm mode with Node.js clustering disabled. This should give the best performance.
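
If the cluster module is kept despite this, one way to avoid relying on what os.cpus() reports inside the container is to make the worker count explicit, for example via an environment variable (the variable name WEB_CONCURRENCY below is just an assumption):

const os = require('os');

// Let the deployment decide how many workers to fork; only fall back to the
// core count the container happens to report.
const workerCount = parseInt(process.env.WEB_CONCURRENCY, 10) || os.cpus().length;
console.log(`would fork ${workerCount} workers`);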

Prakhar Patidar
  • Are you sure it's not because you have default Docker settings on your development computer? You can manage Docker resources manually (CPU cores, memory, etc.). For example on Windows: https://docs.docker.com/docker-for-windows/ – LagSurfer Nov 28 '20 at 08:44
  • Thanks, @LagSurfer this was my issue. – intumwa Apr 06 '22 at 18:56