
I am trying to build a pub/sub infrastructure using Faye (Node.js), and I wish to know whether horizontal scaling is possible. A single Node.js process runs on a single core, so when people talk about clustering, they mean creating multiple processes on the same machine, sharing a port, and sharing data through Redis. Like this:
http://www.davidado.com/2013/12/18/using-node-js-cluster-with-socket-io-for-push-notifications/

Firstly, I don't understand how we make sure that each of the forked processes goes to a different core. If I fork 10 Node servers on a machine with 4 cores, is it ensured that they are distributed evenly across the cores?

What if I wish to add a new machine, and thus scale out? I have not seen any such support anywhere, and I am not sure it is even possible. Let's say multiple machines are somehow being used behind some load balancer. A client will still connect to only one server process. So when a client C1 publishes on a channel that a client C2 has subscribed to, and C1 is connected to process P1 while C2 is connected to process P2, how will P1 deliver the message to C2 when it doesn't hold that connection?

This would probably be possible on a single machine, because the cluster module lets all processes share the same port, and the connections too.

I am fairly new to the web world, as well as to Node.js and Faye. Please enlighten me if something in the question is wrong.

neeraj

1 Answer


You are correct in thinking that the cluster module allows multiple cores to be used on a single machine. The cluster module lets the same application be spawned multiple times whilst listening on the same port. Distribution amongst the cores is down to the operating system: if you have 10 processes and 4 cores, the OS will figure out how best to distribute them (as long as they haven't been spawned with a fixed CPU affinity). By default this shouldn't be a concern for you.

Load balancing can be done through Node too, but that is separate from clustering. Instead you would have a separate application that grabs the load statistics of each running server and proxies the HTTP request to the most appropriate one (using http-proxy as an example). A very primitive load balancer sends each request to the next running server instance in turn to give an even distribution.

The final point, about sharing messages between all the instances, assumes that there is a single place where all the messages are held. The article you linked to assumes there is only one server and all the processes share access to the same Redis instance; because they do, all processes are able to receive the same messages. If we start thinking about multiple servers in different locations around the world, each with its own message store (i.e. its own Redis instance), then we get into the domain of 'replication'. Some data stores are built with this in mind, and Redis is one of them. You end up with a 'master' set of data and a set of 'slaves' that periodically sync with the master and grab anything they are missing. It is important to note that messages will not arrive in 'real time' unless you have a very intensive replication process.

In conclusion, developers go through this chain of scaling steps for their applications. The first is to make the application multi-process (the cluster module). The second is to have a load balancer that proxies each HTTP request to the appropriate server running the multi-process application. The third is to replicate the data stores so that the servers can run independently but stay in sync with each other.

DF_