5

I have a server with, say, 16 cores and 32G of memory. For a process like Apache, which spawns a new thread for every new connection, which of the following alternatives is better, and why? Also, what will happen in the case of an application like Cassandra? With Cassandra, where there are a lot of writes to memory, would having two 'nodes' on the same machine be beneficial in any way?

  1. Multiple (say, two) instances of the same application running on the same machine and serving on two different ports. (Maybe with an LB on a different machine in front of this one.)

    I'm confused about how the OS will handle two instances of a multithreaded application. Will both processes have threads running on all cores? In what cases will context switching occur (between processes and threads), and how will it affect performance?

  2. A single instance of a multithreading application serving on one port.

In the case of an application like Cassandra, where the threads write to shared memory, when will context switches occur between the threads?

user7337271
Swair

2 Answers

2

In a Windows context (and AFAIK also on Unix), a process is merely a structural context (with some memory protections in place too) around a thread of execution, which means that the thing executing code is just a thread.

Processes cannot share memory with each other as easily as threads can within the same process.

But it's always a thread that does the code execution.
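As a small illustration (in Java, since that's what Cassandra runs on; the class and message names here are made up), two threads in one process can read and write the same field directly, whereas two separate processes would need some IPC mechanism:

```java
public class SharedMemoryDemo {
    // A plain field, visible to every thread inside this one process.
    static volatile String message = "initial";

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> message = "written by another thread");
        writer.start();
        writer.join();

        // The main thread sees what the writer thread wrote: same address space, no IPC.
        System.out.println(message);

        // Two separate processes would instead need pipes, sockets, files or an
        // OS shared-memory segment to pass this string between them.
    }
}
```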

Now, two multithreaded instances of your application running on the same machine will use the available CPU cores and will have to share those cores between them. If you have more cores than the total number of threads in your applications, then you're in luck, because that means all threads could run without ever needing a context switch to make room for another. That's in theory, however. In practice, the OS has to share a core's time between a particular thread and other threads (possibly not even your application's), so each thread gets a certain time slice (quantum) that it may run for before being switched out.

The OS thread scheduler is in control of this.

So performance depends on how many threads are running, how many cores are available, and what those threads are doing. If they can simply run once scheduled on a core, things can be fast. But that is rarely the case: threads may need to wait, block, etc.
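One way to see that time slicing is to deliberately start more CPU-bound threads than cores. This is just a sketch under the assumption of purely CPU-bound work; the thread count and the two-second spin are arbitrary:

```java
public class OversubscriptionDemo {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        int threads = cores * 2;  // deliberately more runnable threads than cores
        System.out.println(cores + " cores, starting " + threads + " busy threads");

        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                long count = 0;
                long end = System.nanoTime() + 2_000_000_000L;  // spin for ~2 seconds
                while (System.nanoTime() < end) {
                    count++;  // pure CPU work, never blocks
                }
                System.out.println("thread " + id + " counted to " + count);
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        // Every thread finishes with a roughly similar count: the scheduler has been
        // slicing the cores among them, switching each thread out when its quantum expires.
    }
}
```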

Running two instances rather than one multithreaded instance will, I think, only make a real difference when the two instances give you many more threads than you have cores.

There's also the factor of IO, which doesn't depend on your CPUs or threads but on your hard disk and RAM latency. If a lot of your threads spend most of their time waiting on IO, then running one or two instances of your application won't make much difference.
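A rough sketch of that IO effect, where a Thread.sleep stands in for the disk or network wait (the task count and delay are made up for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class IoBoundDemo {
    public static void main(String[] args) throws InterruptedException {
        int tasks = 32;  // far more tasks than most machines have cores
        ExecutorService pool = Executors.newFixedThreadPool(tasks);

        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(200);  // stand-in for a disk or network wait
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        long ms = (System.nanoTime() - start) / 1_000_000;
        // Completes in roughly 200 ms even on a few cores: the threads spend their
        // time blocked, not computing, so core count (or instance count) barely matters.
        System.out.println(tasks + " IO-bound tasks took ~" + ms + " ms");
    }
}
```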

However, this is performance and threading, which is very hard to make accurate predictions about unless you measure.

Tony The Lion
1

Multiple instances of the same application running on the same machine require inter-process synchronization. If the amount of inter-process synchronization is low, this approach can be beneficial. Also, if your application is multithreaded by itself, then you need only one process. If your application is single-threaded, then you may want to run several instances, for example one process per CPU, to utilize the hardware resources.
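As a minimal sketch of the "multithreaded by itself" case, a single process can simply size a worker pool to the number of cores (the job loop here is just a placeholder):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OneProcessPerMachine {
    public static void main(String[] args) {
        // One worker thread per core, all inside a single process.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        for (int i = 0; i < 100; i++) {
            final int job = i;
            pool.submit(() ->
                System.out.println("job " + job + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();
    }
}
```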

If your process is IO-bound, then throughput is not limited by the CPU and a single thread can serve all IO requests.
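A rough illustration of one thread serving many IO requests: a minimal non-blocking echo server built on java.nio selectors (port 9000 and the 1 KB buffer are arbitrary choices, nothing Cassandra-specific):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SingleThreadEchoServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocate(1024);
        while (true) {
            selector.select();  // one thread waits for readiness on all connections
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    int n = client.read(buffer);
                    if (n == -1) {
                        client.close();
                        continue;
                    }
                    buffer.flip();
                    client.write(buffer);  // echo back on the same single thread
                }
            }
        }
    }
}
```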

In the case of an application like Cassandra, where the threads write to shared memory, when will context switches occur between the threads?

Context switches occur when two or more threads try to synchronize their writes to the shared memory.
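A minimal sketch of that (the counter and iteration count are made up): two threads funnel their writes through one lock, and whichever thread finds the lock held blocks until it is released, giving the scheduler a chance to switch it out:

```java
public class ContendedCounter {
    private static final Object lock = new Object();
    private static long counter = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable writer = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                synchronized (lock) {   // only one thread may hold the lock at a time;
                    counter++;          // a thread that arrives while it is held blocks,
                }                       // and the scheduler may switch it off the core
            }
        };

        Thread a = new Thread(writer);
        Thread b = new Thread(writer);
        a.start();
        b.start();
        a.join();
        b.join();

        System.out.println("counter = " + counter);  // 2000000: the writes were serialized
    }
}
```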

Evgeny Lazin