
The question title is pretty awkward, sorry about that.

I am currently working on the design of a server, and a comment came up from one of my co-workers that we should use multiple processes, since there is supposedly some performance hit to having too many threads in a single process (as opposed to spreading that same number of threads over multiple processes on the same machine).

The only thing I can think of that would cause this (other than bad OS scheduling) is increased contention (for example, on the memory allocator), but I'm not sure how much that matters in practice.
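
To make that concern concrete, here's the kind of toy microbenchmark I have in mind (entirely hypothetical, not from our codebase): N threads hammering the global allocator with small allocations. If malloc's internal locking is the bottleneck, wall time should scale noticeably worse than linearly with the thread count.

```cpp
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>

// Volatile sink so the compiler can't elide the new/delete pairs.
void* volatile g_sink;

int main(int argc, char** argv) {
    const int num_threads = argc > 1 ? std::atoi(argv[1]) : 4;
    const int iterations  = 1000000;

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([iterations] {
            for (int i = 0; i < iterations; ++i) {
                // Small allocations are the common case for per-request state.
                char* p = new char[64];
                g_sink = p;
                delete[] p;
            }
        });
    }
    for (auto& w : workers) w.join();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
    std::printf("%d threads: %lld ms\n", num_threads,
                static_cast<long long>(ms.count()));
}
```

Running this with 1, 2, 4, ... threads in one process, versus the same total thread count split across several processes, would answer the question directly for a given platform's allocator.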

Is this a 'best practice'? Does anyone have benchmarks they could share with me? Of course, the answer may depend on the platform (I'm mostly interested in Windows/Linux/OS X, although I need to care about HP-UX, AIX, and Solaris to some extent).

There are of course other benefits to a multi-process architecture, such as process isolation to limit the effect of a crash, but for this question I'm interested in performance.

For some context, the server is going to service long-running, stateful connections (so they cannot be migrated to other server processes) which send back a lot of data and can also cause a lot of local DB processing on the server machine. It will use the proactor architecture in-process and be implemented in C++. The server will be expected to run for weeks/months without needing a restart (although this may be implemented by rotating new instances transparently behind some proxy).
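
For clarity, here's a stripped-down sketch of the in-process proactor shape I mean; all the names (`CompletionQueue`, `Completion`, etc.) are illustrative, not our actual code. Worker threads drain a completion queue and invoke the handler attached to each finished I/O operation:

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Completion {
    std::function<void(std::size_t /*bytes*/)> handler;
    std::size_t bytes_transferred;
};

class CompletionQueue {
public:
    void post(Completion c) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(c)); }
        cv_.notify_one();
    }
    Completion wait_pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        Completion c = std::move(q_.front());
        q_.pop();
        return c;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Completion> q_;
};

int main() {
    CompletionQueue queue;
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i)
        pool.emplace_back([&queue] {
            for (;;) {
                Completion c = queue.wait_pop();
                c.handler(c.bytes_transferred);  // dispatch on completion
            }
        });
    // In the real server, the OS layer (IOCP, an epoll wrapper, etc.)
    // would post completions; here we fake one to show the dispatch path.
    queue.post({[](std::size_t n) { std::printf("read %zu bytes\n", n); }, 128});
    for (auto& t : pool) t.join();  // (never returns in this toy)
}
```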

Also, we will be using a multi-process architecture either way; my concern is more about how to schedule connections to processes.
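
For instance, on the Unix-like platforms one common scheme is pre-forking workers that all accept() on a single shared listening socket, so the kernel picks which process gets each new connection (and the connection then stays pinned to that process, which suits the can't-migrate constraint). A bare POSIX-only sketch with minimal error handling, not our actual design; Windows would need a different mechanism:

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) { std::perror("socket"); return 1; }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);
    if (bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(listener, 128) < 0) {
        std::perror("bind/listen");
        return 1;
    }

    const int kWorkers = 4;                    // e.g. one per core
    for (int i = 0; i < kWorkers; ++i) {
        if (fork() == 0) {                     // child: competes to accept
            for (;;) {
                int conn = accept(listener, nullptr, nullptr);
                if (conn < 0) continue;
                // ... hand `conn` to this process's thread pool; the
                // connection stays pinned here for its whole lifetime ...
                close(conn);
            }
        }
    }
    while (wait(nullptr) > 0) {}               // parent just reaps children
}
```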

Bwmat
  • One reason that could confirm the theory is lock contention on heap-memory facilities (malloc). The same probably holds for locks related to file-system access; basically all the "common infrastructure" that the threads use concurrently and that usually nobody gives a thought to. From a purely scheduling-related perspective, I cannot fathom a reason why multiple processes should yield better performance. If anything, I would opt for the many-threads, few/one-process approach, since it then offers opportunities to optimize locality of access (cache misses etc.) – BitTickler Apr 30 '15 at 05:24
  • Yeah, on the other hand you would need to balance that effect against the increased cost of process context switches potentially thrashing the processor cache more (...this is a thing, right?) – Bwmat Apr 30 '15 at 05:29
  • A while back, when I was porting some (embedded) code to FreeBSD, I asked in the BSD IRC channel why there are no custom heap APIs. They pointed out that their malloc was optimized using thread-local storage to avoid heap lock contention (a toy version of that idea is sketched right after these comments). I could live with that answer, kind of. – BitTickler Apr 30 '15 at 05:31
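
Out of curiosity, here's a toy version of that thread-local trick (my own illustrative names; real allocators like jemalloc/tcmalloc are far more elaborate, and this ignores cross-thread frees): each thread caches freed blocks of one size in `thread_local` storage, so the hot allocate/free path never touches a shared lock.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

constexpr std::size_t kBlockSize = 64;

struct ThreadCache {
    std::vector<void*> free_blocks;     // per-thread, so no locking needed
    ~ThreadCache() {
        for (void* p : free_blocks) std::free(p);  // return to global heap
    }
};

thread_local ThreadCache tls_cache;

void* cached_alloc() {
    if (!tls_cache.free_blocks.empty()) {   // fast path: no contention
        void* p = tls_cache.free_blocks.back();
        tls_cache.free_blocks.pop_back();
        return p;
    }
    return std::malloc(kBlockSize);         // slow path: global allocator
}

void cached_free(void* p) {
    // Caveat: assumes `p` is kBlockSize and was freed on the owning thread.
    tls_cache.free_blocks.push_back(p);
}
```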
