  1. Hardware: We use a 24-core machine (2 × 12 cores). There are two separate controllers, one for an SSD disk and one for a SAS RAID-0 disk. OS: Windows 8.1. Hyper-threading is disabled.

  2. Software:

    2.1. There is a master which fills a work queue for the workers and then collects the results from a result queue.

    2.2. There are n workers which retrieve work from the work queue. They write small input files to disk and start an external process to carry out the actual computation. After the external process has finished, output files of 10-15 MB have to be read from the file system and parsed. Finally, the worker places the results in the result queue and carries on with the next item from the work queue.

  3. The file-system access across both disks is distributed evenly among the worker processes.

  4. Observations

    4.1. From 1 to 10 workers there is an almost linear speed-up for both multithreading and multiprocessing. Increasing from 10 to 28 workers still gives a reasonable but sub-linear speed-up with multiprocessing, but almost no increase with multithreading.

    4.2. We did extensive timings for multithreading and found that the computation time stays almost constant, with only a negligible increase, as the number of workers grows. In contrast, when the number of workers is increased from 10 to 40, the time for reading the files from the disks increases dramatically and leaves the cores idling.

    4.3. With multiprocessing, the workers are evidently able to take full advantage of the two independent file-I/O channels (RAID and SSD) and outperform multithreading by far.

Finally, the question: what is the bottleneck in the multithreading case, and how can we circumvent it?

Note 1: Avoiding file-system access altogether is not an option, since the external process is third-party software.

Note 2: I'm aware of these answers, but they don't address my question.

Update 2019: On a different machine with 18 cores and Windows 10 we observe exactly the same behavior.
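For reference, the master/worker structure from point 2 can be sketched roughly as follows (illustrated with Python threads and queues for brevity; our actual stack differs, and the file contents, the squaring, and all names are made up stand-ins — in particular, the real code launches a third-party process where the comment indicates):

```python
import os
import queue
import tempfile
import threading

def worker(work_queue, result_queue):
    """Take an item, write a small input file, stand in for the external
    process, read the output back, parse it, and post the result."""
    for item in iter(work_queue.get, None):          # None is the stop signal
        fd, path = tempfile.mkstemp(suffix=".in")
        with os.fdopen(fd, "w") as f:
            f.write(str(item))                       # small input file
        # The real code launches the third-party tool here, e.g.:
        # subprocess.run(["external_tool", path], check=True)
        with open(path) as f:                        # read the output file
            parsed = int(f.read())                   # stand-in for parsing
        os.remove(path)
        result_queue.put(parsed * parsed)            # stand-in computation

def master(items, n_workers=4):
    work_queue, result_queue = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(work_queue, result_queue))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in items:
        work_queue.put(item)
    for _ in threads:
        work_queue.put(None)                         # one stop signal per worker
    results = [result_queue.get() for _ in items]
    for t in threads:
        t.join()
    return sorted(results)

print(master([1, 2, 3, 4]))                          # [1, 4, 9, 16]
```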

participant
  • Many possibilities. If you're using the thread pool (i.e. tasks or `QueueUserWorkItem`), then the thread pool's thread management comes into play. It's going to determine how many concurrent threads you can run, on a per-process basis. That would make the multithreading scenario slower. Reading the files is going to slow down because disk I/O is essentially a single-threaded task. – Jim Mischel Oct 13 '14 at 14:16
  • @JimMischel I do not use the thread pool. – participant Oct 13 '14 at 14:56

2 Answers


Whether multiprocessing has an advantage over multithreading (or vice versa) depends largely on the specific code you are running and on your environment, so it is really hard to say exactly what is happening without seeing the actual code in question and detailed measurements (response time, CPU, disk, and memory performance-counter values, etc.).

From points 4.2 and 4.3 of your analysis it looks like your CPU and I/O are not being properly utilized. There shouldn't be any significant performance difference between the multiprocessing and multithreading scenarios if you are doing both properly. The idling cores and the increasing read times could indicate a thread-blocking problem in your code, which would hurt both scalability and performance.

Make sure you are not blocking threads on shared resources inside the same process, which would hurt performance in your multithreaded scenario. In addition, you should leverage non-blocking asynchronous I/O when working with the queues and files to ensure maximum concurrency.
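As a rough illustration of the non-blocking idea (sketched in Python for brevity; in .NET the analogous tools would be the asynchronous file APIs, e.g. a `FileStream` opened for asynchronous access, and the file names and sizes here are made up):

```python
import asyncio
import tempfile
from pathlib import Path

async def read_and_parse(path):
    # asyncio.to_thread offloads the blocking read to a helper thread, so
    # the event loop is free to overlap other reads and work meanwhile.
    text = await asyncio.to_thread(Path(path).read_text)
    return len(text)                      # stand-in for the real parsing step

async def gather_results(paths):
    # All reads are in flight concurrently; results keep the input order.
    return await asyncio.gather(*(read_and_parse(p) for p in paths))

def demo():
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(3):
            p = Path(d) / f"out{i}.txt"   # made-up output files
            p.write_text("x" * (i + 1))
            paths.append(p)
        return asyncio.run(gather_results(paths))

print(demo())                             # [1, 2, 3]
```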

Keep in mind that the optimal number of concurrent worker threads in your app is 24 (one thread per core), and going over that limit is probably not a good idea unless measurements prove otherwise.

The CLR thread pool uses the number of cores as its default minimum thread count, which means you will not pay a performance penalty while your app is using <= 24 threads. However, when you schedule more than 24 concurrent jobs, the thread pool starts injecting threads at regular intervals to service the work beyond that minimum. In .NET Framework versions before 4.0 the injection rate was one thread per 0.5 seconds. In .NET 4.0+ there is a concurrency hold-back algorithm, but it is still not optimal.
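The sizing advice can be sketched like this (Python used for illustration; in .NET the corresponding knob is `ThreadPool.SetMinThreads`, and `compute` is a made-up stand-in for the CPU-bound part of a work item):

```python
import os
from concurrent.futures import ThreadPoolExecutor

n_cores = os.cpu_count() or 1             # size the pool to the machine

def compute(x):
    return x * x                          # made-up CPU-bound work item

# Capping max_workers at the core count avoids oversubscribing the CPU
# with more runnable threads than there are cores to run them.
with ThreadPoolExecutor(max_workers=n_cores) as pool:
    results = list(pool.map(compute, range(8)))

print(results)                            # [0, 1, 4, 9, 16, 25, 36, 49]
```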

Faris Zacina

Rather than looking for generalized guidelines, have you tried using a profiling tool to discover where the bottlenecks are? I find that, although I often trace through the same areas of my logic to discover where my threading logic is "stuttering", the problems often vary, because different factors impact thread performance. It's never the same story twice (er... usually).

I'd strongly suggest getting hold of a profiling tool like dotTrace to gain a deeper level of insight and to be able to delve deeper into your issue.

Best of luck!

code4life