Long version of the question:
When there are more blocking threads than CPU cores, where is the balance between thread count and per-thread blocking time that maximizes CPU efficiency by keeping context-switch overhead low?
I have a wide variety of I/O devices that I need to control on Windows 7, on an x64 multi-core processor: PCI devices, network devices, data being saved to hard drives, big chunks of data being copied, and so on. The most common policy is: "Put a thread on it!". Several dozen threads later, this is starting to feel like a bad idea.
None of my cores is at 100%, and several cores are still idle, yet I'm seeing delays in the range of 10 to 100 ms that cannot be explained by I/O blocking or CPU-intensive work. Other processes don't seem to be competing for resources either. I suspect context-switch overhead.
Here are some possible solutions I'm considering:
- Reduce the thread count by bundling work for the same I/O device: this mainly applies to the hard drive, but maybe to the network as well. If I'm saving 20 MB to the hard drive in one thread and 10 MB in another, wouldn't it be better to post both writes to the same thread? How would this work in the case of multiple hard drives?
- Reduce the thread count by bundling similar I/O devices and raising the priority of the combined thread: dozens of threads with increased priority would probably make my user-interface thread stutter, but I can bundle all that functionality into one or a few threads and raise their priority.
Any case studies tackling similar problems are much appreciated.