I am implementing an OpenMP multithreaded program on the following machine:
x86_64, On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
It is a multithreaded clustering program. It shows the expected speedup for dataset sizes up to 2 million rows (~250 MB of data), but when testing on a larger dataset, many of the threads in htop show the D state and a CPU% substantially lower than 99-100%. Note that for datasets up to this size, every thread runs in the R state with CPU% ~100%. On the larger dataset, the running time becomes ~100 times that of the sequential case.
Free memory seems to be available, and swap (Swp in htop) usage is 0 in all cases.
Regarding the data structures used: there are 3 shared data structures of size O(n), and each thread creates its own private linked list, which is kept for a later merging step. I suspected the problem was the extra memory used by this per-thread data structure, but even if I comment it out, the program shows the same problem. Please let me know if I should provide more details.
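As a rough illustration of the pattern described above (not the actual code), here is a minimal sketch in which each thread builds a private linked list over its share of the rows and the lists are merged afterwards. All names (`Node`, `push`, `build_and_merge`, `shared_data`) are hypothetical, a single shared array stands in for the 3 shared O(n) structures, and the even-value test is only a placeholder for the real clustering criterion.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical node type for a per-thread private list. */
typedef struct Node {
    long row;            /* index of a row selected by this thread */
    struct Node *next;
} Node;

/* Push a row index onto the front of a list and return the new head. */
static Node *push(Node *head, long row) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    n->row = row;
    n->next = head;
    return n;
}

/*
 * Each thread scans its share of the shared, read-only array and collects
 * matching rows in a private list. The private lists are then spliced into
 * one merged list, one thread at a time. Returns the merged list length
 * (the nodes are freed before returning, to keep the sketch leak-free).
 */
static long build_and_merge(const int *shared_data, long n) {
    Node *merged = NULL;                 /* shared result of the merge step */

    #pragma omp parallel
    {
        Node *local = NULL;              /* private to each thread */

        #pragma omp for schedule(static) nowait
        for (long i = 0; i < n; i++) {
            if (shared_data[i] % 2 == 0) /* placeholder criterion (assumption) */
                local = push(local, i);
        }

        /* Merge step: only one thread at a time touches `merged`. */
        #pragma omp critical
        {
            while (local != NULL) {
                Node *next = local->next;
                local->next = merged;
                merged = local;
                local = next;
            }
        }
    }

    /* Count and free the merged list. */
    long count = 0;
    while (merged != NULL) {
        Node *next = merged->next;
        free(merged);
        merged = next;
        count++;
    }
    return count;
}
```

The sketch can be compiled with, for example, `gcc -fopenmp` and called from a `main` that fills `shared_data`; without `-fopenmp` the pragmas are ignored and it runs sequentially with the same result.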
I only picked up OpenMP and parallel computing a few months ago, so please let me know what the possible problems could be.