Is increased CPU time (as reported by the time CLI command) indicative of inefficiency when hyperthreading is used (e.g. time spent in spinlocks or on cache misses), or is it possible that the CPU time is inflated by the odd nature of HT (e.g. the real cores being busy so HT can't kick in)?
I have a quad-core i7, and I'm testing a trivially parallelizable part (image-to-palette remapping) of an OpenMP program, with no locks and no critical sections. All threads read a small amount of shared read-only memory (a look-up table), but each writes only to its own memory.
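For illustration, the loop has roughly the shape below. This is only a minimal sketch: the function name, the 8-bit pixel format and the 256-entry table are hypothetical stand-ins, not the actual program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: map every 8-bit pixel to a palette index through a
 * read-only look-up table. Each iteration writes a distinct dst element,
 * so no locks or critical sections are needed.
 * Build with OpenMP enabled (e.g. -fopenmp); without it the pragma is
 * ignored and the loop simply runs serially. */
void remap_to_palette(const uint8_t *src, uint8_t *dst, size_t n,
                      const uint8_t *lut /* assumed to hold 256 entries */)
{
    assert(src != NULL && dst != NULL && lut != NULL);

    /* A signed loop index keeps the loop valid for older OpenMP versions. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++) {
        dst[i] = lut[src[i]];
    }
}
```

The thread count for a run like this can be varied with the OMP_NUM_THREADS environment variable.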
threads  real  CPU
1:       5.8   5.8
2:       3.7   5.9
3:       3.1   6.1
4:       2.9   6.8
5:       2.8   7.6
6:       2.7   8.2
7:       2.6   9.0
8:       2.5   9.7
I'm concerned that the amount of CPU time used increases rapidly once the number of cores in use exceeds 1 or 2.
I imagine that in an ideal scenario CPU time wouldn't increase much (same amount of work just gets distributed over multiple cores).
Does this mean that about 40% of the CPU time with 8 threads (9.7 vs. 5.8 single-threaded) is overhead from parallelizing the program?
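The arithmetic behind that 40% figure, along with the wall-clock speedup, can be sketched as follows. The function names are made up for illustration; the inputs are the numbers from the table above.

```c
#include <assert.h>

/* Fraction of the parallel run's CPU time that exceeds the CPU time of
 * the single-thread run. */
double extra_cpu_fraction(double cpu_serial, double cpu_parallel)
{
    assert(cpu_serial > 0.0 && cpu_parallel > 0.0);
    return (cpu_parallel - cpu_serial) / cpu_parallel;
}

/* Wall-clock speedup relative to the single-thread run. */
double speedup(double real_serial, double real_parallel)
{
    assert(real_serial > 0.0 && real_parallel > 0.0);
    return real_serial / real_parallel;
}
```

With the 8-thread row, extra_cpu_fraction(5.8, 9.7) comes out at about 0.40, while speedup(5.8, 2.5) is about 2.3.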