Why does hyperthreading not improve web worker performance, and why does navigator.hardwareConcurrency not take this into account?

Question

Multiple Stack Overflow answers say that the optimal number of web workers equals the number of physical cores in the machine, not the number of logical cores. This matches my own testing: my MacBook has hyperthreading enabled and reports 4 physical cores and 8 logical cores, and I get maximum performance with 4 web workers, not 8.
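
One way to reproduce this kind of measurement is to time the same fixed batch of work at several worker counts and keep the smallest count that is close to the fastest. This is a minimal sketch, not a definitive benchmark. It assumes a hypothetical `runBatch(workerCount)` that you supply, which starts that many workers, processes a fixed workload, and resolves when all of them finish. `timeRun`, `pickWorkerCount`, `sweep`, and the 5% tolerance are illustrative choices, not part of any browser API.

```javascript
// Hypothetical helper: times one run of a caller-supplied async workload.
// `runBatch(n)` is ASSUMED to start n workers, process a fixed batch of work,
// and resolve once every worker has finished.
async function timeRun(runBatch, workerCount) {
  const start = performance.now();
  await runBatch(workerCount);
  return performance.now() - start;
}

// Given [{ workers, ms }] measurements, return the smallest worker count whose
// elapsed time is within `tolerance` (a fraction) of the fastest time.
function pickWorkerCount(measurements, tolerance = 0.05) {
  if (measurements.length === 0) throw new Error("no measurements");
  const fastest = Math.min(...measurements.map((m) => m.ms));
  const goodEnough = measurements.filter(
    (m) => m.ms <= fastest * (1 + tolerance)
  );
  return Math.min(...goodEnough.map((m) => m.workers));
}

// Sweep worker counts 1..maxWorkers and report the chosen count.
async function sweep(runBatch, maxWorkers) {
  const measurements = [];
  for (let n = 1; n <= maxWorkers; n++) {
    measurements.push({ workers: n, ms: await timeRun(runBatch, n) });
  }
  return { measurements, best: pickWorkerCount(measurements) };
}
```

Sweeping up to `navigator.hardwareConcurrency` shows whether the extra logical cores actually help for a given workload. A single run per count is noisy, so repeating each measurement and taking the median would likely give a more trustworthy answer.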

Why is this? If hyperthreading is supposed to allow a single CPU core to function as though it were two cores by executing two instructions in parallel on each clock cycle, then why does it seem to have no effect on additional web worker threads?

Relatedly, the navigator.hardwareConcurrency property in JavaScript usually returns the number of logical cores, not physical cores. But its MDN page says that it exists specifically to provide a count of the optimal number of web workers:

The number of logical processor cores can be used to measure the number of threads which can effectively be run at once without them having to context switch. The browser may, however, choose to report a lower number of logical cores in order to represent more accurately the number of Workers that can run at once.

But as described in the first paragraph, it seems to be well-accepted among JS developers that the optimal number of web workers is more closely related to the number of physical cores than logical cores, and that the number reported by navigator.hardwareConcurrency is misleading. There are even entire utilities devoted to figuring out the number of physical cores for this purpose.
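
The workaround this implies is to treat `navigator.hardwareConcurrency` as an upper bound and scale it down. Here is a rough sketch, assuming two hardware threads per physical core. That holds for typical Intel Hyper-Threading but not for CPUs without SMT, such as Apple silicon, where it would undercount. `estimateWorkerCount` and its options are hypothetical names, not part of any browser API.

```javascript
// Hypothetical heuristic: derive a worker pool size from the logical core count.
// ASSUMPTION: `threadsPerCore` defaults to 2 (2-way SMT). The browser does not
// expose this value, so the caller has to guess it or measure instead.
function estimateWorkerCount(
  logicalCores,
  { threadsPerCore = 2, reserveForMain = 0 } = {}
) {
  // hardwareConcurrency may be missing in some environments; fall back to 1.
  const logical =
    Number.isInteger(logicalCores) && logicalCores > 0 ? logicalCores : 1;
  const physicalGuess = Math.max(1, Math.floor(logical / threadsPerCore));
  // Optionally leave cores free for the main thread, but never go below 1.
  return Math.max(1, physicalGuess - reserveForMain);
}

// In a browser, navigator.hardwareConcurrency supplies the logical count.
const logicalCores =
  typeof navigator !== "undefined" ? navigator.hardwareConcurrency : undefined;
const poolSize = estimateWorkerCount(logicalCores);
```

Because the halving is only a guess about the hardware, measuring throughput at a few worker counts is likely more reliable than any fixed divisor.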

What's going on here? Why does hyperthreading seem to not function for its intended purpose when it comes to running web workers? And given that this fact about hyperthreading is well-known, why does navigator.hardwareConcurrency seem to ignore this fact and report an incorrect number?

*by executing two instructions in parallel on each clock cycle* - No, it's about keeping the execution units busy, especially when one thread stalls on cache misses, branch mispredicts, or latency bottlenecks. In efficient code without stalls and with good instruction-level parallelism, a single thread can run 4 instructions per clock cycle (or 5 or 6 on newer CPUs like Ice Lake or Zen). See [this Q&A](https://softwareengineering.stackexchange.com/questions/349972/how-does-a-single-thread-run-on-multiple-cores) for more about HT and how CPUs find parallelism with single threads. — Peter Cordes, May 03 '23 at 17:15
Or if the bottleneck for a thread is the amount of memory-level parallelism (in-flight cache misses), then two threads doing the same work might not gain overall throughput from sharing the same physical core, even if they aren't maxing out front-end or back-end execution units. Or having the cache footprints of two threads compete with each other for one physical core can hurt; that and memory bandwidth are why number-crunching / HPC workloads often don't benefit from HT. So yeah, if that's often happening with web workers, it would probably be best if browsers only reported physical cores. — Peter Cordes, May 03 '23 at 17:23


0 Answers