I've been working on a set of automated load-testing scripts, and I've noticed that, averaged over runs, there's no difference in throughput between running a cluster of 2 processes and a cluster of 4 processes on a Heroku dyno (in this case, a Hapi.js server that immediately returns a reply), even though the dyno reports four available CPUs. By contrast, going from 1 to 2 processes makes a huge difference: nearly a 100% increase in throughput.
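To put numbers on that observation, write T(n) for the measured requests per second with n worker processes (my own notation). Speedup S(n) and parallel efficiency E(n) then come out roughly as follows, using "nearly +100%" for 1 → 2 and "no difference" for 2 → 4:

```latex
\begin{aligned}
S(n) &= \frac{T(n)}{T(1)}, & E(n) &= \frac{S(n)}{n} \\
S(2) &\approx 2, & E(2) &\approx 1 \\
S(4) &\approx S(2) \approx 2, & E(4) &\approx 0.5
\end{aligned}
```

An efficiency of about 0.5 at four processes is roughly what you'd expect if only about two CPUs' worth of compute is actually usable, whatever the reported count.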
My guess is that the Intel CPUs use hyperthreading, so twice as many cores are reported as physically exist, and Node doesn't gain much from hyperthreading's scheduling advantages. However, there seems to be very little published information about the hardware specs of Heroku dynos. Is this accurate, or is there another reason performance caps out at 2 processes on a server that does no other I/O?
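One way to test the hyperthreading guess from inside the dyno might be to compare `os.cpus().length`, which counts logical CPUs, with the number of distinct physical cores listed in `/proc/cpuinfo`. This is a minimal sketch that assumes a Linux host whose `/proc/cpuinfo` exposes `physical id` and `core id` fields. `parseCpuInfo` and `CpuTopology` are hypothetical names. Inside a container the file may describe the underlying host rather than the CPU share the dyno is actually allowed to use, so treat the result as a hint, not proof:

```typescript
import { cpus } from "node:os";
import { readFileSync } from "node:fs";

// Hypothetical summary of CPU topology parsed from Linux /proc/cpuinfo text.
interface CpuTopology {
  logical: number;  // number of "processor" entries (hardware threads)
  physical: number; // distinct (physical id, core id) pairs
}

// Parses /proc/cpuinfo-style text. Assumes each logical CPU is a block
// separated by a blank line; "physical id" / "core id" may be missing on
// some VMs or containers, in which case each logical CPU counts as a core.
function parseCpuInfo(text: string): CpuTopology {
  const blocks = text
    .split(/\n\s*\n/)
    .filter((b) => /^processor\s*:/m.test(b));
  const cores = new Set<string>();
  for (const block of blocks) {
    // Returns the value of a "name : value" line within this block, if present.
    const field = (name: string): string | undefined => {
      const m = block.match(new RegExp(`^${name}\\s*:\\s*(.*)$`, "m"));
      return m ? m[1].trim() : undefined;
    };
    const physicalId = field("physical id") ?? "0";
    const coreId = field("core id");
    cores.add(
      coreId === undefined
        ? `${physicalId}:cpu${field("processor")}`
        : `${physicalId}:${coreId}`,
    );
  }
  return { logical: blocks.length, physical: cores.size };
}

// Compare what Node reports with what /proc/cpuinfo says (Linux only).
const fromOs = cpus().length;
try {
  const topo = parseCpuInfo(readFileSync("/proc/cpuinfo", "utf8"));
  console.log(
    `os.cpus(): ${fromOs}, logical: ${topo.logical}, physical cores: ${topo.physical}`,
  );
} catch {
  console.log(`os.cpus(): ${fromOs} (/proc/cpuinfo not readable here)`);
}
```

If `physical` comes out at half of `logical`, the four reported CPUs are likely two hyperthreaded cores. Even then, a container CPU quota could still be the real limit.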