I'm facing a scalability issue on a multicore system. My application processes scientific data in parallel on a machine with 4 physical cores and 8 logical cores (hyperthreading enabled). We launch 8 JVMs, one per logical core (we'll probably switch to a single JVM eventually to avoid the per-JVM overhead).
The issue is that scaling is nearly linear up to 4 cores, but using the 4 additional "logical cores" gains us barely 10-20% more performance.
I analysed thread behaviour by profiling the app, and I see no lock contention and no threads spending much time waiting. I also checked with pidstat and don't see, for instance, excessive context-switch overhead. More precisely, there are almost no context switches on the Java processes. CPU usage is very high, reaching almost 100%, which also seems fine.
My question is how to detect and analyse the cause of this poor scalability once the number of processes exceeds the number of physical cores. Which tools and methods can I use to find where the contention is? Where should I look, and can I fix it somehow without changing the architecture of the application too much (for instance, switching to one JVM per machine)?
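To make the scaling numbers above reproducible in isolation, something like the following could be used. It is only a minimal sketch under stated assumptions, not the actual application: `ScalingProbe`, `kernel`, `timeRun` and `speedup` are hypothetical names, the kernel is a stand-in CPU-bound loop, and it uses threads inside one JVM rather than 8 separate JVMs. Each thread does the same amount of work, so the speedup at n threads is computed as n * T1 / Tn.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScalingProbe {

    // Hypothetical CPU-bound kernel standing in for the real processing step.
    static double kernel(long iterations) {
        double acc = 0.0;
        for (long i = 1; i <= iterations; i++) {
            acc += Math.sqrt((double) i);
        }
        return acc;
    }

    // Runs `threads` copies of the kernel concurrently and returns the
    // elapsed wall-clock time in nanoseconds.
    static long timeRun(int threads, long iterationsPerThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Double>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> kernel(iterationsPerThread));
            }
            long start = System.nanoTime();
            // invokeAll blocks until every task has completed.
            List<Future<Double>> results = pool.invokeAll(tasks);
            for (Future<Double> f : results) {
                f.get(); // surfaces any exception thrown by a task
            }
            return System.nanoTime() - start;
        } finally {
            pool.shutdown();
        }
    }

    // Throughput speedup relative to one thread, assuming every thread
    // performs the same amount of work: (n * T1) / Tn.
    static double speedup(int threads, long singleThreadNanos, long nThreadNanos) {
        return threads * (double) singleThreadNanos / nThreadNanos;
    }

    public static void main(String[] args) throws Exception {
        long iterations = 200_000_000L; // assumed workload size, tune as needed
        timeRun(1, iterations);         // warm-up run so the JIT compiles the kernel
        long t1 = timeRun(1, iterations);
        for (int n : new int[] {1, 2, 4, 6, 8}) {
            long tn = timeRun(n, iterations);
            System.out.printf("threads=%d speedup=%.2f%n", n, speedup(n, t1, tn));
        }
    }
}
```

Replacing the stand-in kernel with a representative piece of the real computation, and comparing the curve for a compute-only kernel against a memory-heavy one, might help show whether the plateau past 4 threads comes from shared execution units or from cache and memory bandwidth.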
Thanks