
Hi, I'm trying to test my Java app on Solaris SPARC and I'm seeing some weird behavior. I'm not looking for flame wars; I'm just curious to know what is happening or what is wrong...

I'm running the same JAR on an Intel Windows machine and on the T1000. On Windows I can reach 100% CPU utilisation (Performance Monitor), but on Solaris I can only get 25% (prstat).

The application is a custom server app I wrote that uses Netty as the network framework.

On the Windows machine I can reach just above 200 requests/responses per second, including full business logic and access to outside third parties, while on the Solaris machine I get about 150 requests/responses per second at only 25% CPU.

One can only imagine how many more requests/responses I could get out of the SPARC if I could make it use its full power.

The servers are:

  • Windows 2003 SP2 x64, 8 GB, 2.39 GHz Intel 4-core
  • Solaris 10.5 64-bit, 8 GB, 1 GHz 6-core

Both use JDK 1.6u21.

Any ideas?

user432024

3 Answers


The T1000 uses a multi-core CPU, which means that the CPU can run multiple threads simultaneously. A reading of 100% CPU utilization means that all cores are running at 100%. If your application keeps fewer threads busy than there are cores, it cannot use all the cores, and therefore cannot reach 100% CPU.
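A minimal sketch of sizing a pool from the CPU count the JVM actually sees, rather than from a hard-coded core count. `Runtime.availableProcessors()` is a real API; on a chip-multithreading CPU like the T1 it typically reports hardware threads (virtual processors), not physical cores, which is the point of this answer. The class name, `poolSize` helper, multiplier and the busy-loop task are hypothetical. The code uses Java 8+ syntax for brevity; on the question's JDK 1.6 the lambda would become an anonymous `Callable`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PoolSizing {
    // Hypothetical helper: pool threads = logical CPUs * threadsPerCpu.
    static int poolSize(int threadsPerCpu) {
        // Logical CPUs visible to the JVM; on a CMT chip each hardware thread counts.
        int cpus = Runtime.getRuntime().availableProcessors();
        return Math.max(1, cpus * threadsPerCpu);
    }

    public static void main(String[] args) throws Exception {
        int size = poolSize(1);
        ExecutorService pool = Executors.newFixedThreadPool(size);
        List<Future<Long>> results = new ArrayList<>();
        // One CPU-bound task per logical CPU, so every hardware thread has work.
        for (int i = 0; i < size; i++) {
            results.add(pool.submit(() -> {
                long acc = 0;
                for (long n = 0; n < 20_000_000L; n++) acc += n ^ (n >>> 3);
                return acc;
            }));
        }
        for (Future<Long> f : results) f.get(); // wait for all tasks
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("Ran " + size + " CPU-bound tasks in parallel");
    }
}
```

If `prstat` stays low while a pool sized this way is busy, the limit is probably elsewhere (blocking, locks, I/O) rather than the thread count.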

Erick Robertson
  • His post seems to indicate that the Windows server has a 4-core CPU and the Solaris server has a 6-core CPU. If that's the case, the application must be capable of utilizing at least four cores. – Chris Shouts Aug 26 '10 at 16:01
  • The T1000 has hyperthreading, too. It claims to be able to handle perhaps 8 tasks per core if I'm reading right. This being the case, if his application used 12 threads, it would run the Windows server at 100% and the Solaris at 25%. – Erick Robertson Aug 26 '10 at 16:06
  • Yes, the Windows machine is an Intel 4-core. – user432024 Aug 26 '10 at 19:04
  • Just something that stood out. You commented in the-alchemist's answer that you configure 2 * cpu cores for "worker/selector" threads? Try changing that to 8 * cpu cores. The T1000 docs I read said that it could handle 8 tasks per core, and this strangely comes out to the 25% CPU usage that you reported. – Erick Robertson Aug 26 '10 at 19:27
  • It depends on which threads are bottlenecking the application. If those handler threads all require one of the worker/selector threads to manage them, you could very likely see this behavior. Make sure that the worker/selector threads are properly handing the work off to the handler and then getting another request to hand off to another handler, etc. Make sure this isn't blocking until the handler thread finishes its job. – Erick Robertson Aug 27 '10 at 12:21
  • Well, that's Netty, so I'll give it the benefit of the doubt. And if that were the case, Windows wouldn't go to 100% either. Anyway, I tried 48 worker/selector threads and got the same result. – user432024 Aug 27 '10 at 15:14
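The hand-off Erick describes can be sketched in plain Java, without any Netty API: the I/O (worker/selector) thread only enqueues the request on a handler pool and returns immediately, instead of calling something like `submit(...).get()` and blocking until the handler finishes. `HandOff`, `onRequest` and `slowBusinessLogic` are hypothetical names, and the 10 ms sleep is an assumed stand-in for blocking business logic or third-party calls.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class HandOff {
    // Stands in for the pool of "handler" threads.
    private final ExecutorService handlers;
    final AtomicInteger completed = new AtomicInteger();

    HandOff(int handlerThreads) {
        handlers = Executors.newFixedThreadPool(handlerThreads);
    }

    // Runs on the I/O thread: enqueue and return at once, never wait for the result.
    void onRequest(final int requestId, final CountDownLatch done) {
        handlers.execute(() -> {
            slowBusinessLogic(requestId); // may block on a DB or 3rd party
            completed.incrementAndGet();  // a real server would write the response here
            done.countDown();
        });
    }

    // Hypothetical placeholder for blocking business logic.
    static void slowBusinessLogic(int id) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void shutdown() throws InterruptedException {
        handlers.shutdown();
        handlers.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws Exception {
        HandOff server = new HandOff(50);
        int n = 200;
        CountDownLatch done = new CountDownLatch(n);
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) server.onRequest(i, done); // returns immediately
        long dispatchMs = (System.nanoTime() - start) / 1_000_000;
        done.await();
        server.shutdown();
        System.out.println("dispatched " + n + " in " + dispatchMs
                + " ms, completed " + server.completed.get());
    }
}
```

The dispatch loop should finish almost instantly while the handlers work through the queue; if dispatch time instead grows with the work time, the I/O threads are blocking.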

Without any code, it's hard to help out. Some ideas:

  • Profile the Java app on both systems, and see where the difference is. You might be surprised. Because the T1 CPU lacks out-of-order execution, you might see performance lacking in strange areas.
  • As Erick Robertson says, try bumping up the number of threads to the number of virtual cores reported via prstat, NOT the number of regular cores. The T1000 uses UltraSparc T1 processors, which make heavy use of thread-level parallelism.
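As a lightweight first step before a full profiler, the JVM's own `ThreadMXBean` can report CPU time per thread, which shows whether the load is spread across many threads or concentrated in a few (for example the selector threads). This is a minimal sketch; the class and method names are hypothetical, and per-thread CPU timing must be supported by the JVM on the platform (the code checks `isThreadCpuTimeSupported()` first).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class ThreadCpuReport {
    // Returns "name: N ms CPU" lines for each live thread, highest CPU first.
    static List<String> report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> out = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) return out;
        if (!mx.isThreadCpuTimeEnabled()) mx.setThreadCpuTimeEnabled(true);

        List<long[]> rows = new ArrayList<>(); // each row: {threadId, cpuNanos}
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id); // -1 if the thread has died
            if (cpu >= 0) rows.add(new long[] {id, cpu});
        }
        rows.sort((a, b) -> Long.compare(b[1], a[1]));

        for (long[] r : rows) {
            ThreadInfo info = mx.getThreadInfo(r[0]); // null if the thread has died
            String name = info == null ? "thread-" + r[0] : info.getThreadName();
            out.add(name + ": " + r[1] / 1_000_000 + " ms CPU");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : report()) System.out.println(line);
    }
}
```

Calling `report()` periodically on both machines under the same load makes it easier to see which threads are starved on the SPARC box.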

Also, note that you're comparing latest-generation Intel processors with older Sun ones. I highly recommend reading "Developing and Tuning Applications on UltraSPARC T1 Chip Multithreading Systems" and "Maximizing Application Performance on Chip Multithreading (CMT) Architectures", both by Sun.

The Alchemist
  • At idle the application uses 37 threads; under load it uses 337. Basically, Netty configures 2 * CPU cores for "worker/selector" threads, and I have it configured to use 300 "handler" threads. So that is 8 threads + 300 + whatever else the JVM, Hibernate, etc. are using... – user432024 Aug 26 '10 at 19:08
  • Hmm... in that case, I would highly recommend profiling to see where the bottlenecks are. If Netty is the bottleneck, you'll just have to play with different parameters to eke out a little more performance. If the bottleneck is in your code, you'll have a lot more flexibility. Keep in mind that CPU usage in `prstat` doesn't take into account IO wait time, which in web apps can be pretty substantial. In short, I recommend profiling and seeing what's taking up so much time. – The Alchemist Aug 27 '10 at 12:35
  • Anyway, I looked at the documents linked above. So far I can't really tell from the descriptions alone, because when I check the stats it seems that all CPU threads are being used. The best option would be to talk to an Oracle engineer, but since this is the only machine I have and it was purchased through a third-party vendor, I have no support package. So I guess I'll leave it at that for now. – user432024 Sep 02 '10 at 21:00
  • @user432024: Sorry to hear about that. Having developed multi-threaded software for the T2 chips, I can say that performance tuning can be a bit difficult. It's a CMT chip with no out-of-order execution. Taking advantage of all those cores can be difficult, but well worth it in the end. Once you get all cores blasting away at 100%, it flies. – The Alchemist Sep 03 '10 at 02:18
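To check the point about IO wait, one rough approach is to compare a thread's CPU time with the wall-clock time of a request: if a handler spends most of its time waiting (on a database, a third party, or a lock), `prstat` will show low CPU no matter how many threads exist. This sketch uses the real `ThreadMXBean.getCurrentThreadCpuTime()`; the class name and the simulated 200 ms third-party call are hypothetical. If CPU timing is unsupported the method returns -1, and the fraction comes out as 0.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuVsWall {
    // Fraction of wall-clock time the current thread spent on CPU while running task.
    // Near 0 means the thread mostly waited (I/O, locks, sleep); near 1 means CPU-bound.
    static double cpuFraction(Runnable task) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long cpu0 = mx.getCurrentThreadCpuTime();
        long wall0 = System.nanoTime();
        task.run();
        long cpu = mx.getCurrentThreadCpuTime() - cpu0;
        long wall = System.nanoTime() - wall0;
        return wall == 0 ? 0.0 : (double) cpu / wall;
    }

    public static void main(String[] args) {
        // Simulated blocking call to a third party (hypothetical 200 ms latency).
        double waiting = cpuFraction(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Pure computation for comparison.
        double busy = cpuFraction(() -> {
            long acc = 0;
            for (long i = 0; i < 200_000_000L; i++) acc += i % 7;
            if (acc == 42) System.out.println(acc); // keeps the loop from being eliminated
        });
        System.out.printf("sleeping task: %.2f on CPU, computing task: %.2f on CPU%n",
                waiting, busy);
    }
}
```

Wrapping a real request handler in `cpuFraction` on both machines would show whether the SPARC threads are mostly waiting rather than computing.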

This is quite an old question now, but we ran across similar issues.

An important fact to notice is that the Sun T1000 is based on the UltraSPARC T1 processor, which has only a single FPU shared by its 8 cores. So if your application does a lot of (or even some) floating-point calculation, this might become an issue, as the FPU will become the bottleneck.
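One way to test this on the actual machine is a crude scaling check: time the same floating-point kernel on 1 thread and on all logical CPUs, and compare with an integer kernel of similar shape. If integer work scales across threads but `double` work barely does, a shared FPU is a likely suspect. This is a rough microbenchmark sketch, not a rigorous one (JIT warm-up, other load and clock effects all matter); the class, kernels and iteration count are hypothetical. It uses Java 8+ syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FpuScaling {
    static final int ITERS = 20_000_000;

    // Floating-point kernel: dependent multiply-add chain on doubles.
    static double fpWork() {
        double x = 1.0;
        for (int i = 0; i < ITERS; i++) x = x * 1.0000001 + 0.0000001;
        return x;
    }

    // Integer kernel of similar shape (a linear congruential step).
    static long intWork() {
        long x = 1;
        for (int i = 0; i < ITERS; i++) x = x * 6364136223846793005L + 1442695040888963407L;
        return x;
    }

    // Runs `threads` copies of one kernel in parallel; returns elapsed milliseconds.
    static long timeParallel(int threads, boolean floating) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                if (floating) futures.add(pool.submit(FpuScaling::fpWork));
                else futures.add(pool.submit(FpuScaling::intWork));
            }
            for (Future<?> f : futures) f.get();
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int n = Runtime.getRuntime().availableProcessors();
        for (boolean floating : new boolean[] {false, true}) {
            timeParallel(1, floating); // warm-up so the JIT compiles the kernel
            long one = timeParallel(1, floating);
            long many = timeParallel(n, floating);
            // With perfect scaling, n threads take about as long as 1 thread.
            System.out.printf("%s: 1 thread %d ms, %d threads %d ms%n",
                    floating ? "double" : "long", one, n, many);
        }
    }
}
```

The same pattern could be pointed at the SSL or XML-signature code path mentioned in the comments below to see whether it scales with thread count.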

tranber
  • Uses SSL to connect externally and also uses certificates to validate XML messages. – user432024 Nov 23 '10 at 20:12
  • If I get this article right (http://en.wikipedia.org/wiki/SPARC_T4), the processor you have has only one FPU and one cryptographic thingy for all the cores. If you use certificates/SSL a lot, this might be a bottleneck. – Jens Schauder May 14 '12 at 13:52