
I'm trying to profile WebSphere using hprof on the IBM stack (J9 JVM on AIX/Linux). Specifically, I'm interested in the CPU samples output from hprof, and particularly in the startup time (from the time WS is started until it is "ready for business").

The problem is, I can't get the CPU samples output in the hprof result file.

I'm configuring hprof with the following JVM argument, which dumps the hprof output in ASCII format: `-Xrunhprof:cpu=samples,file=path-to-hprof.txt`. According to the generated hprof output, the CPU SAMPLES section is only generated at program exit:

> HEAP DUMP, SITES, CPU SAMPLES|TIME and MONITOR DUMP|TIME records are generated at program exit.

So, to shut down WebSphere gracefully after it has started successfully, I'm using the `stopServer.sh` script, expecting the CPU SAMPLES output to be present in the resulting `java.hprof.txt` file after shutdown completes. But it isn't.
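To rule out a search mistake on my part, I'm checking for the section marker directly. Here's a self-contained sketch of the check (it writes a fake hprof file so it can run anywhere; in practice `HPROF_FILE` would point at the real `file=` path):

```shell
# Self-contained demo: write a minimal fake hprof file, then check it
# for the CPU samples section. In practice, point HPROF_FILE at the
# real path given in the -Xrunhprof file= option.
HPROF_FILE=$(mktemp)
printf 'THREAD START (obj=1, id=2)\nCPU SAMPLES BEGIN (total = 42)\nCPU SAMPLES END\n' > "$HPROF_FILE"

if grep -q '^CPU SAMPLES BEGIN' "$HPROF_FILE"; then
  RESULT="present"
else
  RESULT="missing"
fi
echo "CPU samples: $RESULT"
rm -f "$HPROF_FILE"
```

Against my real output file, the check reports the section as missing.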

What am I doing wrong? Is there a better method for using hprof with WebSphere and generating CPU profiling output? Any help will be much appreciated!

Edit: I'm running WebSphere version 8.0.0.11 on the IBM J9 VM (build 2.6, JRE 1.6.0 20150619_253846) on RHEL 7.5.

P.S.: I also looked for a way to stop WS from the management console GUI, but couldn't find one.

P.P.S.: In the meantime, I'm using the very nice jvmtop tool with its `--profile <pid>` option, but it provides only partial insight and, as opposed to hprof, has to be attached on the fly, so some parts of the execution are lost.

  • Can you check if there's anything in native_stderr.log and native_stdout.log? – kgibm Nov 12 '18 at 15:47
  • Also, for J9, I recommend the sampling profiler Health Center that ships with IBM Java instead of HPROF: https://publib.boulder.ibm.com/httpserv/cookbook/Major_Tools-IBM_Java_Health_Center.html – kgibm Nov 12 '18 at 15:48
  • Thanks for your comments, @kgibm! Nothing that pops up in native_stderr.log and native_stdout.log but the standard WS outputs. – valiano Nov 12 '18 at 15:51
  • Note, I'm primarily interested in a console profiler - perhaps I should have highlighted that in my question. But surely I'll take a closer look at the IBM Health Center sampling profiler. – valiano Nov 12 '18 at 15:54
  • For a console profiler, on Linux, I start IBM Java with `-Xjit:perfTool` and then use Linux `perf` (e.g. `perf record -T -F 99 -a -g` and then `perf report`, or Brendan Gregg's flame graph reporter). What version of WAS are you on? I'll give HPROF a shot and see if I can reproduce. – kgibm Nov 12 '18 at 18:59
  • @kgibm sounds great! It's WAS 8.0.0.11. – valiano Nov 12 '18 at 19:37
  • I was not able to reproduce this. I ran with `-Xrunhprof:cpu=samples,file=hprof.txt`, stopped the JVM, and the file was produced in `/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/`. Maybe you were checking in the logs directory instead? The current working directory of the JVM is the profile folder. – kgibm Nov 12 '18 at 20:34
  • Thanks @kgibm. I specified `hprof.txt` with a full path, and I saw that it was created with hprof outputs (mostly thread start/stop messages), but not with the CPU sampling output. Did you stop the JVM with `stopServer.sh`? Just in case, I'll try with `hprof.txt` as well and check the `AppSrv01` profile folder. – valiano Nov 12 '18 at 20:38
  • Oh ok, if you specified an absolute path, then that makes sense. I do see the CPU sampling output (can't easily put multiple lines of output in a comment): `CPU SAMPLES BEGIN (total = 112422) Mon Nov 12 12:32:39 2018` `rank self accum count trace method` `1 22.48% 22.48% 25268 302396 com.ibm.io.async.AsyncLibrary.aio_getioev3` – kgibm Nov 12 '18 at 20:40
  • I stopped the server using the Administrative Console, but `stopServer.sh` is functionally equivalent – kgibm Nov 12 '18 at 20:41
  • At this point, I'm not sure what's going on, so I suggest you open a support ticket and that will get routed to the Java team to investigate – kgibm Nov 12 '18 at 20:41
  • 1
    Roger that, @kgibm... Thank you very much for your kind help! – valiano Nov 12 '18 at 20:43

1 Answer


Thanks to @kgibm's helpful hints, I realized I was on the right track and went back the next day to try again. Surprisingly, this time it worked: the hprof file was generated with the expected WebSphere CPU samples output.

I kept experimenting to figure out what I had gotten wrong in the first place. Here's what I think happened:

  1. At first, I had a couple of native agents specified in WebSphere's JVM arguments. The combination of these agents made WS run much slower. When I stopped WS, there was a gap of a few seconds between the `Server server1 stop completed` message being printed and `hprof.txt` being completely written. I believe I was too quick to view `hprof.txt`, before the CPU samples output had actually been written.

  2. Then, while troubleshooting the issue, I added the `doe=n` parameter to the hprof argument. `doe` stands for Dump On Exit and defaults to `y`. Only later did I realize that this was probably wrong, since, as quoted above, the CPU samples output is only generated at exit.

I think that these two issues together contributed to my confusion, so when I started clean, everything was OK.
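For the record, here is the difference between the two configurations (the output path is an example; these are JVM option fragments, not standalone commands):

```shell
# Works: doe defaults to y (dump on exit), so the CPU SAMPLES
# section is written when the JVM terminates.
-Xrunhprof:cpu=samples,file=/tmp/java.hprof.txt

# Doesn't work for cpu=samples: doe=n suppresses the exit dump,
# so the CPU SAMPLES section never appears in the file.
-Xrunhprof:cpu=samples,doe=n,file=/tmp/java.hprof.txt
```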

Perhaps it is worth clarifying in the hprof documentation that the `doe=n` option conflicts with `cpu=samples`, and possibly with the other options that write on exit as well (I didn't see such an indication in the docs, but it's possible I missed it).
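For anyone scripting over the output: once the CPU SAMPLES section is present, it can be pulled apart with a few lines of Python. This is a sketch assuming the standard hprof ASCII layout, using the sample line kgibm posted in the comments as test data:

```python
import re

# Example CPU SAMPLES section, taken from kgibm's comment above.
SAMPLE = """\
CPU SAMPLES BEGIN (total = 112422) Mon Nov 12 12:32:39 2018
rank   self  accum   count trace method
   1 22.48% 22.48%   25268 302396 com.ibm.io.async.AsyncLibrary.aio_getioev3
CPU SAMPLES END
"""

def parse_cpu_samples(text):
    """Return (rank, self_pct, count, trace, method) tuples from an
    hprof ASCII CPU SAMPLES section."""
    rows = []
    in_section = False
    for line in text.splitlines():
        if line.startswith("CPU SAMPLES BEGIN"):
            in_section = True
            continue
        if line.startswith("CPU SAMPLES END"):
            break
        if in_section:
            # rank, self%, accum%, count, trace, method
            m = re.match(
                r"\s*(\d+)\s+([\d.]+)%\s+[\d.]+%\s+(\d+)\s+(\d+)\s+(\S+)",
                line)
            if m:
                rows.append((int(m.group(1)), float(m.group(2)),
                             int(m.group(3)), int(m.group(4)), m.group(5)))
    return rows

print(parse_cpu_samples(SAMPLE))
# → [(1, 22.48, 25268, 302396, 'com.ibm.io.async.AsyncLibrary.aio_getioev3')]
```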
