I am trying to profile a MapReduce job running on Azure HDInsight (HDP 2.2). All I really want is a profile of a single reduce task (though multiple would be better).
Here are the configuration settings I'm currently using:
mapreduce.task.profile=true
mapreduce.task.profile.params=-agentlib:hprof=cpu=samples,depth=100,interval=7,lineno=y,thread=y,force=n,file=d:/profile.out
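As a minimal sketch (not the actual submission code), the settings above can be expressed as `-D` generic options for a job launched through `ToolRunner`. The class and method names here are hypothetical. The extra property `mapreduce.task.profile.reduces` is an assumption about what would limit profiling to one reduce task. It comes from `mapred-default.xml`, where it defaults to `0-2`, and is not something the job currently sets.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collects the profiling properties and renders them
// as "-D key=value" pairs for a ToolRunner-based job.
final class ProfileOptions {

    // The two settings from the question, plus one assumed addition.
    static Map<String, String> profileSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("mapreduce.task.profile", "true");
        // Assumption: restrict profiling to reduce task 0 only.
        settings.put("mapreduce.task.profile.reduces", "0");
        settings.put("mapreduce.task.profile.params",
                "-agentlib:hprof=cpu=samples,depth=100,interval=7,"
                + "lineno=y,thread=y,force=n,file=d:/profile.out");
        return settings;
    }

    // Turns each entry into two arguments: "-D" and "key=value".
    static List<String> toGenericOptions(Map<String, String> settings) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            args.add("-D");
            args.add(entry.getKey() + "=" + entry.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        // Print the options so they can be pasted into a job command line.
        System.out.println(String.join(" ", toGenericOptions(profileSettings())));
    }
}
```

With only the reduce range overridden, map tasks in the default `0-2` range would still be profiled as well.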
First of all, it seems that in past versions of Hadoop the job client copied the profile output files back to the location the job was submitted from, but this is no longer the case. I have to go to the task nodes and find them myself, which is why I write them to an easier-to-find directory. I'm not sure whether this is a bug.
The real problem is that the output files contain only the hprof header: a block of text describing what hprof is and what the file contains, followed by no profile data. When I run a simple Java program locally with the same profile arguments, I do get actual profile data.
Is there something about the YARN container environment that might prevent hprof from writing its output? Perhaps the task JVMs are not exiting normally? If so, is there some way I can change that?