How can I monitor Hadoop in pseudodistributed mode using JVisualVM?

Question

I'm running Hadoop in pseudodistributed mode for testing on my local machine. I'd like to monitor my mappers' and reducers' memory and CPU usage in JVisualVM. However, in JVisualVM's list of local applications, I only see org.apache.hadoop.util.RunJar.

1. Are the mappers and reducers running as separate processes? (In top, it looks like they are: two processes named "java" are using 100% CPU while my two mappers run.) If they are separate processes, why doesn't JVisualVM list them as applications that I can monitor?
2. Are the mappers and reducers contained within the org.apache.hadoop.util.RunJar process? If so, (a) why do I only see Tool and ToolRunner in the JVisualVM Sampler, not any mapper/reducer code, and (b) why does JVisualVM report nearly 0% CPU when top reports 100%?

Is there some way I can modify my mappers/reducers so that JVisualVM can see them, at least while debugging in pseudodistributed mode?

For completeness, I should say that I'm running Hadoop 0.20 from Cloudera. (It was installed on Ubuntu using apt-get install hadoop-0.20-conf-pseudo from the http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh precise-cdh4 contrib repository. Even though Cloudera puts 2.x in the version number, this is not YARN; it's the original Hadoop.)

% hadoop version
Hadoop 2.0.0-cdh4.4.0
Subversion file:///var/lib/jenkins/workspace/generic-package-ubuntu64-12-04/CDH4.4.0-Packaging-Hadoop-2013-09-03_18-48-35/hadoop-2.0.0+1475-1.cdh4.4.0.p0.23~precise/src/hadoop-common-project/hadoop-common -r c0eba6cd38c984557e96a16ccd7356b7de835e79
Compiled by jenkins on Tue Sep  3 19:33:54 PDT 2013
From source with checksum ac7e170aa709b3ace13dc5f775487180
This command was run using /usr/lib/hadoop/hadoop-common-2.0.0-cdh4.4.0.jar

score 1 · Accepted Answer · answered Sep 29 '13 at 05:52

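When you start your application with hadoop jar [your_args], the command that actually runs is java org.apache.hadoop.util.RunJar [your_args] (RunJar is a class name, so there is no -jar flag). Your driver, which submits the MapReduce job, therefore runs in the RunJar process, and that is why you only see driver code such as Tool and ToolRunner there.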

By default, mappers and reducers run as separate processes. You cannot see them in JVisualVM because JVisualVM does not have the correct permissions: mappers and reducers are launched under the user mapred, not under your own account. So if you want to monitor them, you need to start JVisualVM as that user, with sudo -E -u mapred jvisualvm.
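
To check which account owns each JVM before starting JVisualVM as another user, a small sketch along these lines may help. It assumes a Linux system where ps reports the command name as plain "java" (as top does in the question); the function name java_owners is made up for this example and is not part of Hadoop or the JDK.

```shell
#!/bin/sh
# Hypothetical helper: keep only the rows whose command name is exactly
# "java" and print "owner pid" for each one.
# Expects rows of the form "USER PID COMMAND" on standard input.
java_owners() {
    awk '$3 == "java" { print $1, $2 }'
}

# List every process with its owner, PID and command name (POSIX ps
# options, headers suppressed), then keep only the JVMs.
ps -A -o user= -o pid= -o comm= | java_owners
```

If the explanation above applies to your setup, the task JVMs should be listed with mapred as the owner while a job is running, and the RunJar driver should be listed under your own login.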

zsxwing

This makes perfect sense (JVisualVM won't let me see other users' VMs). The `sudo -E -u mapred jvisualvm` didn't work for me, the way that my system is set up. But the following works. (1) As myself: `xhost +` (2) switch to user mapred with: `su mapred` (3) as user mapred: `/full/path/to/jvisualvm` – Jim Pivarski Sep 30 '13 at 15:15
