
I am trying to debug map-reduce programs and find it quite a headache. I tried this, but it didn't work, because I am using the Eclipse HDT plugin and don't launch jobs with the hadoop jar XXX command. So I tried to debug with logging instead.

I tried both

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public static final Log LOG = LogFactory.getLog(Reduce.class);

LOG.debug("XXX");

and

System.out.println("XXX");

and according to this post on Stack Overflow, the logs should show up under $HADOOP_HOME/logs/userlogs/XXX, but I find that folder empty. I assume it's because the Hadoop I'm using is 2.x, while the suggested answer targets 0.x. It's also possible that I didn't set up Hadoop completely.

I also tried the accepted answer in that post, but I cannot visit http://localhost:50030/jobtracker.jsp. I don't know why either.

Any suggestions? Besides debugging with logs, easy solutions with eclipse-hdt are also appreciated.

Cœur
misaka-10032

2 Answers


Using ResourceManager's UI:

If you are running YARN in the cluster, there is no JobTracker running, which is why you cannot access http://localhost:50030/jobtracker.jsp. Instead, a ResourceManager is running, and you can reach its web UI at http://RESOURCE_MANAGER_HOST:8088/cluster (replace RESOURCE_MANAGER_HOST with your ResourceManager's hostname or IP address).

From the ResourceManager's web page you can see all the applications that are running or have completed in your cluster. Click the application_id of the job you want to inspect, and from there you will see a link to its logs. You can also reach an individual application's page directly at http://RESOURCE_MANAGER_HOST:8088/cluster/app/APPLICATION_ID (replace APPLICATION_ID with the application id that was assigned to your MapReduce job).
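As an alternative to clicking through the UI, the ResourceManager also exposes a REST API for listing applications. A minimal sketch, assuming the default web port 8088 and a local pseudo-distributed setup (localhost here is an assumption; substitute your ResourceManager host):

```shell
# List all applications the ResourceManager knows about (JSON response).
curl "http://localhost:8088/ws/v1/cluster/apps"

# Narrow the listing to finished MapReduce jobs only.
curl "http://localhost:8088/ws/v1/cluster/apps?states=FINISHED&applicationTypes=MAPREDUCE"
```

If this list is empty even though your job produced output, the job was likely run by the local job runner and never reached the ResourceManager.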

Using CLI:

From the command line, if you know your application_id, you can use the following command to retrieve the logs for a specific application:

yarn logs -applicationId APPLICATION_ID

Note: Replace APPLICATION_ID with the application id that was assigned to your MapReduce job.
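Note that `yarn logs` reads aggregated logs, so `yarn.log-aggregation-enable` must be set to `true` in yarn-site.xml and the job must have finished; otherwise the command reports that log aggregation is not enabled. A sketch of common invocations (the application id below is a made-up placeholder):

```shell
# Fetch all container logs for a finished application;
# the id format is application_<cluster timestamp>_<sequence>.
yarn logs -applicationId application_1417132000000_0001

# If the job was submitted by a different user, name the owner explicitly.
yarn logs -applicationId application_1417132000000_0001 -appOwner someuser

# Output from System.out.println() lands in the per-container stdout logs,
# so you can grep the aggregated output for your marker string.
yarn logs -applicationId application_1417132000000_0001 | grep "XXX"
```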

Further Reading:

Also, review the following links by Pivotal and by Hortonworks on how to manage logs in YARN.

Ashrith
  • I've tried `http://localhost:8088/cluster` and found the list was empty. I'm sure the application ran successfully because I can see its output in my namenode's DFS. By the way, I just set up a simple pseudo-distributed environment locally, with one namenode configured; I don't know if that can even be called a "cluster"... (So, as you can see, I'm a total newbie.) – misaka-10032 Nov 28 '14 at 06:41
  • Make sure your job is not being run by the local job runner, which does not submit the job to the ResourceManager. Take a look at this [question](http://stackoverflow.com/questions/26836995/jobtracker-ui-not-showing-progress-of-hadoop-job/26838257#26838257) and this [question](http://stackoverflow.com/questions/9740999/hadoop-only-launch-local-job-by-default-why) as well. – Ashrith Nov 28 '14 at 06:56
  • I've made sure I specified `mapreduce.framework.name` as `yarn`, following the official site. I don't know if I've conveyed my question well: I want to debug my MapReduce program, and one of the solutions is to print logs from my program, right? So if I did what I mentioned above, I should find those logs through `http://localhost:8088/cluster` as you told me, right? That is what I want to confirm first. – misaka-10032 Nov 28 '14 at 13:26
  • Second, why is `http://localhost:50070/logs/userlogs/` empty? Aren't the userlogs supposed to be there? – misaka-10032 Nov 28 '14 at 13:27
  • Yeah, one way to debug is to print to the logs, and as the answer suggests there are two ways to look at them. `http://localhost:50070/logs` shows the NameNode logs and does not contain userlogs; userlogs are created in the directories specified by the `yarn.nodemanager.log-dirs` property. – Ashrith Nov 30 '14 at 07:54
  • I see the logs, finally. It seems to be an issue with eclipse-hdt: it didn't load my configs, I don't know why. I just ran my application in the terminal and found my userlogs. BUT, I found `stdout` empty. Isn't what `System.out.println(...)` prints supposed to be redirected to `stdout`? – misaka-10032 Nov 30 '14 at 11:16

According to some discussions on the internet, the reason is that Hadoop 2.x uses YARN (I am using Hadoop 2.6.0). The JobHistory Server's web port is 19888, so you access it at localhost:19888 (NOT localhost:50030/jobtracker.jsp). You need to (1) configure the history server and (2) start it before accessing http://localhost:19888.
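The history server is configured in mapred-site.xml via mapreduce.jobhistory.address (RPC, default port 10020) and mapreduce.jobhistory.webapp.address (web UI, default port 19888), and started with the daemon script shipped with Hadoop 2.x. A minimal sketch, assuming HADOOP_HOME is set and the defaults are in use:

```shell
# Start the MapReduce JobHistory Server (Hadoop 2.x sbin script).
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

# Verify the daemon is up, then browse http://localhost:19888/ for finished jobs.
jps | grep JobHistoryServer
```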

Related discussion: The most popular solution

William