Debugging Hadoop map-reduce jobs is a pain. I can print out to stdout, but these logs show up on all of the different machines on which the MR job was run. I can go to the jobtracker, find my job, and click on each individual mapper to get to its task log, but this is extremely cumbersome when you have 20+ mapper/reducers.
I was thinking that I might have to write a script that would scape through the job tracker to figure out what machine each of the mappers/reducers ran on and then scp the logs back to one central location where they could be cat'ed together. Before I waste my time doing this, does someone know of a better way to get one, consolidated stdout log for a job's mappers and reducers?