
When I checked the Hadoop GUI, I found that some of the reduce tasks reach 66.66% and stay there for a long time. When I checked the counters, the number of input records was shown as zero.

After a long time, they get their input records and start processing them. Some show 0 input records for even longer and are eventually killed with "Task attempt failed to report status for 600 seconds".

But some of the reducers show input records in their counters immediately and start processing them right away.

I do not know why there is so much delay in getting the input records for some reducers. This happens only with this program, not with my other programs.

In this MapReduce job, in the configure() method of the reducer (which runs before reduce()), I read a lot of data from the distributed cache. Is this the reason? I am not sure.
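For reference, a simplified sketch of the kind of thing my configure() does, using the old mapred API; the class name, the tab-separated cache file format, and the lookup map are just placeholders, not my actual code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class LookupReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

    // In-memory lookup table built from the distributed cache file
    private final Map<String, String> lookup = new HashMap<String, String>();

    @Override
    public void configure(JobConf job) {
        try {
            // Local paths of the files that were added to the distributed cache
            Path[] cacheFiles = DistributedCache.getLocalCacheFiles(job);
            if (cacheFiles == null) {
                return;
            }
            for (Path cacheFile : cacheFiles) {
                BufferedReader reader =
                        new BufferedReader(new FileReader(cacheFile.toString()));
                try {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        // Assumed format: key<TAB>value
                        String[] parts = line.split("\t", 2);
                        if (parts.length == 2) {
                            lookup.put(parts[0], parts[1]);
                        }
                    }
                } finally {
                    reader.close();
                }
            }
        } catch (IOException e) {
            throw new RuntimeException("Failed to read the distributed cache", e);
        }
    }

    @Override
    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Only ever called after configure() has finished
        while (values.hasNext()) {
            String joined = lookup.get(values.next().toString());
            if (joined != null) {
                output.collect(key, new Text(joined));
            }
        }
    }
}

The point is only that the lookup table is filled in configure(), before the first call to reduce().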

harpun

1 Answer


Yes, I believe reading from the distributed cache is the reason for your delay. But it isn't going to make a difference whether you put configure() before or after reduce() in your code, as the configure() method is always called first. If you look at the run() method of the reducer, it looks as follows (new API):

public void run(Context context) throws IOException, InterruptedException {

    setup(context); // This is the counterpart of configure() from older API

    while (context.nextKey()) {
        reduce(context.getCurrentKey(), context.getValues(), context);
    }
    cleanup(context);
}

As you can see, setup() is called before reduce(); similarly, with the older API, the actual reduce calls won't start until configure() finishes (and this explains why you don't see any input record count for a while).

Now as for the 66% figure: the reduce phase actually has the following sub-phases, each accounting for a third of the reported progress:

  1. Copy
  2. Sort
  3. Reduce

So, since your first two sub-phases were done and the third one had started but was waiting for configure() to finish (i.e. for the distributed cache to be read), your reduce progress sat at 66%.

Amar
  • Thanks for the explanation! I also see that some nodes finish reading the distributed cache and start the reduce method soon, but some don't. Any reason for this? – Mahalakshmi Lakshminarayanan Mar 17 '13 at 00:18
  • Is it possible to report progress from the configure() method using reporter.progress()? I do not know how to initialize the reporter in the configure method. I want to know this because the job is failing because the task fails to report status. – Mahalakshmi Lakshminarayanan Mar 17 '13 at 00:27
  • You cannot use a Reporter in `configure()` with the old *mapred* API, but you certainly can report progress from `setup()` using the new *mapreduce* API. You may just use `context.progress();` (see the sketch after these comments). – Amar Mar 18 '13 at 20:24
  • @Amar: are you sure you can use `context.progress()` in the new MapReduce API? I get a "cannot find symbol" error when I try to do this. – cabad Jun 14 '13 at 16:00
  • Yes I am certain. Check it out here: http://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/mapreduce/Mapper.Context.html Basically it inherits `progress()` from `TaskInputOutputContext`. The same is the case with the reducer's Context. – Amar Jun 16 '13 at 23:31
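
For completeness, here is a minimal sketch (referenced in the comment above) of how progress could be reported from setup() with the new *mapreduce* API while the distributed cache is being read; the class name, the tab-separated file format, the lookup map, and the progress interval are assumptions, not code from the question or the answer:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CacheAwareReducer extends Reducer<Text, Text, Text, Text> {

    // In-memory lookup table built from the distributed cache file
    private final Map<String, String> lookup = new HashMap<String, String>();

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        // Local paths of the files that were added to the distributed cache
        Path[] cacheFiles =
                DistributedCache.getLocalCacheFiles(context.getConfiguration());
        if (cacheFiles == null) {
            return;
        }
        for (Path cacheFile : cacheFiles) {
            BufferedReader reader =
                    new BufferedReader(new FileReader(cacheFile.toString()));
            try {
                String line;
                long count = 0;
                while ((line = reader.readLine()) != null) {
                    // Assumed format: key<TAB>value
                    String[] parts = line.split("\t", 2);
                    if (parts.length == 2) {
                        lookup.put(parts[0], parts[1]);
                    }
                    // Tell the framework the task is still alive while the cache
                    // loads, so it is not killed for failing to report status
                    if (++count % 100000 == 0) {
                        context.progress();
                    }
                }
            } finally {
                reader.close();
            }
        }
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Runs only after setup() has returned
        for (Text value : values) {
            String joined = lookup.get(value.toString());
            if (joined != null) {
                context.write(key, new Text(joined));
            }
        }
    }
}

Calling context.progress() every so many lines keeps the task attempt from being killed for not reporting status while the cache is loaded, without affecting the input record counters.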