
The last reduce of my job is very slow, while the other reduces finish quickly. The numbers are as follows: 18784 map tasks and 1500 reduce tasks; the average time per reduce is about 1'26, but the last reduce takes about 2 h. I tried changing the number of reduces and shrinking the size of the job, but nothing changed.

As for the last reduce, this is my partitioner:

@Override
public int getPartition(Object key, Object value, int numPartitions) {
    String keyStr = key.toString();
    // Hash the key string, then hash the decimal string form of that hash again.
    int partId = String.valueOf(keyStr.hashCode()).hashCode();
    // Map the (possibly negative) hash into the range [0, numPartitions).
    partId = Math.abs(partId % numPartitions);
    partId = Math.max(partId, 0);
    return partId;
    //return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
}

3 Answers


I had a similar experience; in my case it was because a single reducer was processing almost all of the data. This happens due to data skew. Take a look at the counters of the reducers that have already finished and of the one that is taking a long time: you will likely see that far more data is being handled by the slow reducer.

You might want to look into this.

Hadoop handling data skew in reducer
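
If you prefer checking from code rather than the web UI, one rough proxy (an assumption on my part, not exact: it measures reducer output rather than input) is to compare the sizes of the part-r-* files each reducer wrote. A minimal sketch, assuming the job's output directory is passed as the first argument:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReducerOutputSizes {
    public static void main(String[] args) throws Exception {
        // Hypothetical usage: pass your job's real output directory as args[0].
        Path outputDir = new Path(args[0]);
        FileSystem fs = FileSystem.get(new Configuration());

        for (FileStatus status : fs.listStatus(outputDir)) {
            String name = status.getPath().getName();
            // Each reducer writes one part-r-NNNNN file; one file far larger
            // than the rest suggests that reducer received skewed data.
            if (name.startsWith("part-r-")) {
                System.out.println(name + " : " + status.getLen() + " bytes");
            }
        }
    }
}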

user3330284
  • Thank you, but when I reduce the data size to about 10% and change my partitioner, I get the same result: the last reduce is still slow. – yanzhuo Aug 12 '17 at 06:31
  • Did you see how much data it is processing? Is it processing more data than the rest of the reducers? – user3330284 Aug 14 '17 at 04:15
  • Thank you. I found the reason: I forgot to set the combiner class with setCombinerClass. – yanzhuo Aug 15 '17 at 08:43

Very probably you are facing a data skew problem.

Either your keys are not well distributed or your getPartition is creating the issue. It's not clear to me why you create a string from the hash code of the key and then take the hash code of that new string. My suggestion: first try the default partitioner, and then look at the distribution of your keys, for example with the quick histogram sketched below.
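
A minimal sketch of that check, plain Java with no Hadoop dependency; the sample keys below are placeholders, so substitute a real sample of your keys. It runs each key through the same double-hash logic as the getPartition above and prints how many keys land in each partition; a few partitions with far higher counts than the rest point to the skew:

import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

public class PartitionHistogram {

    // Same logic as the getPartition shown in the question.
    static int partition(String key, int numPartitions) {
        int partId = String.valueOf(key.hashCode()).hashCode();
        return Math.abs(partId % numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 1500; // same as the job's reduce count
        List<String> sampleKeys = Arrays.asList("key-a", "key-b", "key-a"); // placeholder sample

        // Count how many sampled keys fall into each partition.
        TreeMap<Integer, Long> histogram = new TreeMap<>();
        for (String key : sampleKeys) {
            histogram.merge(partition(key, numPartitions), 1L, Long::sum);
        }

        histogram.forEach((part, count) ->
                System.out.println("partition " + part + " -> " + count + " keys"));
    }
}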

RojoSam

In fact, when you process a large amount of data, you should set a Combiner class, and if you want to change the encoding you should adjust your reduce function. For example (a driver sketch that wires these up with setCombinerClass follows the two classes):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class GramModelReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    private LongWritable result = new LongWritable();

    @Override
    public void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        // Re-encode the key in GB18030 before writing it out.
        context.write(new Text(key.toString().getBytes("GB18030")), result);
    }
}

class GramModelCombiner extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    public void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable val : values) {
            sum += val.get();
        }
        // The combiner only pre-aggregates; keep the key unchanged here.
        context.write(key, new LongWritable(sum));
    }
}
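
And a minimal driver sketch showing where the combiner is registered with setCombinerClass, the step mentioned in the comments above; the job name, the mapper and the input/output paths are assumptions to be replaced with your own:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GramModelDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "gram model");
        job.setJarByClass(GramModelDriver.class);

        // job.setMapperClass(...) is omitted here; set your own mapper.
        job.setCombinerClass(GramModelCombiner.class); // the missing piece from the question
        job.setReducerClass(GramModelReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}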