1

I have the following output from my mapper:

Mapper: KEY, VALUE(Timestamp, someOtherAttributes)

My reducer receives:

Reducer: KEY, Iterable<VALUE(Timestamp, someOtherAttributes)>

I want Iterable<VALUE(Timestamp, someOtherAttributes)> to be ordered by the Timestamp attribute. Is there any way to implement this?

I would like to avoid sorting manually inside the reducer code, because of the object-reuse pitfall described here: http://cornercases.wordpress.com/2011/08/18/hadoop-object-reuse-pitfall-all-my-reducer-values-are-the-same/

I would have to "deep-copy" every object from the Iterable, which can cause huge memory overhead. :(((
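To make the pitfall concrete, here is a minimal plain-Java sketch. It does not use Hadoop; `MutableValue` and `reusingIterator` are hypothetical stand-ins, and the assumption is that the reducer's Iterable hands back one shared, refilled object on every step, as the linked article describes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-Java simulation; no Hadoop classes are involved.
class ObjectReuseSketch {

    // Hypothetical mutable value, standing in for a Writable.
    static final class MutableValue {
        long timestamp;

        MutableValue copy() {
            MutableValue c = new MutableValue();
            c.timestamp = timestamp;
            return c;
        }
    }

    // Iterator that refills a single shared instance on every next() call.
    static Iterator<MutableValue> reusingIterator(final long[] timestamps) {
        final MutableValue shared = new MutableValue();
        return new Iterator<MutableValue>() {
            private int i = 0;

            public boolean hasNext() {
                return i < timestamps.length;
            }

            public MutableValue next() {
                shared.timestamp = timestamps[i++];
                return shared;
            }
        };
    }

    // Buffers the values, either by reference or as deep copies,
    // and returns the timestamps seen in the buffer afterwards.
    static List<Long> collect(long[] timestamps, boolean deepCopy) {
        List<MutableValue> kept = new ArrayList<>();
        Iterator<MutableValue> it = reusingIterator(timestamps);
        while (it.hasNext()) {
            MutableValue v = it.next();
            kept.add(deepCopy ? v.copy() : v);
        }
        List<Long> result = new ArrayList<>();
        for (MutableValue v : kept) {
            result.add(v.timestamp);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] input = {3L, 1L, 2L};
        System.out.println(collect(input, false)); // [2, 2, 2]: every entry is the same object
        System.out.println(collect(input, true));  // [3, 1, 2]: copies keep their own state
    }
}
```

Buffering without copying keeps only the last value, which is why an in-reducer sort needs one copy per value.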

Capacytron

2 Answers

6

It's relatively easy: you need to write a comparator class for your VALUE class.

Take a closer look here: http://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/ especially the "A solution for secondary sorting" section.
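The idea behind secondary sort can be sketched in plain Java, without Hadoop: records are sorted on a composite key (original key plus timestamp) but grouped on the original key only, so each group still contains all values for that key, now in timestamp order. `CompositeKey`, `Record` and `shuffle` are hypothetical names used only for this illustration. In a real job the two roles are usually wired up with `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`, together with a partitioner (`Job.setPartitionerClass`) that looks only at the original key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of secondary sort; no Hadoop classes are used.
class SecondarySortSketch {

    // Hypothetical composite key: the original key plus the timestamp copied from the value.
    static final class CompositeKey {
        final String naturalKey;
        final long timestamp;

        CompositeKey(String naturalKey, long timestamp) {
            this.naturalKey = naturalKey;
            this.timestamp = timestamp;
        }
    }

    // One mapper output record: composite key plus the unchanged value.
    static final class Record {
        final CompositeKey key;
        final String value;

        Record(CompositeKey key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Plays the role of the sort comparator: original key first, then timestamp.
    static final Comparator<Record> SORT = Comparator
            .comparing((Record r) -> r.key.naturalKey)
            .thenComparingLong(r -> r.key.timestamp);

    // Simulates the shuffle: sort on the full composite key,
    // then group on the original key only (the grouping comparator's job).
    static Map<String, List<String>> shuffle(List<Record> mapperOutput) {
        List<Record> sorted = new ArrayList<>(mapperOutput);
        sorted.sort(SORT);
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Record r : sorted) {
            groups.computeIfAbsent(r.key.naturalKey, k -> new ArrayList<>()).add(r.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Record> out = new ArrayList<>();
        out.add(new Record(new CompositeKey("userB", 30L), "b@30"));
        out.add(new Record(new CompositeKey("userA", 20L), "a@20"));
        out.add(new Record(new CompositeKey("userB", 5L), "b@5"));
        out.add(new Record(new CompositeKey("userA", 10L), "a@10"));
        // prints {userA=[a@10, a@20], userB=[b@5, b@30]}
        System.out.println(shuffle(out));
    }
}
```

Because grouping ignores the timestamp, the reducer is still called once per original key; the framework's sort does the ordering, so nothing has to be buffered or copied in the reducer.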

wlk
  • I've read these articles; much the same is described in Hadoop: The Definitive Guide, 3rd edition. As I understand it, I have to move my Timestamp attribute into the key and make the key composite: [EXISTING_KEY_VALUE, Timestamp_attr_from_value]. If so, I don't like this approach. To me it's not natural for my business task and could confuse other developers... :( – Capacytron Jan 14 '13 at 14:46
  • I've read it. It's not what I need :( The problem is that I need to get ALL values for one unique KEY, and these values should be sorted by Timestamp. If I move Timestamp into the KEY, I'll get the values grouped by the new unique key (OLD_KEY_TIMESTAMP) instead. That's not correct. – Capacytron Jan 16 '13 at 17:27
  • @Sergey I noticed that you selected this answer as the correct answer. Did it end up working for you? – Matthew Moisen Jan 23 '14 at 21:32
  • Hi, yes, secondary sort is the right solution for the problem. – Capacytron Jan 24 '14 at 04:11
  • @Matthew Moisen, yes. Secondary sort is clearly explained in the listed articles, and I suggest reading Tom White's Hadoop book, 3rd edition. But the best way is to use Apache Pig. All this complicated low-level stuff was implemented there a long time ago. – Capacytron Jan 26 '14 at 14:18
  • I don't think Map Reduce will sort the value List for a single key... So you have to make a custom variable which contains both key and value, and also define a custom comparator which tells map reduce how to sort your key... – Jing He Aug 20 '17 at 06:37
-1

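You need to write a comparator class for your VALUE class. For example, you can copy the values into a list inside the reducer and sort them by timestamp:

// Needs java.io.IOException, java.text.ParseException, java.text.SimpleDateFormat,
// java.util.* and org.apache.hadoop.io.Text
@Override
protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
    final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
    // Copy each value to a String, because Hadoop reuses the same Text instance.
    List<String> list = new ArrayList<String>();
    for (Text val : values) {
        list.add(val.toString());
    }
    // Sort by the timestamp held in the first comma-separated field.
    Collections.sort(list, new Comparator<String>() {
        public int compare(String s1, String s2) {
            try {
                long time1 = sdf.parse(s1.split(",")[0]).getTime();
                long time2 = sdf.parse(s2.split(",")[0]).getTime();
                // Compare as long: casting to int and subtracting can overflow.
                return Long.compare(time1, time2);
            } catch (ParseException e) {
                // Fail loudly instead of silently treating bad input as time 0.
                throw new IllegalArgumentException("Unparseable timestamp", e);
            }
        }
    });
    for (String line : list) {
        context.write(key, new Text(line));
    }
}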
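One detail worth spelling out when comparing timestamps: epoch milliseconds do not fit in an int, so a comparator that narrows them to int and subtracts can order values incorrectly. A small sketch (the method names are hypothetical) shows the difference from `Long.compare`:

```java
// Shows why epoch-millisecond timestamps should be compared as long values.
class TimestampCompareSketch {

    // Narrows both timestamps to int and subtracts; the high bits are lost.
    static int narrowingCompare(long t1, long t2) {
        return (int) t1 - (int) t2;
    }

    // Compares the full 64-bit values.
    static int safeCompare(long t1, long t2) {
        return Long.compare(t1, t2);
    }

    public static void main(String[] args) {
        long early = 0L;             // 1970-01-01 00:00:00 UTC
        long late = 3_000_000_000L;  // roughly 34.7 days later
        // A correct comparator returns a negative number for (early, late).
        System.out.println(narrowingCompare(early, late) < 0); // false: wrong order
        System.out.println(safeCompare(early, late) < 0);      // true
    }
}
```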
Victor