I am struggling with something like this in Hadoop.

I get the following as the output of my mapper:

KeyValue1, 2014-02-01 20:42:00
KeyValue1, 2014-02-01 20:45:12
KeyValue1, 2014-05-01 10:35:02
KeyValue2, 2014-03-01 01:45:12
KeyValue2, 2014-03-01 02:08:18
KeyValue3, 2014-02-01 20:45:12
KeyValue4, 2015-02-01 05:45:12
KeyValue4, 2013-02-01 10:45:12

and so on.

In the end, I want this:

 KeyValue1, TimeDifference(first occurrence - last occurrence)
 KeyValue2, TimeDifference(first occurrence - last occurrence)
 KeyValue3, -occurred once-
 KeyValue4, TimeDifference(first occurrence - last occurrence)

Any input is highly appreciated. Cheers

Bedi Egilmez
  • Emit the same key/value pairs to a reducer. In the reducer, for each unique key, iterate over the list of values. Keep two variables for the first and last dates and update them whenever you see an earlier or later date. At the end of the iteration, compute the difference and emit it. – Arun A K Dec 04 '14 at 03:29
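
The reducer logic described in this comment could look roughly like the sketch below. It is plain Java without the Hadoop classes, so it only shows the min/max tracking that would sit inside a `reduce()` method. The class name `FirstLastSpan`, the method `span`, and the `yyyy-MM-dd HH:mm:ss` timestamp pattern are assumptions for illustration, based on the sample mapper output in the question.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (name not from the original post): the per-key logic
// a reducer would run over the values it receives for one key.
final class FirstLastSpan {
    // Assumed timestamp layout, taken from the sample mapper output.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /**
     * Returns the time between the earliest and latest timestamp,
     * or null when the key occurred only once (or not at all).
     * The values may arrive in any order.
     */
    static Duration span(Iterable<String> timestamps) {
        LocalDateTime first = null;
        LocalDateTime last = null;
        int count = 0;
        for (String raw : timestamps) {
            LocalDateTime t = LocalDateTime.parse(raw.trim(), FORMAT);
            if (first == null || t.isBefore(first)) {
                first = t; // new earliest date
            }
            if (last == null || t.isAfter(last)) {
                last = t; // new latest date
            }
            count++;
        }
        return count < 2 ? null : Duration.between(first, last);
    }
}
```

In a real job, the same loop would run over the `Iterable` of values passed to the reducer, and the result (or an "occurred once" marker when `span` returns null) would be written to the context. Because both extremes are tracked, the values do not need to be sorted first.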

1 Answer

There are multiple approaches. I suggest using a composite key: create a custom Partitioner, KeyComparator and GroupComparator. Then, on the reducer side, you can simply pick the first and last rows and take the difference.
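
To illustrate the idea, here is a minimal sketch in plain Java that simulates what the three pieces would do, without depending on the Hadoop classes. All names (`CompositeKeySketch`, `CompositeKey`, `SORT`, `GROUP`, `partition`, `reduce`) are hypothetical, and the timestamp pattern is assumed from the sample data in the question. In an actual job, the equivalents would be registered on the `Job` with `setPartitionerClass`, `setSortComparatorClass` and `setGroupingComparatorClass`.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of a secondary sort; not actual Hadoop code.
final class CompositeKeySketch {
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Composite key: the natural key plus the timestamp to sort on.
    static final class CompositeKey {
        final String naturalKey;
        final LocalDateTime timestamp;

        CompositeKey(String naturalKey, String rawTimestamp) {
            this.naturalKey = naturalKey;
            this.timestamp = LocalDateTime.parse(rawTimestamp.trim(), FORMAT);
        }
    }

    // Sort comparator role: natural key first, then timestamp ascending.
    static final Comparator<CompositeKey> SORT = (a, b) -> {
        int byKey = a.naturalKey.compareTo(b.naturalKey);
        return byKey != 0 ? byKey : a.timestamp.compareTo(b.timestamp);
    };

    // Grouping comparator role: only the natural key decides the group.
    static final Comparator<CompositeKey> GROUP =
            (a, b) -> a.naturalKey.compareTo(b.naturalKey);

    // Partitioner role: route by natural key only, so that every row of
    // one key lands on the same reducer regardless of its timestamp.
    static int partition(CompositeKey key, int numPartitions) {
        return (key.naturalKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Simulates sort + group + reduce: a null value means "occurred once".
    static Map<String, Duration> reduce(List<CompositeKey> mapperOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapperOutput);
        sorted.sort(SORT);
        Map<String, Duration> result = new LinkedHashMap<>();
        int i = 0;
        while (i < sorted.size()) {
            CompositeKey first = sorted.get(i);
            int j = i;
            // Advance to the last row of the current group.
            while (j + 1 < sorted.size()
                    && GROUP.compare(first, sorted.get(j + 1)) == 0) {
                j++;
            }
            CompositeKey last = sorted.get(j);
            result.put(first.naturalKey,
                    j == i ? null : Duration.between(first.timestamp, last.timestamp));
            i = j + 1;
        }
        return result;
    }
}
```

One thing to keep in mind: a reducer receives its values as an `Iterable`, so the first row is available immediately, but reaching the last row still means iterating through the whole group. The sort only removes the need to compare dates along the way.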

bluegeek