
I have the following Reducer class:

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CompanyMinMaxReducer extends Reducer<Text, DateClosePair, Text, Text> {
  private Text rText = new Text();

  @Override
  public void reduce(Text key, Iterable<DateClosePair> values, Context context)
      throws IOException, InterruptedException {

    int min = Integer.MAX_VALUE;
    int max = Integer.MIN_VALUE;
    LongWritable minDay = new LongWritable();
    LongWritable maxDay = new LongWritable();

    for (DateClosePair val : values) {
      LongWritable tempDate = val.getDate();
      DoubleWritable tempClose = val.getClose();

      if (tempDate.compareTo(maxDay) > 0) {
        maxDay = tempDate;
      } else if (tempDate.compareTo(minDay) < 0) {
        minDay = tempDate;
      }

      if (tempClose.get() > max) {
        max = (int) tempClose.get();
      } else if (tempClose.get() < min) {
        min = (int) tempClose.get();
      }
    }

    String minDayFinal = new SimpleDateFormat("yyyy").format(new Date(minDay.get()));
    String maxDayFinal = new SimpleDateFormat("yyyy").format(new Date(maxDay.get()));
    String output = minDayFinal + " - " + maxDayFinal + " MIN: " + min + " MAX: " + max;

    rText.set(output);
    context.write(key, rText);
  }
}

My dataset is in the following format:

exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close

For example:

NASDAQ,AAPL,1970-10-22, ... 

I am asked to write a new MapReduce program that, for each company, outputs the range of years it has been present in the stock market, together with the minimum and maximum closing value its stock has reached.

My program runs and the output looks right otherwise, but the start year is constant for some reason:

AAON    1970 - 2002 MIN: 1 MAX: 35
AATI    1970 - 2010 MIN: 2 MAX: 15
ABCO    1970 - 2004 MIN: 14 MAX: 69
ABCW    1970 - 2007 MIN: 0 MAX: 53
ABII    1970 - 2008 MIN: 25 MAX: 78
ABIO    1970 - 1999 MIN: 0 MAX: 139
ABMC    1970 - 2004 MIN: 0 MAX: 6
ABTL    1970 - 2004 MIN: 0 MAX: 58
ACAD    1970 - 2009 MIN: 0 MAX: 17
ACAP    1970 - 2005 MIN: 15 MAX: 55
ACAT    1970 - 2009 MIN: 3 MAX: 29
ACCL    1970 - 1997 MIN: 3 MAX: 104
ACEL    1970 - 1998 MIN: 0 MAX: 10
ACET    1970 - 2004 MIN: 4 MAX: 27
ACFC    1970 - 2008 MIN: 1 MAX: 20
ACGL    1970 - 1997 MIN: 11 MAX: 80
ACLI    1970 - 2006 MIN: 2 MAX: 77
ACLS    1970 - 2001 MIN: 0 MAX: 30

DateClosePair is a custom Writable I wrote, modeled on the examples you can find on the web.

It is very odd that the min_closing_price and the max_closing_price are correct but the min_date and max_date are wrong.

Any thoughts?

2 Answers


LongWritable minDay = new LongWritable() initializes your minimum date variable to 1970.

More precisely: unless given a specific value, LongWritable initializes its internal long to 0, per the Java language spec. When this is fed into java.util.Date, it is interpreted as 0 milliseconds from the Unix epoch: January 1, 1970, 00:00:00 UTC.
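A minimal demonstration of the effect (the class name EpochDemo is mine, just for illustration):

import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.io.LongWritable;

public class EpochDemo {
  public static void main(String[] args) {
    LongWritable minDay = new LongWritable(); // internal long defaults to 0
    Date d = new Date(minDay.get());          // 0 ms since the Unix epoch
    // Prints 1970 in UTC (1969 in time zones west of it)
    System.out.println(new SimpleDateFormat("yyyy").format(d));
  }
}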

My guess is that 1970 precedes every date value in your dataset, so the minimum is never updated and 1970 gets written for every key.

I noticed you use int min = Integer.MAX_VALUE to initialize the close value. Perhaps you could use LongWritable minDay = new LongWritable(Long.MAX_VALUE) instead to resolve this?
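For example, something like the following would mirror the sentinel pattern you already use for the closing price:

LongWritable minDay = new LongWritable(Long.MAX_VALUE); // any real date is earlier
LongWritable maxDay = new LongWritable(Long.MIN_VALUE); // any real date is later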

  • I have thought about that solution. When I change the `LongWritable` to be instantiated with `Long.MAX_VALUE/MIN_VALUE`, I get the same value for the min_day and the max_day. So the output looks like this: `WEBM 2001 - 2001 MIN: 0 MAX: 72` `WEDC 2008 - 2008 MIN: 0 MAX: 18` `WEST 2003 - 2003 MIN: 0 MAX: 20` `WFD 2005 - 2005 MIN: 8 MAX: 35` – GoldenBoyLDN Oct 22 '17 at 10:10
  • Interesting. Is your aliasing issue then related to [the reuse of key/value objects](https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/mapred/Reducer.html#reduce(K2,%20java.util.Iterator,%20org.apache.hadoop.mapred.OutputCollector,%20org.apache.hadoop.mapred.Reporter)) in the reduce function? It seems frustrating that (to my knowledge) only the 2.8.0 version of Apache docs includes this fact even though prior versions implemented it also. – benjaminedwardwebb Oct 22 '17 at 17:00
  • It looks like the reuse of the Writable objects caused the aliasing issue when I was aliasing them instead of copying values. I am new to Hadoop so I cannot think of any deeper reason, but using the setter method seems to have solved my bug. – GoldenBoyLDN Oct 22 '17 at 17:28

I have resolved the problem, which turns out to be caused by aliasing.

Instead of doing maxDay = tempDate;, which leaves maxDay pointing at the tempDate object (an object Hadoop reuses on every iteration), I should call the .set() method.

Solution:

maxDay.set(tempDate.get());
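
For completeness, a sketch of the fixed loop, assuming DateClosePair exposes getDate() and getClose() as in the question. It also uses sentinel initial values (since the default 0 would still pin the minimum year at 1970) and independent ifs, so a single record can update both bounds:

int min = Integer.MAX_VALUE;
int max = Integer.MIN_VALUE;
LongWritable minDay = new LongWritable(Long.MAX_VALUE);
LongWritable maxDay = new LongWritable(Long.MIN_VALUE);

for (DateClosePair val : values) {
  // Hadoop reuses the same DateClosePair instance on every iteration,
  // so copy the primitive values out instead of keeping a reference.
  long date = val.getDate().get();
  double close = val.getClose().get();

  if (date > maxDay.get()) {
    maxDay.set(date); // copy the value, don't alias the object
  }
  if (date < minDay.get()) {
    minDay.set(date);
  }

  if (close > max) {
    max = (int) close;
  }
  if (close < min) {
    min = (int) close;
  }
}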