I'm working through Basho's MapReduce tutorial for Riak, which poses the following MapReduce challenge, given daily stock data for the GOOG ticker:

Find the largest day for each month in terms of dollars traded, and subsequently the largest overall day. Hint: You will need at least one each of map and reduce phases.

Each day in the goog bucket has a key that corresponds to its date, and a value that looks like this:

"2010-04-21": {
  "Date": "2010-04-21",
  "Open": "556.46",
  "High": "560.25",
  "Low": "552.16",
  "Close": "554.30",
  "Volume": "2391500",
  "Adj Close": "554.30"
}

Because I'm relatively unfamiliar with the MapReduce paradigm (and, candidly, with JavaScript), I wanted to work through how to do this. I assume that most of the work would actually be done in the reduce function, and that you'd want a map function that looks something like:

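function(value, keyData, arg){
  // Decode the stored JSON object for this key
  var data = Riak.mapValuesJson(value)[0];
  var obj = {};
  // Fields are stored as strings, so convert them before subtracting
  obj[data.Date] = Math.abs(parseFloat(data.Open) - parseFloat(data.Close));
  return [ obj ];
}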

That would give you a list keyed by day containing, if not the dollars traded that day, at least the change in the stock price that day.

My question is then how to structure a reduce function that groups the results by month, keeps only the largest value for each month, and then sorts the monthly results from largest to smallest.

Am I shortchanging the work that I need to do in my map function here, or is this roughly the right idea?

fox

1 Answer

I originally authored that challenge! Unless you'd like me to just give you the answer, I'll give you this hint: the key here is to think in terms of aggregate functions. How do you need to group the entries to find the maximum for each month, and then the maximum across the entire dataset?
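To make the grouping idea a little more concrete without tying it to the tutorial's data, here is a minimal sketch in plain JavaScript. It assumes each input item is an object of the form {"YYYY-MM": {date: ..., value: ...}}, which is a hypothetical shape chosen for illustration, and the function names maxPerMonth and maxOverall are likewise made up. Both functions return the same shape they accept, since a reduce function may end up being applied to its own earlier output.

```javascript
// Hypothetical reduce: keeps the entry with the largest value for each month.
// Input and output are both lists of {"YYYY-MM": {date: ..., value: ...}}.
function maxPerMonth(values, arg) {
  var best = {};
  values.forEach(function (entry) {
    for (var month in entry) {
      if (!best[month] || entry[month].value > best[month].value) {
        best[month] = entry[month];
      }
    }
  });
  return [best];
}

// Hypothetical second reduce: keeps only the single largest entry overall,
// still keyed by its month so the output shape matches the input shape.
function maxOverall(values, arg) {
  var top = null;
  var topMonth = null;
  values.forEach(function (entry) {
    for (var month in entry) {
      if (top === null || entry[month].value > top.value) {
        top = entry[month];
        topMonth = month;
      }
    }
  });
  if (top === null) {
    return [];
  }
  var out = {};
  out[topMonth] = top;
  return [out];
}
```

Chaining the two (per-month maximum first, overall maximum second) mirrors the two aggregates the challenge asks for.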

Also, the given data doesn't tell you the exact amount of money exchanged on a given day, but you can estimate it by multiplying an average price for the day by the volume of shares traded.
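As a rough sketch of that estimate: the data has no true average price, so the code below assumes the midpoint of the day's High and Low as a stand-in, which is only one possible approximation. The names estimateDollars and toMonthEntry are hypothetical, and the wrapper is plain JavaScript rather than a complete Riak map phase.

```javascript
// Hypothetical helper: estimates the dollars traded for one day's record.
// Assumes "average price" means the midpoint of High and Low (an approximation).
// The fields are stored as strings, so they are parsed first.
function estimateDollars(day) {
  var avgPrice = (parseFloat(day.High) + parseFloat(day.Low)) / 2;
  return avgPrice * parseInt(day.Volume, 10);
}

// Hypothetical map-style wrapper: keys the estimate by month ("YYYY-MM"),
// taken from the first seven characters of the Date field.
function toMonthEntry(day) {
  var entry = {};
  entry[day.Date.substring(0, 7)] = {
    date: day.Date,
    value: estimateDollars(day)
  };
  return [entry];
}
```

Keying each result by month at this stage means a later aggregation step only has to compare values that share the same key.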

seancribbs