I'm working through Basho's MapReduce tutorial, which poses an MR challenge based on daily stock data for the GOOG ticker:
Find the largest day for each month in terms of dollars traded, and subsequently the largest overall day. Hint: You will need at least one each of map and reduce phases.
Each day in the goog bucket has a key that corresponds to its date, and corresponding data that looks like this:
"2010-04-21":{
Date: "2010-04-21",
Open: "556.46",
High: "560.25",
Low: "552.16",
Close: "554.30",
Volume: "2391500",
Adj Close: "554.30"
}
Due to my relative lack of familiarity with the MR paradigm (and, candidly, JavaScript), I wanted to work through how to do this. I assume that most of the work here would actually get done in the reduce function, and that you'd want a map function that looks something like:
function(value, keyData, arg){
  var data = Riak.mapValuesJson(value)[0];
  var obj = {};
  obj[data.Date] = Math.abs(data.Open - data.Close);
  return [ obj ];
}
which would give you a list keyed by day: not of dollars traded per day, but at least of the change in the stock price that day.
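For comparison, here is a rough sketch of a value closer to "dollars traded", assuming it can be approximated as Volume times Close. That definition is my assumption; the tutorial does not spell it out. The helper name `dollarsTraded` is hypothetical, and it takes the already-parsed record so it can be tried outside Riak:

```javascript
// Hypothetical helper: approximates dollars traded as Volume * Close.
// This definition is an assumption, not something the tutorial states.
function dollarsTraded(data) {
  var obj = {};
  // Field values are stored as strings, so convert them explicitly.
  obj[data.Date] = parseFloat(data.Volume) * parseFloat(data.Close);
  return obj;
}

// Inside a map phase this would stand in for the Math.abs(...) line, e.g.
//   return [ dollarsTraded(Riak.mapValuesJson(value)[0]) ];
```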
The question I would then have is how to structure a reduce function that groups the results by month, keeps only the largest value in each month, and then sorts the monthly maxima from largest to smallest.
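To make the question concrete, here is a minimal sketch of the shape I have in mind. It is untested inside Riak and assumes the map phase emits single-key objects of the form `{"YYYY-MM-DD": value}`. The name `reduceMaxPerMonth` is hypothetical. Input and output have the same shape, on the assumption that the reduce step may be run again on its own partial results:

```javascript
// Hypothetical reduce phase: takes a list of single-key objects such as
// {"2010-04-21": 2.16} and returns one such object per month (that month's
// largest value), sorted from largest to smallest.
function reduceMaxPerMonth(values, arg) {
  var best = {}; // "YYYY-MM" -> { date: "YYYY-MM-DD", value: number }
  var i, date, month, value;

  for (i = 0; i < values.length; i++) {
    for (date in values[i]) {
      if (!values[i].hasOwnProperty(date)) { continue; }
      month = date.substring(0, 7); // "2010-04-21" -> "2010-04"
      value = values[i][date];
      if (!best.hasOwnProperty(month) || value > best[month].value) {
        best[month] = { date: date, value: value };
      }
    }
  }

  // Collect the monthly winners and sort them in descending order of value.
  var winners = [];
  for (month in best) {
    if (best.hasOwnProperty(month)) { winners.push(best[month]); }
  }
  winners.sort(function (a, b) { return b.value - a.value; });

  // Convert back to the {date: value} shape the function accepts as input.
  var result = [];
  for (i = 0; i < winners.length; i++) {
    var out = {};
    out[winners[i].date] = winners[i].value;
    result.push(out);
  }
  return result;
}
```

With this shape, the first element of the returned list would be the largest overall day, since it is the largest of the monthly maxima.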
Am I shortchanging the work that I need to do in my map function here, or is this roughly the right idea?