I have two RDDs. The first, which I'll call userVisits, looks like this:
((123, someurl,Mon Nov 04 00:00:00 PST 2013),11.0)
The second, allVisits, looks like this:
((someurl,Mon Nov 04 00:00:00 PST 2013),1122.0)
With userVisits.reduceByKey(_+_) I can get the number of visits for each user, and the same call on allVisits gives me the totals. What I want is a weighted average for each user: the user's visits divided by the total visits for that day. To compute it, I need to look up a value in allVisits using part of the key tuple from userVisits. I'm guessing it could be done with a map like this:
userVisits.reduceByKey(_+_).map( item => item._2 / allVisits.get(item._1))
I know allVisits.get(key) doesn't exist, but how could I accomplish something like that?
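One way to get something like allVisits.get is to aggregate allVisits, pull it to the driver with collectAsMap(), and broadcast the resulting Map so that a normal Map lookup can run inside the map over userVisits. (Spark does not let you call one RDD's transformations or actions inside another RDD's transformation, which is why there is no RDD-level get here.) Note that the pseudocode above looks up item._1, which is the full (user, url, date) key, while allVisits is keyed by (url, date), so the sketch looks up only the last two components. This is a minimal sketch under stated assumptions: the sample data is hypothetical, dates are kept as plain strings, the pair-RDD methods are used without extra imports (Spark 1.3 or later), and the daily totals are assumed small enough to fit in driver and executor memory. Object and method names are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastLookupSketch {
  // Returns ((user, url, day), userVisits / totalVisitsForThatUrlAndDay).
  def run(sc: SparkContext): Map[(Int, String, String), Double] = {
    // Hypothetical sample data shaped like the question's RDDs.
    val day = "Mon Nov 04 00:00:00 PST 2013"
    val userVisits = sc.parallelize(Seq(
      ((123, "someurl", day), 11.0),
      ((123, "someurl", day), 4.0),
      ((456, "someurl", day), 5.0)))
    val allVisits = sc.parallelize(Seq(
      (("someurl", day), 60.0),
      (("someurl", day), 40.0)))

    // Assumption: the aggregated totals are small enough to collect to the driver.
    // The broadcast ships one read-only copy of the Map to each executor.
    val totals = sc.broadcast(allVisits.reduceByKey(_ + _).collectAsMap())

    userVisits
      .reduceByKey(_ + _) // ((user, url, day), visits)
      .flatMap { case (key @ (_, url, d), visits) =>
        // Plain Map lookup on (url, day); keys with no total are dropped.
        totals.value.get((url, d)).map(total => (key, visits / total)).toList
      }
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("weighted-visits").setMaster("local[*]"))
    try run(sc).foreach(println) finally sc.stop()
  }
}
```

With the sample data, user 123 has 15 visits out of 100 for the day and user 456 has 5 out of 100. If allVisits is too large to broadcast, a join (re-keying userVisits by (url, date)) avoids collecting it to the driver.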
The alternative I can think of is to map userVisits onto the same keys as allVisits and then join the two, but that seems inefficient.
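For comparison, the re-key-and-join alternative is the usual pair-RDD way to combine two keyed datasets, and it does not require either side to fit on the driver. Reducing both RDDs before the join means only aggregated records are shuffled. This is a minimal sketch, assuming Spark 1.3 or later (pair-RDD methods such as join available without extra imports), hypothetical sample data, and dates kept as plain strings; the object and method names are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object JoinSketch {
  // Returns ((user, url, day), userVisits / totalVisitsForThatUrlAndDay).
  def run(sc: SparkContext): Map[(Int, String, String), Double] = {
    // Hypothetical sample data shaped like the question's RDDs.
    val day = "Mon Nov 04 00:00:00 PST 2013"
    val userVisits = sc.parallelize(Seq(
      ((123, "someurl", day), 11.0),
      ((123, "someurl", day), 4.0),
      ((456, "someurl", day), 5.0)))
    val allVisits = sc.parallelize(Seq(
      (("someurl", day), 60.0),
      (("someurl", day), 40.0)))

    val dailyTotals = allVisits.reduceByKey(_ + _) // ((url, day), total)

    userVisits
      .reduceByKey(_ + _) // ((user, url, day), visits)
      // Re-key by the (url, day) part so both sides share a key.
      .map { case ((user, url, d), visits) => ((url, d), (user, visits)) }
      .join(dailyTotals) // ((url, day), ((user, visits), total)); inner join
      .map { case ((url, d), ((user, visits), total)) => ((user, url, d), visits / total) }
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("weighted-visits-join").setMaster("local[*]"))
    try run(sc).foreach(println) finally sc.stop()
  }
}
```

Because join is an inner join, users whose (url, date) has no entry in allVisits are dropped, which matches the broadcast sketch's behaviour of skipping missing totals.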