I am just starting out with Apache Spark in Java. I am currently doing a mini project with some book data, and I have to find the most popular author in each country.
I have a pairRDD where the key is the country and the value is the author, like this:
[(usa,C. S Lewis), (australia,Jason Shinder), (usa,Bernie S.), (usa,Bernie S.)]
Do I have to use Tuple3 to add one more field and count the number of times each value is present? If so, how do I use combineByKey for Tuple3?
I had another idea: take all the keys from the pairRDD and, for each key, filter the data into another pairRDD of author names
and the number of times each of them is mentioned, from which I could find the most popular author. But this doesn't feel like an elegant solution, as I would have to loop through the array of keys. Is there a better way?
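
For reference, here is a minimal sketch of the two-step aggregation being asked about, written with plain Java collections rather than Spark so that it runs on its own. The class name PopularAuthors and the method mostPopularByCountry are hypothetical names chosen for illustration. The assumption is that the same shape would carry over to a pairRDD: re-key each record as ((country, author), 1) with mapToPair, sum the counts with reduceByKey, re-key by country, and keep the pair with the larger count using a second reduceByKey. That avoids both Tuple3 and looping over the keys, but it is only one possible approach.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class PopularAuthors {

    // Returns country -> most frequently listed author.
    // Ties are broken arbitrarily (whichever candidate is seen first wins).
    public static Map<String, String> mostPopularByCountry(
            List<Map.Entry<String, String>> pairs) {

        // Step 1: count each (country, author) pair.
        // Spark analogue (assumed): mapToPair to ((country, author), 1), then reduceByKey with a sum.
        Map<Map.Entry<String, String>, Long> counts = pairs.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Step 2: re-key by country and keep the (author, count) with the highest count.
        // Spark analogue (assumed): mapToPair to (country, (author, count)), then reduceByKey keeping the max.
        Map<String, Map.Entry<String, Long>> best = counts.entrySet().stream()
                .collect(Collectors.toMap(
                        e -> e.getKey().getKey(),
                        e -> Map.entry(e.getKey().getValue(), e.getValue()),
                        (a, b) -> a.getValue() >= b.getValue() ? a : b));

        // Step 3: drop the counts, leaving country -> author.
        return best.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey, e -> e.getValue().getKey()));
    }

    public static void main(String[] args) {
        // Sample data taken from the question.
        List<Map.Entry<String, String>> pairs = List.of(
                Map.entry("usa", "C. S Lewis"),
                Map.entry("australia", "Jason Shinder"),
                Map.entry("usa", "Bernie S."),
                Map.entry("usa", "Bernie S."));

        mostPopularByCountry(pairs)
                .forEach((country, author) -> System.out.println(country + " -> " + author));
    }
}
```

With the sample data above, usa maps to Bernie S. (two mentions against one for C. S Lewis) and australia maps to Jason Shinder. The order in which the two lines are printed is not guaranteed.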