I'm trying to write a MapReduce job in Python with mrjob.
I have a file that contains product information, and I want to count the number of products that belong to the same category and have the same version. The output should look like this: <category, {count, version}>
My file looks like this:
product_name rate category id version
a "3.0" cat1 1 1
b "2.0" cat1 2 1
c "4.0" cat1 3 4
d "1.0" cat2 3 2
. . . . .
. . . . .
. . . . .
For example, cat1 contains two products with version 1 (a and b), so I expect:
<cat1, {2, 1}>
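To make the expected result concrete, here is a minimal plain-Python sketch (no mrjob) of the counting I'm after. It assumes the fields are separated by whitespace, as in the listing above; `SAMPLE` and `expected_counts` are names made up for this illustration.

```python
from collections import Counter

# Sample rows copied from the listing above; fields are assumed to be
# separated by whitespace.
SAMPLE = '''a "3.0" cat1 1 1
b "2.0" cat1 2 1
c "4.0" cat1 3 4
d "1.0" cat2 3 2'''


def expected_counts(text):
    """Count products per (category, version) pair."""
    counts = Counter()
    for line in text.splitlines():
        product_name, rate, category, product_id, version = line.split()
        counts[(category, version)] += 1
    return counts


# Print each pair in the <category, {count, version}> layout.
for (category, version), count in sorted(expected_counts(SAMPLE).items()):
    print(f"<{category}, {{{count}, {version}}}>")
```

For the four sample rows this gives a count of 2 for (cat1, version 1) and a count of 1 for each of the other two pairs.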
I wrote the code below, but I don't know how to count the products in the combiner function.
from mrjob.job import MRJob
from mrjob.step import MRStep


class MRFrequencyCount(MRJob):

    def steps(self):
        # The method names here must match the methods defined below.
        return [
            MRStep(
                mapper=self.mapper_extract,
                combiner=self.combine_counts,
            )
        ]

    def mapper_extract(self, _, line):
        # Split one input line into its five fields.
        (product_name, rate, category, id, version) = line.split('*')
        yield category, (1, version)

    def combine_counts(self, category, countAndVersion):
        # This is where I'm stuck: sum() cannot add (1, version) tuples.
        yield category, sum(countAndVersion)


if __name__ == '__main__':
    MRFrequencyCount.run()
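One direction I'm considering, sketched here in plain Python rather than as a tested mrjob job: emit (category, version) as a composite key so that every value is a plain integer and `sum()` works. The `mapper`, `combiner` and `run_local` functions below are hypothetical stand-ins, `run_local` only imitates the grouping that the framework would do, and the whitespace split is an assumption based on the sample listing.

```python
from collections import defaultdict


def mapper(line):
    # Assumes whitespace-separated fields, as in the sample listing.
    product_name, rate, category, product_id, version = line.split()
    # Composite key: every value emitted for a key is a plain integer.
    yield (category, version), 1


def combiner(key, counts):
    # Partial sum of the integer counts collected for one key.
    yield key, sum(counts)


def run_local(lines):
    """Stand-in for the shuffle phase: group mapper output by key."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    results = {}
    for key, values in grouped.items():
        for out_key, total in combiner(key, values):
            results[out_key] = total
    return results


rows = [
    'a "3.0" cat1 1 1',
    'b "2.0" cat1 2 1',
    'c "4.0" cat1 3 4',
    'd "1.0" cat2 3 2',
]
print(run_local(rows))
```

Because a combiner may run zero or more times, its output has to keep the same key and value shape as the mapper output, which is why the sketch keeps the version in the key instead of in the value. In a real job a reducer with the same summing logic would presumably still be needed to produce the final totals.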