I am having trouble understanding how to iterate over the values in my reducer. My mapper emits something like:
(cat, *): 5
(cat, *): 5
(cat, dog): 1
(pigeon, dog): 1
(hello, world): 1
(cat, dog): 1
(pigeon, dog): 1
(hello, world): 1
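Between the mapper and the reducer, the framework sorts and groups the emitted pairs by key, so the reducer is called once per distinct key with an iterator over that key's values. The sketch below simulates that shuffle step in plain Python for the sample data above. The `shuffle` helper is hypothetical and only illustrates the idea; it is not part of mrjob or Hadoop.

```python
from itertools import groupby
from operator import itemgetter

# Mapper output from the question, as (key, value) tuples.
mapper_output = [
    (("cat", "*"), 5),
    (("cat", "*"), 5),
    (("cat", "dog"), 1),
    (("pigeon", "dog"), 1),
    (("hello", "world"), 1),
    (("cat", "dog"), 1),
    (("pigeon", "dog"), 1),
    (("hello", "world"), 1),
]


def shuffle(pairs):
    """Hypothetical helper: sort by key, then yield (key, values_iterator),
    roughly the way a MapReduce framework hands groups to a reducer."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        # A one-shot generator over just the values for this key.
        yield key, (value for _, value in group)


for key, counts in shuffle(mapper_output):
    print(key, sum(counts))
```

For this input it reports 10 for ('cat', '*') and 2 for each of the other three keys.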
I am trying to find the total count for every key that has '*' as its second element, so that I can use it for some statistical analysis. I then want to sum the counts of each of the other pairs and divide that sum by the matching '*' total.
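Stated as a formula, for a pair (w, u) the target is count(w, u) / count(w, *). The plain-Python sketch below computes that relative frequency in memory from already-summed counts, just to pin down the arithmetic before worrying about MapReduce. The `relative_frequencies` name and the dictionary input are assumptions for illustration.

```python
def relative_frequencies(totals):
    """Hypothetical helper. `totals` maps (word, neighbour) -> summed count,
    including (word, '*') -> marginal count for that word.
    Returns (word, neighbour) -> count(word, neighbour) / count(word, '*')."""
    result = {}
    for (word, neighbour), count in totals.items():
        if neighbour == "*":
            continue  # the marginal itself is not an output pair
        marginal = totals.get((word, "*"))
        if marginal:  # skip words that have no (word, '*') total
            result[(word, neighbour)] = count / marginal
    return result


totals = {("cat", "*"): 10, ("cat", "dog"): 2, ("pigeon", "dog"): 2}
print(relative_frequencies(totals))  # {('cat', 'dog'): 0.2}
```

Words without a '*' total (like "pigeon" here) are skipped rather than raising a division error; that is a design choice for the sketch, not a requirement.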
def reducer(self, pair, counts):
Given the line of code above, how would I iterate over counts?
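A minimal sketch of consuming `counts` inside a reducer with that signature. The class here is a stand-in so the snippet runs without mrjob installed; in a real job the method would live on your MRJob subclass. With mrjob's default JSON protocols, tuple keys typically arrive in the reducer as lists, so unpack them rather than comparing them to tuples.

```python
class PairCountReducer:
    """Stand-in for an MRJob subclass; only the reducer logic is shown."""

    def reducer(self, pair, counts):
        # `counts` is an iterator over every value emitted for `pair`.
        total = 0
        for count in counts:  # consume it with an ordinary for loop
            total += count
        yield pair, total  # equivalent to: yield pair, sum(counts)


# Simulate one reducer call with a generator, as the framework would pass it.
reducer = PairCountReducer()
print(list(reducer.reducer(("cat", "*"), (c for c in [5, 5]))))
# [(('cat', '*'), 10)]
```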
From the mapper I yield either (item, neighbour), 1 or (item, '*'), 1.
I understand that the values arrive as a generator, so I have to iterate over them (for example in a for loop) to actually do anything with them.
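One consequence worth knowing: a generator can be consumed only once, so a second pass over the same values sees nothing. A small plain-Python illustration, where `counts_generator` is a hypothetical stand-in for what the reducer receives:

```python
def counts_generator():
    # Mimics the one-shot iterator a reducer receives.
    yield from [1, 1, 1]


counts = counts_generator()
first = sum(counts)   # consumes the generator
second = sum(counts)  # already exhausted, so nothing left to add
print(first, second)  # 3 0

counts = list(counts_generator())  # materialise if you need two passes
print(sum(counts), len(counts))  # 3 3
```

Materialising into a list trades memory for the ability to iterate twice, which may matter if a single key has very many values.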
EDIT: The data is read from a text file and then emitted by the mapper as:
yield (word1, word2), 1
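A sketch of a mapper that emits both kinds of key. It assumes "neighbour" means the next word on the same line, which is only one possible definition; the question does not say how neighbours are chosen. It also emits one (word, '*') per pair so that the '*' key accumulates the marginal count for each word.

```python
def mapper(_, line):
    """Hypothetical mapper: treats the next word on the same line as the
    neighbour. Emits ((word, neighbour), 1) for each pair and
    ((word, '*'), 1) once per pair so '*' holds the marginal count."""
    words = line.split()
    for word, neighbour in zip(words, words[1:]):
        yield (word, neighbour), 1
        yield (word, "*"), 1


print(list(mapper(None, "cat dog cat bird")))
```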
Expected output:
(cat, dog): 0.33333333
This is calculated as the number of (cat, dog) pairs divided by the total count of (cat, *) pairs.
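A sketch of computing that ratio in a single reducer pass, using the "order inversion" idea: if keys arrive sorted and all keys for one word reach the same reducer, then (word, '*') is seen before any (word, neighbour), because '*' sorts before letters in ASCII. Both conditions are assumptions here; in a real cluster run the second one typically needs a custom partitioner or a single reducer. The `RelativeFrequencyReducer` class and `run` helper are hypothetical and simulate the sort/group step in plain Python.

```python
from itertools import groupby
from operator import itemgetter


class RelativeFrequencyReducer:
    """Keeps the latest (word, '*') total and divides later pairs by it."""

    def __init__(self):
        self.current_word = None
        self.marginal = 0

    def reducer(self, pair, counts):
        word, neighbour = pair
        total = sum(counts)
        if neighbour == "*":
            # Remember count(word, *) for the pairs that follow.
            self.current_word = word
            self.marginal = total
        elif word == self.current_word and self.marginal:
            yield (word, neighbour), total / self.marginal


def run(mapper_output):
    """Simulate the sort/group step and feed each group to the reducer."""
    job = RelativeFrequencyReducer()
    ordered = sorted(mapper_output, key=itemgetter(0))
    results = []
    for pair, group in groupby(ordered, key=itemgetter(0)):
        results.extend(job.reducer(pair, (v for _, v in group)))
    return results


# Illustrative input only, not data from the question.
pairs = [(("cat", "*"), 1), (("cat", "*"), 1), (("cat", "*"), 1),
         (("cat", "dog"), 1), (("cat", "bird"), 1), (("cat", "bird"), 1)]
print(run(pairs))
```

For this illustrative input it yields roughly 0.667 for ('cat', 'bird') and 0.333 for ('cat', 'dog'). Checking `word == self.current_word` prevents a word with no '*' key from being divided by the previous word's total.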
For a little more clarity, I am trying to implement what the answer here suggests.