
I am having trouble understanding how to iterate over values. I have a mapper which will pass in something like:

(cat, *): 5
(cat, *): 5
(cat, dog): 1
(pigeon, dog): 1
(hello, world): 1
(cat, dog): 1
(pigeon, dog): 1
(hello, world): 1

I am trying to find the total count for every key that contains '*', so that I can use it for some statistical analysis. I then want to take the summed count of each of the other word pairs and divide it by the corresponding '*' total.

def reducer(self, pair, counts):

Given the signature above, how would I iterate over counts?

In the mapper I yield either (item, neighbour), 1 or (item, '*'), 1.

I understand that the values arrive as a generator object, so I have to iterate over them in a for loop to actually do anything with them.
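A minimal sketch of iterating over counts, assuming a standalone function in place of the MRJob method (self is dropped so it runs without mrjob installed). The values for one key arrive as an iterable, often a generator, which can be walked only once:

```python
def reducer(pair, counts):
    # counts holds every value the mapper emitted for this key; iterating it
    # with a for loop (or sum()) consumes it in a single pass.
    total = 0
    for count in counts:
        total += count
    yield pair, total


# Simulate the framework handing over a generator of values for one key.
values = (count for count in [5, 5])
print(list(reducer(("cat", "*"), values)))  # [(('cat', '*'), 10)]
```

sum(counts) does the same thing in one call. If you need the values more than once, copy them into a list first, because a generator is exhausted after one pass.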

EDIT: The data is read from a text file and then emitted by the mapper as:

yield (word1, word2), 1
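For the (word, '*') totals to exist, the mapper has to emit a marginal record alongside each pair. A minimal sketch as a plain Python generator, assuming (hypothetically, since the question does not define it) that a word's neighbour is the next word on the same line; an mrjob mapper yields (key, value) tuples in the same way:

```python
def mapper(line):
    # Hypothetical neighbour definition: the next word on the same line.
    words = line.split()
    for word, neighbour in zip(words, words[1:]):
        yield (word, neighbour), 1
        # One marginal record per pair, so (word, '*') ends up counting
        # every pair that starts with word.
        yield (word, "*"), 1


print(list(mapper("cat dog")))  # [(('cat', 'dog'), 1), (('cat', '*'), 1)]
```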

Expected output:

(cat, dog): 0.33333333

Calculated as the number of (cat, dog) pairs divided by the total count of (cat, *) pairs.
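One common way to get this output (often called order inversion) is to rely on sort order: '*' sorts before letters, so (cat, '*') reaches the reducer before (cat, dog), and the reducer can keep the '*' total in instance state and divide later pairs by it. The sketch below simulates a single reducer in plain Python; shuffle and RelativeFrequencyReducer are hypothetical stand-ins, not mrjob's API. On a real cluster this also assumes every key with the same first word goes to the same reducer (for example via a partitioner on the first word), which this single-reducer simulation does not need.

```python
from itertools import groupby


def shuffle(records):
    """Hypothetical stand-in for the framework's sort-and-group step."""
    # Sorting the tuple keys puts ('cat', '*') before ('cat', 'dog'),
    # because '*' (ASCII 42) sorts before any letter.
    records = sorted(records, key=lambda kv: kv[0])
    for key, group in groupby(records, key=lambda kv: kv[0]):
        # Hand the reducer a generator of values, as a MapReduce framework would.
        yield key, (value for _, value in group)


class RelativeFrequencyReducer:
    """Hypothetical single-reducer simulation of the order-inversion idea."""

    def __init__(self):
        self.marginal = {}  # first word -> total count of (word, '*')

    def reducer(self, pair, counts):
        word, neighbour = pair
        total = sum(counts)  # consumes the generator exactly once
        if neighbour == "*":
            self.marginal[word] = total  # remember the denominator
        elif word in self.marginal:
            yield pair, total / self.marginal[word]
        # Pairs whose '*' total was never seen are skipped.


records = [
    (("cat", "*"), 5), (("cat", "*"), 5), (("cat", "dog"), 1),
    (("pigeon", "dog"), 1), (("hello", "world"), 1), (("cat", "dog"), 1),
    (("pigeon", "dog"), 1), (("hello", "world"), 1),
]
r = RelativeFrequencyReducer()
results = [out for key, counts in shuffle(records) for out in r.reducer(key, counts)]
print(results)  # [(('cat', 'dog'), 0.2)]
```

With the sample records from the question, (cat, '*') totals 10 and (cat, dog) totals 2, so the only output is (cat, dog): 0.2; the pigeon and hello pairs are skipped because the sample has no '*' records for them. In an actual mrjob job, the marginal dictionary could be set up in reducer_init.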

For more context, I am trying to implement what the answer here suggests.

Asked by trixie
  • Please update your question with a concrete example of your data structure. Are you working with a list, a file, etc.? – Martin Konecny Jan 09 '16 at 22:48
  • @trixie It would help if you gave an example. E.g. (cat, *) has 2 records and (cat, dog) has 2 records. For these records, what is the expected output? – Manjunath Ballur Jan 10 '16 at 16:07
  • (cat, *) has 10 records, for example; I want to store this in a variable. Then for every other word pair, for example (Cat, pigeon), 3, I want to divide its value by 10 (from the cat variable). This way I can find the conditional probability, so the expected output would be something like: (Cat, pigeon) 0.3 – trixie Jan 10 '16 at 16:31
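The calculation in the last comment is the usual relative-frequency estimate of a conditional probability. Here N(w_1, *) is the total the (word, '*') records accumulate, and the sum on the right holds assuming the mapper emits exactly one (word, '*') record per pair:

```latex
\hat{P}(w_2 \mid w_1) = \frac{N(w_1, w_2)}{N(w_1, *)},
\qquad N(w_1, *) = \sum_{w'} N(w_1, w')

\hat{P}(\text{pigeon} \mid \text{cat}) = \frac{3}{10} = 0.3
```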

0 Answers