
I am a beginner with MrJob and am having trouble calculating the average of the prime numbers in a text file. I am unsure where to apply the arithmetic logic, and whether I should yield lists when using MrJob. The text file contains one million primes. This is what I've come up with so far; I don't understand what the key should be in my case.

%%writefile prime_average.py
from mrjob.job import MRJob

class primeAverages(MRJob):

    def mapper(self, _, line):
        results = []
        for x in line.split():
            if(x.isdigit()):
                yield x, 1

    def reducer(self, word, key):
        yield word, sum(word)/len(key)
Eckersley
  • How is your input file organized? Is each prime on a separate line? – Dandelion Jan 09 '19 at 03:31
  • In the mapper, yield the prime as the value and use a common key (such as the 1 you are currently yielding as the value). Values that share a key are passed to the same reducer, and you want all the primes in one reducer. That said, using MRJob this way seems to have no advantage beyond learning the MRJob basics. – Dandelion Jan 09 '19 at 03:51

1 Answer


You can use something like this, assuming each line of the input file holds a single prime:

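from mrjob.job import MRJob

class PrimeAverage(MRJob):

    def mapper(self, _, line):
        # Emit every prime under one shared key so a single reducer sees them all.
        line = line.strip()
        if line.isdigit():
            yield None, int(line)

    def reducer(self, key, values):
        s = 0  # sum of primes
        c = 0  # number of primes
        for p in values:  # values is a one-pass iterator, so count while summing
            s += p
            c += 1
        yield None, s / c

if __name__ == '__main__':
    PrimeAverage.run()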
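
With a single shared key, every prime is sent to one reducer. A common way to reduce that traffic is to pass (sum, count) pairs instead of raw values, so partial results can be merged before the final division. Below is a minimal sketch of that idea in plain Python, without mrjob, so it can be run directly. The function names (`mapper`, `combine`, `reducer`, `run`) and the `chunk_size` parameter are hypothetical and only simulate how input might be split across mappers; this is not how mrjob itself schedules work.

```python
def mapper(line):
    """Emit (key, (value, 1)) for a line holding a single prime."""
    token = line.strip()
    if token.isdigit():
        # One shared key (None) so all pairs end up in the same group.
        yield None, (int(token), 1)


def combine(pairs):
    """Merge an iterable of (sum, count) pairs into one (sum, count) pair."""
    total, count = 0, 0
    for s, c in pairs:
        total += s
        count += c
    return total, count


def reducer(key, pairs):
    """Merge the partial (sum, count) pairs and emit the average."""
    total, count = combine(pairs)
    yield key, total / count


def run(lines, chunk_size=2):
    """Simulate the job: each chunk stands in for one mapper's input split."""
    partials = []
    for start in range(0, len(lines), chunk_size):
        chunk = lines[start:start + chunk_size]
        mapped = [value for line in chunk for _, value in mapper(line)]
        if mapped:
            # Local pre-aggregation, playing the role of a combiner.
            partials.append(combine(mapped))
    return list(reducer(None, partials))


sample = ["2", "3", "5", "7", "11", "not a prime"]
result = run(sample)
print(result)
```

In an actual MRJob class, the same shape could be used by having the mapper yield `None, (int(line), 1)` and adding a `combiner(self, key, values)` method that yields the merged pair. Note that tuples typically arrive in the combiner and reducer as lists after serialization, which unpacking in the `for s, c in pairs` loop handles either way.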
Dandelion