
I'm stuck on a simple problem with the mrjob MapReduce framework: I want to get the average number of words per line in a given paragraph, and this is what I have so far:

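from mrjob.job import MRJob

class LineAverage(MRJob):

    def mapper(self, _, line):
        # Emit the word count of this line, plus a 1 to count the line itself.
        numwords = len(line.split())
        yield "words", numwords
        yield "lines", 1

    def reducer(self, key, values):
        # Sum the values per key: total words and total lines.
        yield key, sum(values)

if __name__ == '__main__':
    LineAverage.run()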

With this code I get, after the reduce step, the total number of lines and words in the text, but I don't know how to get the average by computing:

words/TotalOfLines

I am new to this programming model; if anyone can illustrate this example, it would be much appreciated.

In the meantime, thank you so much for your attention and participation.

Dade

2 Answers


In the end, the answer was simple: I was actually sending the reducer one value per line, so the number of values equals the number of lines. In the reducer I just had to count the number of values for the key.

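from mrjob.job import MRJob

class LineAverage(MRJob):

    def mapper(self, _, line):
        # Emit one ("words", n) pair per input line.
        numwords = len(line.split())
        yield "words", numwords

    def reducer(self, key, values):
        # The number of values received equals the number of lines.
        totalL, totalW = 0, 0
        for numwords in values:
            totalL += 1
            totalW += numwords
        yield "avg", totalW / float(totalL)

if __name__ == '__main__':
    LineAverage.run()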

So for each line the mapper emits a pair ("words", x). The shuffle step groups these into a single entry, ("words": x1, x2, x3, ..., xN), where N is the number of lines, and that entry is the input to the reducer. Counting the values for the key then gives the number of lines, and that's it.
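
To make the map, shuffle, and reduce steps concrete, here is a minimal sketch that imitates them in plain Python, without mrjob. The helper names `shuffle` and `run` and the sample input are hypothetical and only for illustration; the real framework does the grouping for you and may do it differently internally.

```python
from collections import defaultdict

def mapper(line):
    # Same logic as the mrjob mapper: one ("words", n) pair per line.
    yield "words", len(line.split())

def shuffle(pairs):
    # Hypothetical stand-in for the framework's shuffle: group values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    # Count the values (lines) and sum them (words) in a single pass.
    total_lines, total_words = 0, 0
    for numwords in values:
        total_lines += 1
        total_words += numwords
    yield "avg", total_words / float(total_lines)

def run(lines):
    # Map every line, group the pairs by key, then reduce each group.
    pairs = [pair for line in lines for pair in mapper(line)]
    results = []
    for key, values in shuffle(pairs).items():
        results.extend(reducer(key, values))
    return results

sample = ["one two three", "four five", "six"]  # hypothetical input
print(run(sample))
```

Because the mapper only ever emits the key "words", the reducer is called once and sees one value per line, which is why counting the values gives the number of lines.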

Hope this is helpful to someone.

Dade

In your reducer, you already write key, sum(values) to the output files. You just need to read those output files into a Java/Scala program and calculate the average there.
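
The same post-processing step could also be sketched in Python rather than Java/Scala. This assumes the job output uses mrjob's default format, one JSON-encoded key and one JSON-encoded value per line separated by a tab; if the job uses a different output protocol, the parsing would need to change. The function name `average_from_output` and the sample lines are hypothetical.

```python
import json

def average_from_output(output_lines):
    # Assumption: each line looks like '"words"\t6' (JSON key, tab, JSON value).
    totals = {}
    for raw in output_lines:
        raw = raw.strip()
        if not raw:
            continue
        key, value = raw.split("\t", 1)
        totals[json.loads(key)] = json.loads(value)
    # Average words per line from the two totals the reducer emitted.
    return totals["words"] / float(totals["lines"])

sample_output = ['"words"\t6', '"lines"\t3']  # hypothetical job output
print(average_from_output(sample_output))
```

In practice the lines would come from the job's output files instead of a hard-coded list.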

Cheng Chen
  • Thanks for the answer @Cheng, I figured it out later. I'll post the answer below – Dade Jun 24 '15 at 15:27
  • Well, you're right. But in this case I'm working on basic MapReduce exercises. I know that in the future I will use the Hadoop framework, with multiple mappers and reducers, combiners, and more advanced processes, but for now the goal is to understand the MapReduce logic. Thanks for your willingness to help. – Dade Jun 25 '15 at 22:36