
Given a CSV file where each line contains a set of numbers, I want to write a MapReduce program that determines the maximum of all the numbers in the file. For example, if the CSV file is 3,4 5,6, the script should return 6.

from mrjob.job import MRJob

class MRWordCounter(MRJob):
    def mapper(self, key, line):
        for word in line.split():
            yield word, 1

    def reducer(self, word, occurrences):
        yield word, sum(occurrences)

if __name__ == '__main__':
    MRWordCounter.run()

This script, which I found, returns the number of occurrences of each value, but it does not work when a line contains multiple values. How can I parse all the data in the CSV file and return the maximum?

UPDATE: The input file that I tried to parse as a test looks something like this: 1,1,1,1
2
3
4
5
6
After changing `line.split()` to `line.split(",")`, it counted all the occurrences correctly.
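To illustrate the change described in the update, here is a small sketch of how the two calls treat the first line of the test file. The variable names are only for illustration.

```python
line = "1,1,1,1"

# Default split() separates on whitespace, so a comma-separated line
# stays in one piece and is treated as a single "word".
whitespace_tokens = line.split()

# Passing "," splits on commas and yields each value on its own.
comma_tokens = line.split(",")

print(whitespace_tokens)  # ['1,1,1,1']
print(comma_tokens)       # ['1', '1', '1', '1']
```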

Kyr
  • 3
    The script you pasted here is the word count example. It does not "work" because it's doing something completely different. Have you made any attempt to adapt this code to your needs? – pault Mar 12 '18 at 17:33
  • I did not expect it to print my max value. But when I tried to parse a file with values such as test,test,test one two, it did not count the word test 3 times; instead it counted the string "test,test,test" once. That is my main issue here: I do not quite understand how to parse each value on its own. – Kyr Mar 12 '18 at 17:37
  • 2
    That is because you call `line.split()` with the default argument, which splits on whitespace. If you want to split on comma, you need to call `line.split(",")`. Can you edit your question to include the example input and the output? Try to create an [mcve] so others can reproduce your issue. – pault Mar 12 '18 at 17:52
  • It counts all occurrences now, as I wanted. Thank you. – Kyr Mar 12 '18 at 18:04
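With the comma split from the comments in place, the original goal (the maximum of all numbers) could be sketched roughly as follows. This is a minimal sketch, not a tested mrjob job: `mapper` and `reducer` are plain generator functions written in the same (key, value) shape as the methods in the question, so they could presumably be moved into an `MRJob` subclass, and `run_locally` is a hypothetical helper that only imitates the grouping step so the logic can be tried without a cluster. It assumes every field is an integer.

```python
def mapper(_, line):
    # Split on commas (not whitespace) and ignore empty fields.
    # Assumption: every remaining field parses as an integer.
    numbers = [int(token) for token in line.split(",") if token.strip()]
    if numbers:
        # Emit the line's largest value under one shared key so that
        # a single reducer call sees every candidate.
        yield "max", max(numbers)


def reducer(key, values):
    # The overall maximum is the largest of the per-line maxima.
    yield key, max(values)


def run_locally(lines):
    # Hypothetical stand-in for the shuffle step: group mapper output
    # by key, then hand each group to the reducer.
    grouped = {}
    for line in lines:
        for key, value in mapper(None, line):
            grouped.setdefault(key, []).append(value)
    results = []
    for key, values in grouped.items():
        results.extend(reducer(key, values))
    return results


if __name__ == "__main__":
    print(run_locally(["3,4", "5,6"]))  # [('max', 6)]
```

Emitting one value per line rather than one per number keeps the amount of data sent to the reducer small; emitting every number under the same key would give the same result.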

0 Answers