
I've been trying to load a JSON data file into an mrjob job, but I can't get it to work.

from mrjob.job import MRJob
from mrjob.protocol import JSONProtocol


def type_hashing(entry):
    return entry[13].lower()

class ReduceData(MRJob):
    INPUT_PROTOCOL = JSONProtocol

    #def mapper_init(self):
    #   for entry in file['data']:
    #        yield 'entry', entry

    def mapper(self, _, line):
        # Debug output; parenthesized so it parses on both Python 2 and 3.
        print(line)
        yield type_hashing(line), 1

    def reducer(self, key, values):
        yield key, sum(values)


if __name__ == '__main__':
    ReduceData.run()

I need the MapReduce job to run only over the values in the "data" attribute of the JSON file, but I'm not sure how to get there.

I run the script in the terminal as:

python my_mapreduce.py jsonfile.json
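
To make the goal concrete, here is a minimal plain-Python sketch of the result I'm after. It is not a working mrjob solution. It assumes the file is a single JSON object whose "data" attribute is a list of rows, and that index 13 of each row is a string (as type_hashing above expects). The names flatten_data and count_types and the sample document are made up for illustration. The first function writes each row as its own JSON line, since mrjob reads input one line at a time; whether feeding such a file through mrjob.protocol.JSONValueProtocol is the right approach is my assumption, not something I have confirmed.

```python
import json
from collections import Counter


def flatten_data(doc):
    """Yield one JSON-encoded line per row in doc['data'] (hypothetical helper)."""
    for row in doc["data"]:
        yield json.dumps(row)


def count_types(lines, index=13):
    """Plain-Python stand-in for the mapper/reducer pair in the question."""
    counts = Counter()
    for line in lines:
        row = json.loads(line)            # one row per line
        counts[row[index].lower()] += 1   # same key as type_hashing
    return dict(counts)


# Made-up document shaped like the assumed input: metadata plus a "data" list.
sample = {
    "meta": {},
    "data": [
        [0] * 13 + ["Theft"],
        [0] * 13 + ["theft"],
        [0] * 13 + ["Arson"],
    ],
}

lines = list(flatten_data(sample))
result = count_types(lines)
```

With the sample above, result maps each lower-cased type to the number of rows carrying it, which is what I would expect the reducer to emit per key.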
Syspect

0 Answers