I've been trying to load a JSON data file into mrjob, but can't really get it to work.

from mrjob.job import MRJob
from mrjob.protocol import JSONProtocol


def type_hashing(entry):
    return entry[13].lower()


class ReduceData(MRJob):
    INPUT_PROTOCOL = JSONProtocol

    #def mapper_init(self):
    #    for entry in file['data']:
    #        yield 'entry', entry

    def mapper(self, _, line):
        print line
        yield type_hashing(line), 1

    def reducer(self, key, values):
        yield key, sum(values)


if __name__ == '__main__':
    ReduceData.run()

I need the MapReduce job to run only over the entries stored under the 'data'
attribute of the JSON file, but I am not sure how to get at them from inside the job.
I run the script in the terminal as:
python my_mapreduce.py jsonfile.json
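
One direction I am considering, as a rough sketch only: as far as I understand, mrjob hands the mapper its input one line at a time, so a single JSON document with a 'data' attribute may need to be flattened into one entry per line before the job runs. The helper below does that with the standard library alone. The names flatten_data_entries and flattened.json are made up for illustration, and it assumes that 'data' is a top-level list.

```python
import json


def flatten_data_entries(src_path, dst_path, attribute="data"):
    """Write each element of the top-level `attribute` list on its own line.

    Assumption: the source file is one JSON object whose `attribute` key
    holds a list of entries (hypothetical layout, based on file['data']).
    Returns the number of entries written.
    """
    with open(src_path) as src:
        document = json.load(src)

    entries = document[attribute]

    with open(dst_path, "w") as dst:
        for entry in entries:
            # json.dumps emits no raw newlines, so each entry stays on one line.
            dst.write(json.dumps(entry) + "\n")

    return len(entries)


if __name__ == "__main__":
    count = flatten_data_entries("jsonfile.json", "flattened.json")
    print("wrote %d entries" % count)
```

If that is the right idea, I would presumably then run the job on flattened.json and switch INPUT_PROTOCOL to JSONValueProtocol, since JSONProtocol seems to expect a tab-separated key and value on each line rather than a bare JSON value. I have not confirmed that this is the intended way to do it.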