I have a JSON file that contains fields such as machine_id, category, and ... The category field contains machine states such as "alarm" and "failure". I would simply like to count how many times each machine_id has been reported, using rmr2. For example, if I have the following:

machine_id, state
48, alarm
39, failure
48, utilization

I would like to see this result:

48,2
39,1
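
To make the expected result concrete, here is a minimal sketch in plain R (no Hadoop or rmr2 involved). It assumes the three sample rows above are held in a data frame; the name `records` is hypothetical and only used for illustration.

```r
# Hypothetical in-memory copy of the three sample rows above
records <- data.frame(
  machine_id = c(48, 39, 48),
  state = c("alarm", "failure", "utilization")
)

# Count how many times each machine_id appears
counts <- table(records$machine_id)
print(counts)
```

This is the per-machine count I am trying to reproduce with mapreduce.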

What I did: I wrote a simple mapreduce job that reads the values from the JSON file, and used its output as the input to a second mapreduce job. The code is:

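# Map: emit each record's machine_id with a count of 1
mp = function(k, v) {
  machine_id = v$machine_id
  keyval(machine_id, 1)
}
# Reduce: count the values collected for each machine_id
rd = function(k, v) keyval(k, length(v))
mapreduce(input = mapreduce(input = '/user/cloudera/sample.json', input.format = "json", map = function(k, v) keyval(k, v)), map = mp, reduce = rd)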

Unfortunately, it returns only the last two values of the JSON file. It seems that it does not read the entire contents of the JSON file. I would appreciate any help.

Hossein