I have some JSON input that I need to parse and process (this is the first time I am using JSON). My input is as follows:

{"id":"id2","v":2, "d":{"Location":"JPN"}}
{"id":"id1","v":1, "d":{"Location":"USA"}}
{"id":"id2","v":1, "d":{"Location":"JPN"}}
{"id":"id1","v":2, "d":{"Location":"USA"}}

My goal is to write a Scalding script that groups the input by the Location field and outputs the count for each group. So in the above example, "JPN" and "USA" should each have a count of 2. Scalding provides a class called JsonLine. My script is as follows:

class ParseJsonLine(args: Args) extends Job(args) { 

  JsonLine(args("input"), ('id, 'v, 'd)).read
    .groupBy('d){_.size}
    .write(args("output"))   
}

The above code compiles, but at runtime it fails with the following error:

Caused by: java.lang.ClassCastException: scala.collection.immutable.Map$Map1 cannot be cast to java.lang.Comparable

Basically, I am not sure how to reference the Location field. "d.Location" did not work, and grouping by the complex structure "d" produces the ClassCastException above. I could not find many examples of parsing nested JSON input in Scalding. I am also not sure whether there is something better than JsonLine for nested input.

I would appreciate your help.

thanks

Valentin Mercier
user2327621

1 Answer

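Perhaps reference the nested field by its dotted path? A name containing a dot cannot be written as a symbol literal like 'd, so it has to be built with Symbol("d.Location").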

Take a look at the unit tests: https://github.com/twitter/scalding/blob/0.11.0/scalding-json/src/test/scala/com/twitter/scalding/JsonLineTest.scala

  JsonLine(args("input"), ('id, 'v, Symbol("d.Location"))).read
    .groupBy(Symbol("d.Location")){_.size}
    .write(args("output"))
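
To make the idea concrete without a Hadoop or Scalding setup, here is a minimal plain-Scala sketch of what grouping on a dotted path amounts to. It is not Scalding's implementation; the names NestedFieldSketch, Record, lookup and countByLocation are hypothetical, and the records are the four sample lines from the question written out as nested Maps. The point is that the grouping key becomes the String stored under d.Location, which is Comparable, rather than the whole Map stored under d, which is not (hence the ClassCastException).

```scala
object NestedFieldSketch {
  // Hypothetical stand-in for one parsed JSON line: field name -> value,
  // where a nested JSON object is itself a Map.
  type Record = Map[String, Any]

  // Walk a dotted path such as "d.Location" one segment at a time.
  // Returns None if a segment is missing or the current value is not a Map.
  def lookup(record: Record, path: String): Option[Any] =
    path.split('.').foldLeft(Option[Any](record)) {
      case (Some(m: Map[_, _]), key) => m.asInstanceOf[Map[String, Any]].get(key)
      case _                         => None
    }

  // The four sample lines from the question.
  val records: Seq[Record] = Seq(
    Map("id" -> "id2", "v" -> 2, "d" -> Map("Location" -> "JPN")),
    Map("id" -> "id1", "v" -> 1, "d" -> Map("Location" -> "USA")),
    Map("id" -> "id2", "v" -> 1, "d" -> Map("Location" -> "JPN")),
    Map("id" -> "id1", "v" -> 2, "d" -> Map("Location" -> "USA"))
  )

  // Group by the extracted String value, not by the enclosing Map.
  def countByLocation(rs: Seq[Record]): Map[String, Int] =
    rs.flatMap(r => lookup(r, "d.Location"))
      .collect { case s: String => s }
      .groupBy(identity)
      .map { case (location, hits) => location -> hits.size }

  def main(args: Array[String]): Unit =
    countByLocation(records).toSeq.sorted.foreach {
      case (location, n) => println(s"$location\t$n")
    }
}
```

Running main on the sample data prints JPN and USA with a count of 2 each, which is the result the question asks for.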

Note: I am still learning Scalding myself, so feel free to improve or correct this.

technotring