
Could someone point me to a link that explains how to read and write simple case classes in scalding? Is there some default serialization scheme?

For example, I have jobs that create pipes of com.twitter.algebird.Moments. I wish to write the pipes to HDFS and read them using a different job.

For example, I tried to write using:

pipe.write(Tsv(outputPath))

And read using:

class MomentsReadingExample (args: Args) extends Job(args){
  val pipe = Tsv(args("input"), ('term, 'appearanceMoments, 'totalMoments)).read

  val withSum = pipe.map(('appearanceMoments, 'totalMoments) -> 'sum) {
    x: (Moments, Moments) => MomentsGroup.plus(x._1, x._2)
  }

  withSum.write(Tsv(args("output")))
}

I am getting the following error:

java.lang.ClassCastException: java.lang.String cannot be cast to com.twitter.algebird.Moments

1 Answer
One way would be to use pack and unpack. Tsv stores every field as text, so a Moments written directly ends up as its toString and comes back as a String (hence the ClassCastException). Unpacking the case class into its numeric fields before writing, and packing them back after reading, avoids this:

pipe
  .unpack[Moments]('appearanceMoments -> ('m0, 'm1, 'm2, 'm3, 'm4))
  .write(Tsv(outputPath))

Tsv(args("input"), ('term, 'm0, 'm1, 'm2, 'm3, 'm4)).read
  .pack[Moments](('m0, 'm1, 'm2, 'm3, 'm4) -> 'appearanceMoments)
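Putting both directions together, a pair of jobs might look like the sketch below. The field names 'm0 through 'm4 match Moments' own field names, which pack/unpack resolve by reflection; IterableSource and the explicit Moments(m0, m1, m2, m3, m4) constructor call just stand in for the question's real upstream pipe.

```scala
import com.twitter.scalding._
import com.twitter.algebird.Moments

// Writing side: unpack each Moments into its five numeric components
// so Tsv serializes plain numbers instead of Moments.toString.
class MomentsWritingExample(args: Args) extends Job(args) {
  // Stand-in for the real upstream pipe of ('term, 'appearanceMoments).
  IterableSource(List(("a", Moments(1L, 1.0, 0.0, 0.0, 0.0))),
                 ('term, 'appearanceMoments))
    .read
    .unpack[Moments]('appearanceMoments -> ('m0, 'm1, 'm2, 'm3, 'm4))
    .write(Tsv(args("output")))
}

// Reading side: pack the numeric columns back into a Moments before use.
class MomentsPackingExample(args: Args) extends Job(args) {
  Tsv(args("input"), ('term, 'm0, 'm1, 'm2, 'm3, 'm4))
    .read
    .pack[Moments](('m0, 'm1, 'm2, 'm3, 'm4) -> 'appearanceMoments)
    .map('appearanceMoments -> 'mean) { m: Moments => m.mean }
    .write(Tsv(args("output")))
}
```

For two Moments columns (as in the question's 'appearanceMoments and 'totalMoments), the Tsv would need distinct column names per component, which pack's name-based matching doesn't handle directly, so writing each Moments pipe to its own Tsv is the simpler route.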
Marius Soutier