I have a dataset which is the output of a pipe in scalding that looks like this:

'Var1, 'Var2, 'Var3, 'Var4 =
 a,x,1,2
 a,y,3,4
 b,x,1,2
 b,y,3,4

I'm trying to turn it into something like:

'Var1, 'Var3x, 'Var4x, 'Var3y, 'Var4y =
a,1,2,3,4
b,1,2,3,4

At first I thought flatMap might work, but that didn't seem right. It seems like the pivot function should do it, but I can't work out how to pivot multiple columns at once.

Any help is appreciated.

J Calbreath

1 Answer

You need to combine your two value columns into one, and then you can use .pivot. Something like this:

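// Wrapper that carries both value columns through the pivot as one field.
case class v34(v3: Int, v4: Int)

pipe
    // Combine 'Var3 and 'Var4 into a single field.
    .map(('Var3, 'Var4) -> ('V34)) { vars: (Int, Int) => v34(vars._1, vars._2) }
    // Pivot on 'Var2: rows with Var2 == "x" fill 'x, rows with "y" fill 'y.
    .groupBy('Var1) { _.pivot(('Var2, 'V34) -> ('x, 'y)) }
    // Unpack the two wrapped values back into four columns.
    .mapTo(('Var1, 'x, 'y) -> ('Var1, 'Var3x, 'Var4x, 'Var3y, 'Var4y)) {
       vars: (String, v34, v34) =>
       val (key, xval, yval) = vars
       (key, xval.v3, xval.v4, yval.v3, yval.v4)
    }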
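
To check the idea without a Scalding job, the same combine, pivot, and unpack steps can be sketched on ordinary Scala collections. This is only an illustration, not Scalding API: `Row`, `Vals`, and `pivotRows` are hypothetical names, and the sketch assumes every 'Var1 key has exactly one "x" row and one "y" row, as in the sample data from the question.

```scala
// Plain-collections analogue of the combine / pivot / unpack steps.
// Nothing here is Scalding API; all names are hypothetical.
object PivotSketch {
  // Stand-in for one tuple of the pipe: 'Var1, 'Var2, 'Var3, 'Var4.
  final case class Row(var1: String, var2: String, var3: Int, var4: Int)

  // The two value columns combined into one value, as in the answer.
  final case class Vals(v3: Int, v4: Int)

  def pivotRows(rows: Seq[Row]): Seq[(String, Int, Int, Int, Int)] =
    rows
      .groupBy(_.var1)   // one group per 'Var1 key
      .toSeq
      .sortBy(_._1)      // deterministic output order
      .map { case (key, group) =>
        // Index the combined values by the pivot column ('Var2).
        val byVar2 = group.map(r => r.var2 -> Vals(r.var3, r.var4)).toMap
        // Assumption: both "x" and "y" are present for every key.
        val x = byVar2("x")
        val y = byVar2("y")
        // Unpack back into flat columns: Var1, Var3x, Var4x, Var3y, Var4y.
        (key, x.v3, x.v4, y.v3, y.v4)
      }

  // Sample data from the question.
  val input: Seq[Row] = Seq(
    Row("a", "x", 1, 2),
    Row("a", "y", 3, 4),
    Row("b", "x", 1, 2),
    Row("b", "y", 3, 4)
  )

  val output: Seq[(String, Int, Int, Int, Int)] = pivotRows(input)
}
```

With the sample data, `PivotSketch.output` contains the two rows `("a", 1, 2, 3, 4)` and `("b", 1, 2, 3, 4)`, matching the target layout in the question.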

Dima
  • I got through the combination and pivot ok but can't unpack it properly. Couple of questions on the mapTo. 1) on the vars: line, should it read v34, v34 instead of V34,V34? This refers back to the name of the case class right? 2) I'm getting an error in the val line that says it can't find value Val. What exactly are cabal and uveal referring to? Thanks very much for your help. – J Calbreath Nov 24 '14 at 03:34
  • Sorry, in question 2 autocorrect seems to have gotten me. The error says 'cannot find value xval'. What are xval and yval referring to? – J Calbreath Nov 24 '14 at 04:42
  • 1) Yes, sorry, it was supposed to be v34, of course. 2) I think it is just a side effect of #1: `val (key, xval, yval) = vars` just declares three new variables and unpacks `vars` into them. I think the compiler is confused because it treats `vars` as undefined, due to the typo in the type name. – Dima Nov 24 '14 at 10:04
  • Thanks! I figured out my error. I made this example generic, but when I converted back to my specific application, I was writing the equivalent of Xval instead of xval. Not sure why the capital letter at the beginning makes a difference (I'm pretty new to Scala and Scalding) but after changing it everything works great. – J Calbreath Nov 24 '14 at 16:09
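
On the capital-letter question in the last comment: in a Scala pattern, including the left-hand side of a `val (a, b, c) = ...` definition, a name starting with a lowercase letter introduces a new variable, while a name starting with an uppercase letter is read as a reference to a value that must already exist. A minimal sketch of the difference follows; the tuple contents and the names `Expected` and `matched` are made up for illustration.

```scala
object PatternCaseSketch {
  val vars: (String, Int, Int) = ("a", 1, 2)

  // Lowercase names in a pattern are fresh variables bound to the tuple's parts.
  val (key, xval, yval) = vars

  // An uppercase name in a pattern refers to an existing value, so the next
  // line would fail to compile with a "not found" error unless a value named
  // Xval were already in scope:
  // val (key2, Xval, yval2) = vars

  // When such a value is in scope, the pattern compares against it
  // instead of binding a new variable.
  val Expected = 1
  val matched: Boolean = vars match {
    case (_, Expected, _) => true  // matches only if the second element equals Expected
    case _                => false
  }
}
```

This is why writing `Xval` instead of `xval` in the unpacking line produced a missing-value error rather than a new variable.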