
I'd like to understand how to use the `CaseWhen` expression with the new DataFrame API.

I can't see any reference to it in the documentation, and the only place I saw it was in the code: https://github.com/apache/spark/blob/v1.4.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L397

I'd like to be able to write something like this:

val col = CaseWhen(Seq(
    $"a" === lit(1), lit(10),
    $"a" === lit(2), lit(15),
    ...
    lit(20)
))

but this code won't compile, because the elements of the `Seq` are of type `Column` rather than `Expression`.

What is the proper way to use CaseWhen?

lev

1 Answer


To be honest, I don't know whether `CaseWhen` is intended to be used as a user-facing API. Instead, you should use the `when` and `otherwise` methods of the `Column` type. With those methods you can construct a `CaseWhen` column.

import org.apache.spark.sql.functions.{lit, when}

val result: Column =
  when($"a" === lit(1), 10).
  when($"a" === lit(2), 15).
  otherwise(20)

Note that the chain has to start with the `when` function from `org.apache.spark.sql.functions`; the `when` method on `Column` can only be applied to a column that was itself produced by `when`.
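For intuition, the evaluation semantics of such a `when`/`otherwise` chain is first-match-wins, like a SQL `CASE WHEN`. That can be sketched in plain Scala, with no Spark dependency; `caseWhen` here is a hypothetical helper for illustration, not part of the Spark API:

```scala
// First matching condition wins; fall back to the default otherwise.
// Hypothetical helper mimicking CASE WHEN semantics, not Spark code.
def caseWhen[A, B](branches: Seq[(A => Boolean, B)], default: B)(a: A): B =
  branches.collectFirst { case (cond, value) if cond(a) => value }
    .getOrElse(default)

// Mirrors when($"a" === 1, 10).when($"a" === 2, 15).otherwise(20)
val score = (a: Int) =>
  caseWhen(Seq[(Int => Boolean, Int)]((_ == 1, 10), (_ == 2, 15)), 20)(a)
```

Spark evaluates the real `when` chain the same way, per row, over the column's values.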
Till Rohrmann
  • You might be right, although it's not `private[sql]` nor marked as `@DeveloperApi`. And I can't do `column.expr` since `expr` is `protected` – lev Jul 27 '15 at 16:07
  • You're right. I missed that `expr` is protected. But I found another solution. I updated the answer. I hope this works now. – Till Rohrmann Jul 28 '15 at 07:17