
Let's say I want to make a Spark UDF to reverse the ordering of an array of structs. The concrete type of the struct should not matter, so I tried:

val reverseUDF = udf((s: Seq[_]) => s.reverse)

But this gives

java.lang.UnsupportedOperationException: Schema for type Any is not supported

I also tried to use a generic method and force the generic type parameter to be a subtype of Product:

def reverse[T <: Product](s: Seq[T]) = {
  s.reverse
}

val reverseUDF = udf(reverse _)

This gives:

scala.MatchError: Nothing (of class scala.reflect.internal.Types$TypeRef$$anon$6)

So is this even possible?


1 Answer


It is not. Spark has to know the udf's return type up front, and it cannot derive a schema from a wildcard or an unbound type parameter (eta-expanding the generic reverse _ leaves T inferred as Nothing, which is exactly what the MatchError complains about). You'd have to define a specific udf for each concrete type, for example:

udf(reverse[(String, Int)] _)
udf(reverse[(String, Long, String)] _)

and so on. However, none of these is useful in practice, because you'll never see a Product type in your udf: a struct column is always passed to a udf as a Row (see Spark Sql UDF with complex input parameter).
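To illustrate (a minimal sketch; firstField and the positional access are made up for illustration): a udf over an array of structs receives each element as a Row and reads fields by position or name, never as a tuple.

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Each struct arrives as a Row; fields are read by position
// (or by name with getAs), not as a Scala tuple.
val firstField = udf((xs: Seq[Row]) => xs.map(_.getString(0)))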

If you use Spark 2.3 you can express an arbitrary reverse by passing the return schema explicitly:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.DataType

// The second argument tells Spark the return type; reversing the
// array leaves it identical to the input array's type.
def reverse(schema: DataType) = udf(
  (xs: Seq[Row]) => xs.reverse,
  schema
)

but you'll have to provide the schema for each instance it is applied to.
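A usage sketch (the DataFrame and the column name xs are made-up assumptions; since the result has the same type as the input array, the schema can be read off the DataFrame itself):

import org.apache.spark.sql.functions.col

// Assumes a SparkSession with spark.implicits._ in scope.
val df = Seq(Seq(("a", 1), ("b", 2))).toDF("xs")

// Pass the input column's own type as the return schema.
df.withColumn("reversed", reverse(df.schema("xs").dataType)(col("xs")))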
