6

Imagine the following code:

def myUdf(arg: Int) = udf((vector: MyData) => {
  // complex logic that returns a Double
})

How can I define the return type for myUdf so that people looking at the code will know immediately that it returns a Double?

Marsellus Wallace

4 Answers

6

I see two ways to do it. Either define a method with an explicit return type first and then lift it to a function:

def myMethod(vector: MyData): Double = {
  // complex logic that returns a Double
}

val myUdf = udf(myMethod _)

or define a function first with an explicit type:

val myFunction: MyData => Double = (vector: MyData) => {
  // complex logic that returns a Double
}

val myUdf = udf(myFunction)

I normally use the first approach for my UDFs.
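For readers who want to keep the arg parameter from the question, the method-first approach could be sketched roughly as below. This is an illustration under assumptions, not the answerer's code: Seq[Double] stands in for MyData, and the names TypedUdfSketch and scaledSum (and the arithmetic inside it) are hypothetical.

```scala
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

object TypedUdfSketch {
  // Hypothetical stand-in for the "complex logic". The method signature
  // itself documents that the result is a Double.
  def scaledSum(arg: Int)(vector: Seq[Double]): Double =
    arg * vector.sum

  // Fix the first parameter list, then lift the remaining method to a
  // function value (Seq[Double] => Double) that udf can wrap.
  def myUdf(arg: Int): UserDefinedFunction = udf(scaledSum(arg) _)
}
```

Assuming a DataFrame df with an array<double> column named vector, this could be applied as df.select(TypedUdfSketch.myUdf(2)(col("vector"))). The Double return type is visible on scaledSum, and the UserDefinedFunction annotation on myUdf makes clear what callers receive.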

Raphael Roth
4

Spark's functions object defines several overloaded udf methods whose signatures follow the pattern static <RT, A1, ..., A10> UserDefinedFunction, where RT is the return type and A1 through A10 are the argument types.

You can specify the return type, followed by the input types, in square brackets as follows:

def myUdf(arg: Int) = udf[Double, MyData]((vector: MyData) => {
  // complex logic that returns a Double
})
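The ordering of the type parameters is easier to see with more than one argument. The following is a minimal sketch, assuming Seq[Double] in place of MyData; TypeParamOrderSketch, weighted, and weightedUdf are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

object TypeParamOrderSketch {
  // A plain Scala function of two arguments that returns a Double.
  val weighted: (Seq[Double], Int) => Double =
    (vector, weight) => vector.sum * weight

  // Type parameters read [ReturnType, FirstArgType, SecondArgType].
  val weightedUdf: UserDefinedFunction =
    udf[Double, Seq[Double], Int](weighted)
}
```

With the type parameters spelled out, a function whose result is, say, a String would be rejected at compile time instead of silently producing a UDF of a different type.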
Marsellus Wallace
3

You can pass type parameters to udf, but, somewhat counter-intuitively, the return type comes first, followed by the input types, as in [ReturnType, ArgTypes...], at least as of Spark 2.3.x. Using the original example (which seems to be a curried function based on arg):

def myUdf(arg: Int) = udf[Double, Seq[Int]]((vector: Seq[Int]) => {
  13.37 // whatever
})
conny
  • This is copy/paste of my answer that you are well aware of. – Marsellus Wallace May 18 '18 at 13:53
  • Oh I guess I just didn't properly see your self-answer since it was listed at the bottom! Honestly wasn't a copy paste, we just wrote the same thing / with different comment :) Anyway the _misleading_ compiler/IDE error "Undefined type parameters" which led to my comment are in fact not related to input type - but return type of the function - it will be the error as long as the body returns Unit. I'll update my answer to help other readers. – conny May 25 '18 at 07:03
  • And I just realized that I answered my own question and later accepted it... Not sure if I was looking for badges or just burnt out! – Marsellus Wallace May 25 '18 at 13:29
2

There is nothing special about UDFs built from lambda functions; they behave just like any other Scala lambda (see "Specifying the lambda return type in Scala"), so you could do:

def myUdf(arg: Int) = udf(((vector: MyData) => {
  // complex logic that returns a Double
}): (MyData => Double))

or instead explicitly define your function:

def myFuncWithArg(arg: Int): MyData => Double = {
  def myFunc(vector: MyData): Double = {
    // complex logic that returns a Double. Use arg here
  }
  myFunc _
}

def myUdf(arg: Int) = udf(myFuncWithArg(arg))
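Because this is ordinary Scala rather than anything Spark-specific, the options for pinning a lambda's result type can be tried without Spark at all. A small sketch, assuming Seq[Double] in place of MyData and a trivial sum as the body; LambdaReturnTypeSketch and its members are hypothetical names:

```scala
object LambdaReturnTypeSketch {
  // 1. Ascribe a function type to the whole lambda.
  val ascribed = ((vector: Seq[Double]) => vector.sum): (Seq[Double] => Double)

  // 2. Ascribe a type to the lambda body only.
  val bodyAscribed = (vector: Seq[Double]) => (vector.sum: Double)

  // 3. Declare the type on the val and let the parameter type be inferred.
  val declared: Seq[Double] => Double = vector => vector.sum
}
```

Any of these function values could then be passed to udf; which one reads best is largely a matter of taste.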
Assaf Mendelson