
I have the following UDF:

val jac_index:(Array[String],Array[String])=>Float=(Sq1:Array[String],Sq2:Array[String])=>
{
    val Sq3=Sq1.intersect(Sq2)
    val Sq4=Sq1.union(Sq2).distinct
    if (!Sq4.isEmpty) Sq3.length.toFloat/Sq4.length.toFloat else 0F
}
val jacUDF=udf(jac_index)

and when I execute the following statement

val movie_jac_df=movie_pairs_df.withColumn("jac",jacUDF(movie_pairs_df("name"),movie_pairs_df("name2")))

I get the error "Failed to execute user defined function".

The schema of movie_pairs_df is the following:

root
 |-- movie: string (nullable = true)
 |-- name: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- movie2: string (nullable = true)
 |-- name2: array (nullable = true)
 |    |-- element: string (containsNull = true)
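
For reference, here is a minimal way to build a DataFrame with this schema (the rows are hypothetical sample data, and it assumes a SparkSession named spark is in scope):

import spark.implicits._

// hypothetical rows, just enough to reproduce the schema above
val movie_pairs_df = Seq(
  ("m1", Array("alice", "bob"), "m2", Array("bob", "carol")),
  ("m1", Array("alice", "bob"), "m3", Array("dave"))
).toDF("movie", "name", "movie2", "name2")

movie_pairs_df.printSchema()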

So what's the cause?

leonfrank

2 Answers

Spark's DataFrames model array columns as scala.collection.mutable.WrappedArray, which means your UDF should take two WrappedArrays as its input.

If you change jac_index to expect two such arrays:

import scala.collection.mutable

val jac_index: (mutable.WrappedArray[String], mutable.WrappedArray[String]) => Float =
  (Sq1, Sq2) => {
    val Sq3 = Sq1.intersect(Sq2)
    val Sq4 = Sq1.union(Sq2).distinct
    if (Sq4.nonEmpty) Sq3.length.toFloat / Sq4.length.toFloat else 0F
  }

This will work as expected.
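
For example, a sketch using the hypothetical sample data from the question (the exact values depend on your data):

import org.apache.spark.sql.functions.udf

val jacUDF = udf(jac_index)
val movie_jac_df = movie_pairs_df
  .withColumn("jac", jacUDF(movie_pairs_df("name"), movie_pairs_df("name2")))

movie_jac_df.show()
// for name = [alice, bob] and name2 = [bob, carol], jac = 1/3 ≈ 0.33333334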

Tzach Zohar

Define the UDF as below:

import scala.collection.mutable
import org.apache.spark.sql.functions.udf

val jacUDF = udf((Sq1: mutable.WrappedArray[String], Sq2: mutable.WrappedArray[String]) => {
  val Sq3 = Sq1.intersect(Sq2)
  val Sq4 = Sq1.union(Sq2).distinct
  if (Sq4.nonEmpty) Sq3.length.toFloat / Sq4.length.toFloat else 0F
})
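
Since WrappedArray is a Seq, you can also declare the parameters as Seq[String]; that keeps the UDF independent of the concrete collection type Spark passes at runtime. A minimal sketch:

import org.apache.spark.sql.functions.udf

val jacUDF = udf((Sq1: Seq[String], Sq2: Seq[String]) => {
  // same Jaccard index: |intersection| / |union of distinct elements|
  val Sq3 = Sq1.intersect(Sq2)
  val Sq4 = Sq1.union(Sq2).distinct
  if (Sq4.nonEmpty) Sq3.length.toFloat / Sq4.length.toFloat else 0F
})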
Ramesh Maharjan