
I have a Hive UDF that extends GenericUDF. When I call the UDF via spark.sql I get the correct results, but the initialize method is called multiple times.

I can't understand why that's happening.

1 Answer


This seems to be a Spark bug: https://issues.apache.org/jira/browse/SPARK-17728

You can try calling cache() on the data before applying the UDF, but this workaround sometimes costs performance.
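A rough sketch of what the cache() workaround might look like in Scala. The function name my_udf, the class com.example.MyGenericUDF, the table source_table and the column col are all hypothetical placeholders, not taken from the question, and whether caching actually reduces the repeated calls in your case is something you would have to verify yourself:

```scala
import org.apache.spark.sql.SparkSession

object CacheBeforeUdf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cache-before-udf")
      .enableHiveSupport() // Hive support is needed to register Hive UDFs
      .getOrCreate()

    // Hypothetical UDF class; replace with your own GenericUDF implementation.
    spark.sql("CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyGenericUDF'")

    // Cache the input and force it to materialize before the UDF is applied.
    val input = spark.table("source_table").cache() // hypothetical table
    input.count()
    input.createOrReplaceTempView("cached_input")

    // Apply the UDF on top of the cached data.
    spark.sql("SELECT my_udf(col) AS result FROM cached_input").show()

    // Release the cached data once it is no longer needed.
    input.unpersist()
  }
}
```

The count() call is only there to trigger the caching eagerly. The extra pass over the data and the memory held by the cache are where the performance cost comes from.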

cozyss