I have a Hive UDF that extends GenericUDF. When I call the UDF via spark.sql I get the correct results, but the initialize method is called multiple times.
I can't understand why that's happening.
This seems to be a Spark bug; see https://issues.apache.org/jira/browse/SPARK-17728.
You can try calling cache() on the data before applying the UDF, but this workaround sometimes costs performance.
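A rough PySpark sketch of the cache() workaround, in case it helps. The helper name apply_udf_on_cached, the function name my_udf, the class com.example.MyGenericUDF and the table and column names are all made up for illustration; only cache(), count() and selectExpr() are real DataFrame methods. Whether this actually reduces the number of initialize calls in your job is something you would need to verify.

```python
def apply_udf_on_cached(df, udf_expr):
    """Cache the input DataFrame, then apply a SQL UDF expression to it.

    Hypothetical helper: `df` is expected to be a Spark DataFrame and
    `udf_expr` a SQL expression such as "my_udf(col) AS out".
    """
    # Mark the input for caching so later plans read from the cached data.
    cached = df.cache()
    # cache() is lazy; running an action forces the data to be materialized.
    cached.count()
    # Apply the UDF on top of the cached data, keeping the original columns.
    return cached.selectExpr("*", udf_expr)


# Example usage (assumes a SparkSession named `spark` with Hive support and
# a UDF jar on the classpath; all names below are placeholders):
#
#   spark.sql("CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyGenericUDF'")
#   df = spark.table("my_table")
#   result = apply_udf_on_cached(df, "my_udf(some_col) AS out")
#   result.show()
```

The count() call is what costs the extra time and memory mentioned above, since it forces a full pass over the input before the UDF runs.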