I am able to create a UDF and register it with Spark using the `spark.udf` method. However, this registration is per session only. How can Python UDFs be registered automatically when the cluster starts, so that the functions are available to all users? An example use case is converting time from UTC to a local time zone.
2 Answers
4
This is not possible; these are not like permanent UDFs in Hive.

Code the UDF as part of the package/program you submit, or in a JAR included in the Spark app if using spark-submit.

However, calling

spark.udf.register(...)

is still required in every session. This applies to Databricks notebooks as well: UDFs need to be re-registered per Spark context/session.
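As a sketch of that per-session pattern (the function and names below are illustrative, not from the question): a plain Python function does the UTC-to-local conversion, and a small helper registers it with whatever SparkSession is current.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def utc_to_local(ts, tz_name="Europe/Berlin"):
    """Treat a naive timestamp as UTC and convert it to tz_name.

    Returns a naive timestamp in the target zone. The default zone
    is only an example; pick whatever local zone you need.
    """
    if ts is None:
        return None
    return (
        ts.replace(tzinfo=timezone.utc)
        .astimezone(ZoneInfo(tz_name))
        .replace(tzinfo=None)
    )


def register_session_udfs(spark):
    """Register common UDFs on the given SparkSession.

    Session-scoped: this must be called again for every new
    SparkSession; the registration does not persist.
    """
    from pyspark.sql.types import TimestampType

    spark.udf.register("utc_to_local", utc_to_local, TimestampType())
```

After calling `register_session_udfs(spark)`, SQL such as `SELECT utc_to_local(event_ts) FROM events` works, but only within that session.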

thebluephantom
Thanks, this helps. I will create a notebook with the common functions and call it from the master notebook to register the functions. – Sam Feb 18 '19 at 16:53
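That shared-notebook approach can be sketched roughly as follows (the notebook path and function names are hypothetical): one shared notebook defines and registers the common UDFs, and each consuming notebook pulls it in with Databricks' `%run` magic before using them.

```python
# Contents of a shared notebook, e.g. /Shared/common_udfs (hypothetical path).

def squared(x):
    """Toy example of a common UDF implementation."""
    return None if x is None else x * x


def register_common_udfs(spark):
    # Must be re-run in every session; registrations
    # do not outlive the SparkSession they were made on.
    from pyspark.sql.types import LongType

    spark.udf.register("squared", squared, LongType())


# In the master notebook (Databricks cell magic, shown as comments):
# %run /Shared/common_udfs
# register_common_udfs(spark)
```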
3
Actually, you can create a permanent function, but not from a notebook; you need to create it from a JAR file:
https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html
CREATE [TEMPORARY] FUNCTION [db_name.]function_name AS class_name [USING resource, ...]
resource : (JAR|FILE|ARCHIVE) file_uri
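A concrete use of that syntax might look like the following; the function name, class name, and JAR path are all placeholders, and the JAR must contain a Hive UDF implementation:

```sql
-- Hypothetical example: the class and JAR URI are placeholders.
CREATE FUNCTION default.to_local_time
  AS 'com.example.udf.ToLocalTime'
  USING JAR 'dbfs:/FileStore/jars/my-udfs.jar';
```

Because the definition is stored in the metastore, the function survives cluster restarts and is visible to all users, unlike a session-scoped `spark.udf.register` call.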

Gerhard Brueckl
Is it possible to create a permanent function using a Python file rather than a JAR file? – Shanil Apr 06 '23 at 08:33
As stated in the other answer, you can use `spark.udf.register(...)`, but you have to do it again for each session. – Gerhard Brueckl Apr 09 '23 at 09:07