
How can I create a function like the one described at https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html#create-function, but with the function defined in Python?

I have already done something like this:

from pyspark.sql.types import IntegerType
def relative_month(input_date):
  if input_date is not None:
    return ((input_date.month + 2) % 6)+1
  else:
    return None
_ = spark.udf.register("relative_month", relative_month, IntegerType())

But this UDF only works in the notebook that runs this piece of code.

I want to do the same thing using SQL syntax to register the function, because some of my users will be using Databricks through SQL clients and they will need the functions too.

The Databricks docs say that I can define a resource:

: (JAR|FILE|ARCHIVE) file_uri

Do I need to create a .py file and put it somewhere on my Databricks cluster?

Rafael Leinio

1 Answer

To share the Spark session (and the UDFs registered in it) across notebooks, set spark.databricks.session.share to true in the cluster's configuration. Normally, UDFs in Spark are temporary and specific to the application that registers them, so any other application that wants to use one has to register it again. As noted, though, with spark.databricks.session.share set to true you can share the UDF across multiple notebooks.
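For the case where each application has to register the UDF again, one option is to keep the function in a shared module and call a small registration helper in every session. The following is only a sketch under assumptions: the helper name `register_udfs` and the idea of a shared module are hypothetical and not part of the question or the Databricks docs; `relative_month` and the `spark.udf.register` call are taken from the question. The `pyspark` import is deferred so the plain function can be exercised without a cluster.

```python
from datetime import date
from typing import Optional


def relative_month(input_date: Optional[date]) -> Optional[int]:
    """Same logic as the UDF in the question; None is passed through."""
    if input_date is None:
        return None
    return ((input_date.month + 2) % 6) + 1


def register_udfs(spark) -> None:
    """Hypothetical helper: register the UDF on the given SparkSession.

    Call this once per notebook or application, since a temporary UDF
    is only visible in the session that registered it.
    """
    # Imported here so this module can be loaded without pyspark installed.
    from pyspark.sql.types import IntegerType

    spark.udf.register("relative_month", relative_month, IntegerType())


if __name__ == "__main__":
    # Local check of the plain Python function, no Spark required.
    print(relative_month(date(2020, 1, 15)))
```

In a notebook, calling `register_udfs(spark)` would then make `relative_month` available to SQL run in that same session, for example `SELECT relative_month(current_date())`. It does not by itself make the function visible to external SQL clients.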

If it is for Hive, you can register the UDF permanently, and it will then be accessible to multiple users.

Here is a similar thread on the same topic; see if it helps:

Databricks - Creating permanent User Defined Functions (UDFs)

Mohit Verma