Questions tagged [user-defined-functions]

A function provided by the user of a program or environment, most often in spreadsheet or database applications. Use [custom-functions-excel] for Excel and [custom-function] for Google Sheets. Also specify a programming language tag ([google-apps-script], [javascript], [sql], [tsql], etc.) and a tag for the application ([excel], [google-spreadsheet], [sql-server], etc.).

In the context of a programming language or an environment, a User Defined Function (UDF) is a function that is created by a user to perform a specific task (as opposed to a function that is built into the programming language or environment).
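As a minimal sketch of this distinction, the Python snippet below registers a user-written function with SQLite through the standard-library sqlite3 module and calls it next to the built-in SQL function upper(). The initials function, the people table, and the sample names are hypothetical and exist only for illustration.

```python
import sqlite3


def initials(full_name):
    """Hypothetical UDF: upper-case initials of a name, or None for NULL."""
    if full_name is None:
        return None
    return "".join(part[0].upper() for part in full_name.split())


conn = sqlite3.connect(":memory:")
# Register the Python function as a one-argument SQL function named "initials".
conn.create_function("initials", 1, initials)

conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("ada lovelace",), (None,), ("alan turing",)],
)

# upper() is built into SQLite; initials() is the user-defined function.
rows = conn.execute(
    "SELECT upper(name), initials(name) FROM people ORDER BY rowid"
).fetchall()
for row in rows:
    print(row)
```

Here upper() ships with SQLite, while initials() is available only because the user registered it on this connection.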

Spreadsheet applications like Excel and Google Sheets call these "custom functions".

Microsoft also uses the term User Defined Functions with SQL Server, so the [sql-server] tag may also be applicable. See What is the need for user-defined functions in SQL Server?

4875 questions
1 vote, 1 answer

How can I define user-defined aggregate functions in PySpark?

I want to make a user-defined aggregate function in PySpark. I found some documentation for Scala and would like to achieve something similar in Python. To be more specific, assume I already have a function like this implemented: def process_data(df:…
1 vote, 1 answer

Using udf to split a cell and return first and last index

I'm using PySpark to apply a function that takes the cell value, splits it by ' ', and returns the first and last index of the split. However, this column contains null values, and I can't manage to handle the nulls before the split. Here is my code: def…
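One way to handle the nulls described here is to check for None before splitting. The sketch below is plain Python, not the asker's code: the name first_and_last is hypothetical, and wrapping it in a PySpark UDF with a suitable return type is assumed rather than shown.

```python
def first_and_last(cell):
    """Split on single spaces and return (first, last), or None for a null cell."""
    if cell is None:
        # Guard against nulls before calling .split(), which would otherwise fail.
        return None
    parts = cell.split(" ")
    return parts[0], parts[-1]


print(first_and_last("alpha beta gamma"))
print(first_and_last(None))
```

A cell with a single token returns that token as both the first and the last element.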
1 vote, 0 answers

Facing "object required" issue while using UDF in xlwings in some Excel files, while in other files there is no issue at all

I have created a UDF in Python and am trying to use it in Excel. I get an "object required" error when using this UDF through xlwings in some Excel files, while in other Excel files there is no issue at all. I am taking all the measures as suggested in…
1 vote, 1 answer

Is it possible to get a Drop down of options in User Defined Function (UDF) using Python xlwings library?

I am trying to create a user-defined function (UDF) using the Python xlwings library to look up the price of a product. I have a list of 299 products, and remembering the names of all of them is not possible. I am trying to create a UDF like the one below to get the…
1 vote, 1 answer

Return Future[List[DiagnosisCode]] from fetchDiagnosisForUniqueCodes method

I am not able to return Future[List[DiagnosisCode]] from fetchDiagnosisForUniqueCodes: import scala.concurrent._; import ExecutionContext.Implicits.global; case class DiagnosisCode(rootCode: String, uniqueCode: String, description: Option[String] =…
1 vote, 0 answers

How to make a VBA UDF accept both scalars and arrays as arguments, like Excel built-in functions and operators?

Excel built-in functions and operators can accept both scalars (single values) and arrays (such as a range of cells) as arguments, and return a scalar or an array accordingly, spilling when appropriate. To mimic this feature in user-defined…
Yuan Liu
1 vote, 2 answers

Get last business day of the month in PySpark without UDF

I would like to get the last business day (LBD) of the month and use it to filter records in a dataframe. I came up with Python code, but it requires a UDF. Is there any way to get the last business day of the…
subro
1 vote, 1 answer

How to mock this PySpark udf?

My dataframe: My UDF is as below: @udf(returnType=StringType()) def clean_email(email): try: regex = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b' replace={"%20":"" , "//":"" ,"/":""} for i in replace: …
Xi12
1 vote, 1 answer

Aggregate function ARRAY_AGG not allowed in SQL function

I have been trying to make a UDF in BigQuery to compress multiple rows into a single row: CREATE OR REPLACE FUNCTION function_name(to_compress_column INT64, order_by_column INT64) AS ( TO_JSON_STRING( ARRAY_AGG( …
1 vote, 2 answers

Add column with the first IP address of the subnet

I have a PySpark dataframe with a column named "subnet". I want to add a column containing the first IP of that subnet. I've tried many solutions, including: def get_first_ip(prefix): n = ipaddress.IPv4Network(prefix) first, last = n[0], n[-1] …
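The per-value logic in this excerpt can be sketched in plain Python with the standard-library ipaddress module. This is a hedged illustration, not the asker's full code: the None check and strict=False are assumptions added here, "first IP" is taken to mean element 0 of the network as in the excerpt (the network address), and registering the function as a PySpark UDF is not shown.

```python
import ipaddress


def get_first_ip(prefix):
    """Return the first address of the subnet as a string, or None for null input."""
    if prefix is None:
        return None
    # strict=False (an assumption) accepts prefixes with host bits set, e.g. "10.0.0.5/8".
    network = ipaddress.ip_network(prefix, strict=False)
    # Index 0 is the network address, matching n[0] in the excerpt.
    return str(network[0])


print(get_first_ip("192.168.1.0/24"))
print(get_first_ip(None))
```

Returning a string rather than an IPv4Address object keeps the value easy to store in a string column.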
1 vote, 1 answer

Using boxcox inside a user-defined function / object is not a matrix error

I'm trying to create a function to (visually) compare the distribution of a variable with that of the same variable after a Box-Cox transformation. The variable is a single column pulled out of my entire data frame. library(EnvStats) bc_compare_1…
guy
1 vote, 1 answer

No module named 'spacy' in PySpark

I am attempting to perform some entity extraction, using a custom NER spaCy model. The extraction will be done over a Spark Dataframe, and everything is being orchestrated in a Dataproc cluster (using a Jupyter Notebook, available in the…
1 vote, 1 answer

How to create a Java UDF that performs an aggregate sum in Snowflake?

I'm dealing with a dataset that contains numbers in the 10**6 to 10**80 range. The value column that holds this data is of string type. One of the common queries performed is a sum across the value column, for 100,000+ rows. However, there…
1 vote, 0 answers

How to convert any Java or Scala object to a JSON object, and is it possible to bypass any serialization/deserialization with this?

I am working with some big data processing. For every row of a large dataframe table we have data stored as objects, and we have a function that expects a JSON object and runs some evaluations on that object. Currently we are serializing our object…
1 vote, 0 answers

Pyspark Euclidean and Cosine distance between 2 arrays

I have a PySpark data frame with data shaped like the following (data made up): Dataframe. I would like to calculate various distance metrics (such as cosine and Euclidean) between the two vectors, vec1 and vec2, for each id in the dataframe, where…
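The two metrics named in this excerpt can be sketched for a single pair of vectors in plain Python. This is an illustrative sketch only: cosine distance is assumed to mean 1 minus cosine similarity, the function names are hypothetical, returning None for a zero-length vector is a choice made here, and applying the functions per row of a PySpark dataframe is not shown.

```python
import math


def euclidean_distance(vec1, vec2):
    """Square root of the summed squared differences between paired elements."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec1, vec2)))


def cosine_distance(vec1, vec2):
    """1 - cosine similarity; None when either vector has zero norm (an assumption)."""
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    if norm1 == 0 or norm2 == 0:
        return None
    return 1.0 - dot / (norm1 * norm2)


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Both functions assume vec1 and vec2 have the same length; zip silently truncates to the shorter one otherwise.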