Questions tagged [user-defined-functions]

A function provided by the user of a program or environment, most often in spreadsheet or database applications. Use [custom-functions-excel] for Excel and [custom-function] for Google Sheets. Also specify a programming-language tag ([google-apps-script], [javascript], [sql], [tsql], etc.) as well as a tag for the application ([excel], [google-spreadsheet], [sql-server], etc.).

In the context of a programming language or an environment, a User Defined Function (UDF) is a function created by a user to perform a specific task, as opposed to a function that is built into the programming language or environment.

Spreadsheet applications like Excel and Google Sheets call these "custom functions".

Microsoft also uses the term User Defined Functions with SQL Server, so a SQL Server tag may also be applicable. See: What is the need for user-defined functions in SQL Server?


4875 questions
1
vote
2 answers

Spark merge two columns that are arrays of different structs with overlapping fields

I have a question I was unable to solve when working with Scala Spark (or PySpark): how can we merge two fields that are arrays of structs with different fields? For example, if I have a schema like so: df.printSchema() root |-- arrayOne: array…
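One common answer to this kind of question is to promote both arrays to a union schema before concatenating them. A minimal plain-Python sketch of that idea (structs modeled as dicts; the field names `a`, `b`, `c` are hypothetical, not from the question):

```python
# Union of all struct fields across both arrays (illustrative names)
ALL_FIELDS = ["a", "b", "c"]

def normalize(record: dict) -> dict:
    # Promote each struct to the union schema; missing fields become None
    return {f: record.get(f) for f in ALL_FIELDS}

def merge_arrays(array_one, array_two):
    # Normalize every element, then concatenate the two arrays
    return [normalize(r) for r in array_one + array_two]

merged = merge_arrays([{"a": 1, "b": 2}], [{"b": 3, "c": 4}])
```

In Spark SQL the same shape can be expressed without a UDF: `transform` each array to a common `named_struct` (padding absent fields with NULL casts), then `concat` the two arrays.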
1
vote
1 answer

Updating a column with a function in spark scala

I have this column in my database, called id, which contains ints, e.g.: {id: 123456} {id: 234567} {id: 345678} {id: 456789} {id: 567890} and I need to update these values with their encrypted values by calling a function encryptId(id).…
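The question's `encryptId` is not shown, so as a stand-in, here is a sketch with a hypothetical hash-based implementation applied row by row in plain Python:

```python
import hashlib

def encrypt_id(id_value: int) -> str:
    # Stand-in for the question's encryptId(); SHA-256 hex digest as an example
    return hashlib.sha256(str(id_value).encode()).hexdigest()

ids = [123456, 234567]
updated = [{"id": encrypt_id(i)} for i in ids]
```

In Spark Scala or PySpark, the same function would be wrapped as a UDF (e.g. `F.udf(encrypt_id)`) and applied with `df.withColumn("id", encrypt_udf("id"))`.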
1
vote
1 answer

SQLite / System.Data.SQLite / User defined function registered with other name than "REGEXP" results in an exception

Following the examples in StackOverflow Create User Defined Functions and sqlite net sqlitefunction not working, I defined a UDF for testing purposes: [SQLiteFunction(Name = "REGEXP", Arguments = 2, FuncType = FunctionType.Scalar)] public class…
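The question is about C# and System.Data.SQLite, but the registration pattern carries over to Python's `sqlite3` module, which makes for a compact, runnable analogue. Note that SQLite rewrites `x REGEXP y` to a call `regexp(y, x)`, so the pattern arrives as the first argument; a UDF registered under any other name is still usable as a plain function call:

```python
import re
import sqlite3

con = sqlite3.connect(":memory:")

# "x REGEXP y" is rewritten by SQLite to regexp(y, x): pattern first, value second
con.create_function("REGEXP", 2,
                    lambda pattern, value: int(re.search(pattern, value) is not None))
# A scalar UDF under a different name is callable as an ordinary function
con.create_function("matches", 2,
                    lambda pattern, value: int(re.search(pattern, value) is not None))

con.execute("CREATE TABLE t (name TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [("alpha",), ("beta",)])

via_operator = con.execute("SELECT name FROM t WHERE name REGEXP '^a'").fetchall()
via_call = con.execute("SELECT name FROM t WHERE matches('^a', name)").fetchall()
```

The exception in the C# case typically stems from this same distinction: only the name `REGEXP` is wired to the infix operator; any other name must be invoked with function-call syntax.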
1
vote
1 answer

How to tell a custom R function which data to use?

I have recently started writing custom R functions but got stuck on some problems. I have the following dat data frame. Test data G<- c(1, 1, 1, 1, 2, 2, 2, 2) # Gender 1= male, 2= female A<- c(24.5, 25.5, 26.5, 27.5, 24.5, 25.5, 26.5, 27.5) # Age…
1
vote
1 answer

Filtering values that are not in a list using expr and filter

I want to filter out rows in a dataframe where a column is not part of a list. I am aware that I can use a udf to go about this and it works. def filterNegatives(val: Seq[String]): Seq[String] = { val.filter(v => !badList.contains(v)) } val…
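The per-element logic of the question's UDF is just a membership test, shown here in plain Python (the blocklist contents are hypothetical):

```python
bad_list = ["x", "y"]  # hypothetical blocklist

def filter_negatives(values):
    # Keep only the elements that are not in the blocklist
    return [v for v in values if v not in bad_list]

kept = filter_negatives(["a", "x", "b", "y"])
```

In Spark the UDF can usually be avoided entirely with the built-in higher-order `filter` function, along the lines of `F.expr("filter(col, v -> NOT array_contains(array('x','y'), v))")`, which keeps the computation inside Catalyst instead of round-tripping through a UDF.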
1
vote
1 answer

Debugging PySpark udf (lambda function using datetime)

I came across the lambda line below in PySpark while browsing a long Python Jupyter notebook, and I am trying to understand it. Can you explain what it does in the best possible way? parse = udf (lambda x:…
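A practical way to debug such a udf is to pull the lambda out and run it on sample strings without Spark. The notebook's exact lambda body and format string are not shown, so this sketch assumes a simple `strptime` pattern:

```python
from datetime import datetime

# Assumed shape of the notebook's lambda; "%Y-%m-%d" is a guess at the format.
# Wrapped in udf(...), Spark would call this once per row on the column value.
parse = lambda x: datetime.strptime(x, "%Y-%m-%d") if x else None

parsed = parse("2023-05-17")
empty = parse("")
```

Once the plain function behaves as expected on edge cases (empty strings, bad formats), wrapping it back with `udf(...)` only changes how it is invoked, not what it computes.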
1
vote
2 answers

Unique element count in array column

I have this dataset with a column of array type. From this column, we need to create another column which will have the list of unique elements and their counts. For example, [a,b,e,b] should result in [[b,a,e],[2,1,1]]. Data should be sorted by count. Even…
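The per-row transformation can be sketched in plain Python with `collections.Counter`, sorting by descending count (ties broken by element for a stable order, an assumption since the question does not specify tie-breaking):

```python
from collections import Counter

def unique_counts(arr):
    counts = Counter(arr)
    # Sort by descending count; break ties by element for a deterministic order
    items = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [[k for k, _ in items], [v for _, v in items]]

result = unique_counts(["a", "b", "e", "b"])
```

In PySpark this function could be wrapped as a UDF returning an array-of-arrays type, though recent Spark versions can also express it with `aggregate`/`transform` built-ins.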
1
vote
2 answers

Cassandra UDF: getting error on checking null values

I wrote a UDF like the below: CREATE FUNCTION myspace.getValue(lng bigint, dbl double, etc double) RETURNS NULL ON NULL INPUT RETURNS double LANGUAGE java AS 'if (lng != null) {return (double)lng;} else if (dbl != null) { return dbl;} else return…
1
vote
1 answer

UDF: 'TypeError: 'int' object is not iterable'

I'm converting Scala code to Python. The Scala code uses a UDF. def getVectors(searchTermsToProcessWithTokens: Dataset[Person]): Dataset[Person] = { import searchTermsToProcessWithTokens.sparkSession.implicits._ def addVectors( …
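This `TypeError` usually means a scalar reached code that expects a sequence, a common pitfall when porting a Scala UDF that operated on a column of arrays. A minimal reproduction in plain Python (the `add_vectors` helper is hypothetical, modeled on the question's `addVectors`):

```python
# Element-wise addition, expecting two sequences of equal length
def add_vectors(a, b):
    return [x + y for x, y in zip(a, b)]

ok = add_vectors([1, 2], [3, 4])

try:
    add_vectors(1, [3, 4])      # scalar where a sequence was expected
    error = None
except TypeError as exc:
    error = str(exc)            # "'int' object is not iterable"
```

In a PySpark UDF, the fix is typically to check that the column passed in really is an array type, not an int, and that the UDF's declared return type matches what the function produces.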
1
vote
1 answer

As a data provider, how can I measure the size of the output tables of a shared, secure UDF on Snowflake that is called by a consumer?

Many thanks in advance. I'm looking to use Snowflake to share sensitive data as a provider. I can securely share the data via a secure UDF on a share, but I'd feel more comfortable if I could measure how many rows a consumer queries. I.e. I want to…
1
vote
1 answer

Pyspark dataframe: Create a new numeric column from a string column and calculate average

I have a pyspark dataframe like the input data below. The subject score column type is string. I want to first convert the string column to integer type. The desired result is shown in Output 1. I wish to calculate the average in a new numeric…
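The core of the transformation is a string-to-int cast followed by a mean, sketched here in plain Python over hypothetical rows (the question's actual data is not shown):

```python
# Hypothetical rows mirroring the question: scores stored as strings
rows = [{"subject": "math", "score": "80"},
        {"subject": "math", "score": "90"},
        {"subject": "math", "score": "100"}]

scores = [int(r["score"]) for r in rows]        # string -> int conversion
average = sum(scores) / len(scores)
```

In PySpark, built-ins are preferable to a UDF here: cast with `df.withColumn("score_int", F.col("score").cast("int"))`, then aggregate with `F.avg("score_int")` (optionally grouped by subject).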
1
vote
1 answer

How can we handle exceptions with BigQuery functions?

Given any function, e.g.: CREATE FUNCTION ds.fn(param ANY TYPE) RETURNS STRING AS ( (SELECT 1/0) ); Is there a way to handle errors when the statement fails and return a default value? Note: my question is about any error that a select…
1
vote
2 answers

How to prefix all columns in a join table with the names of their origin table without explicitly renaming them one by one?

I am joining a few tables which have many columns and also have duplicate column names. To remember which column came from which table, I would like to prefix/suffix all columns with the table acronym/name in the result of the join. For a simple…
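The renaming itself is a simple mapping from each column name to a prefixed one, sketched here in plain Python (the table acronym `t1` and column names are illustrative):

```python
def prefix_columns(columns, prefix):
    # Map each original column name to "<prefix>_<name>"
    return {c: f"{prefix}_{c}" for c in columns}

renamed = prefix_columns(["id", "name"], "t1")
```

The PySpark equivalent, applied to each table before the join so no column is renamed one by one, is along the lines of `df.select([F.col(c).alias(f"t1_{c}") for c in df.columns])`.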
1
vote
1 answer

Pyspark alternative to UDF function which loops an array

I've searched and can't find a suitable answer for my Pyspark issue. I'm looking for a more efficient alternative approach that doesn't use a UDF. I have a simple equation in a UDF which has inputs from (a) a literal constant, (b) column…
1
vote
1 answer

Multiple aggregation over multiple columns

I want to write a UDF over a data frame that works by comparing the values of a particular row against the values from the same group, where the grouping is by multiple keys. As UDFs operate on a single row, I want to write a query that returns values from…