Questions tagged [udf]

A user-defined function (UDF) is a function provided by the user of a program or environment, in a context where the usual assumption is that functions are built into the program or environment. Although the term is widely known in Hadoop components such as Hive and Pig, it is also used in other contexts such as programming languages and some DBMSs.

From the docs:

Introduction

Pig provides extensive support for user defined functions (UDFs) as a way to specify custom processing. Pig UDFs can currently be implemented in three languages: Java, Python, and JavaScript.

The most extensive support is provided for Java functions. You can customize all parts of the processing including data load/store, column transformation, and aggregation. Java functions are also more efficient because they are implemented in the same language as Pig and because additional interfaces are supported such as the Algebraic Interface and the Accumulator Interface.

Limited support is provided for Python and JavaScript functions. These functions are new, still evolving, additions to the system. Currently only the basic interface is supported; load/store functions are not supported. Furthermore, JavaScript is provided as an experimental feature because it did not go through the same amount of testing as Java or Python. At runtime note that Pig will automatically detect the usage of a scripting UDF in the Pig script and will automatically ship the corresponding scripting jar, either Jython or Rhino, to the backend.
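As a minimal sketch of the basic interface described above, a Python scripting UDF can be an ordinary function annotated with its output schema. The function name `to_upper` and the schema string are hypothetical, chosen only for illustration. The fallback decorator is an assumption added so the file can also be imported outside Pig, where no `outputSchema` decorator is supplied.

```python
# Sketch of a Pig scripting UDF written in Python.
try:
    # Assumption: pig_util is importable when the script runs under Pig.
    from pig_util import outputSchema
except ImportError:
    # Stand-in decorator (illustration only) so the module loads outside Pig.
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema  # record the declared schema on the function
            return func
        return wrap


@outputSchema("word:chararray")
def to_upper(word):
    """Return the upper-cased input; pass nulls (None) through unchanged."""
    if word is None:
        return None
    return word.upper()
```

In a Pig script, such a file would typically be registered with a `REGISTER ... USING jython AS ...` statement and then called through the chosen namespace; the exact file name and namespace depend on your setup.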

537 questions
7
votes
1 answer

Aggregate UDFs with Python in Redshift

I managed to write a few scalar functions with Python in Amazon Redshift, i.e. taking one or a few columns as input and returning a single value based on some logic or transformation. But is there any way to pass all the values of a numeric…
and_apo
  • 1,217
  • 3
  • 17
  • 41
7
votes
1 answer

Use a UDF as the default value in a table column in SQL Server

I created a scalar UDF (called sCurrentAppUser()) in SQL Server 2012 Express and I would like to use this UDF as a default value when defining a table. But every time I try, I get an error of "'sCurrentAppUser' is not a recognized built-in function…
Jason
  • 349
  • 3
  • 9
7
votes
0 answers

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:

I am trying to register a simple UDF for extracting date functionality in spark using Scala Luna Eclipse IDE. This is my code: sqlContext.udf.register("extract", (dateUnit: String, date : String) => udf.extract(dateUnit,date ) ) def…
Prem Singh Bist
  • 1,273
  • 5
  • 22
  • 37
7
votes
2 answers

Need to stop UDFs recalculating when unrelated cells deleted

I've noticed that my UDFs recalculate whenever I delete cells. This causes massive delays when deleting entire columns, because the UDF gets called for each and every cell it is used in. So if you're using 1000 UDFS, then deleting a column or cell…
jeffreyweir
  • 4,668
  • 1
  • 16
  • 27
6
votes
4 answers

Define return value in Spark Scala UDF

Imagine the following code: def myUdf(arg: Int) = udf((vector: MyData) => { // complex logic that returns a Double }) How can I define the return type for myUdf so that people looking at the code will know immediately that it returns a Double?
Marsellus Wallace
  • 17,991
  • 25
  • 90
  • 154
6
votes
2 answers

Track or log calls to a user-defined function in SQL Server

Understanding that side-effecting operators (like "insert") are disallowed in user-defined functions, how does one log (or otherwise track) calls to a specific user-defined function? I'd also like to capture the parameters passed into the…
John Joseph
  • 1,003
  • 1
  • 10
  • 20
6
votes
1 answer

Passing a list of tuples as a parameter to a spark udf in scala

I am trying to pass a list of tuples to a udf in scala. I am not sure how to exactly define the datatype for this. I tried to pass it as a whole row but it can't really resolve it. I need to sort the list based on the first element of the tuple and…
Roshini
  • 703
  • 2
  • 8
  • 21
5
votes
2 answers

How to create UDF from Scala methods (to compute md5)?

I would like to build one UDF from two already working functions. I'm trying to calculate a md5 hash as a new column to an existing Spark Dataframe. def md5(s: String): String = {…
br0ken.pipe
  • 850
  • 3
  • 17
  • 32
5
votes
1 answer

Spark Struct structfield names getting changed in UDF

I am trying to pass a struct in spark to udf. It is changing the field names and renaming to the column position. How do I fix it? object TestCSV { def main(args: Array[String]) { val conf = new…
hp2326
  • 181
  • 1
  • 3
  • 12
5
votes
3 answers

Registering Hive Custom UDF with Spark (Spark SQL) 2.0.0

I am working on a spark 2.0.0 piece where my requirement is to use 'com.facebook.hive.udf.UDFNumberRows' function in my sql context to use in one of the queries. In my cluster with Hive query, I use this as a temporary function just by defining :…
Apratim Tiwari
  • 353
  • 1
  • 5
  • 9
5
votes
2 answers

How to return complex types using spark UDFs

Hello and thank you in advance. My program is written in Java and I cannot move to Scala. I am currently working with a Spark DataFrame extracted from a JSON file using the following line: DataFrame dff =…
Albert CR
  • 119
  • 1
  • 7
5
votes
2 answers

Calculate number of days excluding sunday in Hive

I have two timestamps as input. I want to calculate the time difference in hours between those timestamps excluding Sundays. I can get the number of days using datediff function in hive. I can get the day of a particular date using…
Vanaja Jayaraman
  • 753
  • 3
  • 18
5
votes
2 answers

In Spark SQL, how do you register and use a generic UDF?

In my project, I want to implement an ADD(+) function, but my parameter may be LongType, DoubleType, or IntType. I use sqlContext.udf.register("add",XXX), but I don't know how to write XXX so that it works as a generic function.
yjxyjx
  • 121
  • 1
  • 3
  • 8
5
votes
4 answers

How to extract rows from a json array using the mysql udf json_extract 0.4.0?

I have some sql that I want to pass into a mysql stored procedure. I'm using the json functions in mysql-json-udfs-0.4.0-labs-json-udfs-linux-glibc2.5-x86_64. We are running a mysql 5.5.4 server. Updating to 5.7.x is an option. When I run set…
Keith John Hutchison
  • 4,955
  • 11
  • 46
  • 64
5
votes
1 answer

Aerospike NodeJS UDF Aggregation Error

I've created an aggregate function in Aerospike which works in AQL: AGGREGATE filter2.check_teamId('123', 0, 1456499994597) ON analytics.tracking WHERE teamId = '123' This returns results. I'm then trying to use the same UDF in…
TStu
  • 244
  • 3
  • 15