Questions tagged [udf]

A user-defined function (UDF) is a function provided by the user of a program or environment, in a context where the usual assumption is that functions are built into the program or environment. Although the term is widely known in Hadoop components such as Hive and Pig, it is also used in other contexts, such as programming languages and some DBMSs.

From the docs:

Introduction

Pig provides extensive support for user defined functions (UDFs) as a way to specify custom processing. Pig UDFs can currently be implemented in three languages: Java, Python, and JavaScript.

The most extensive support is provided for Java functions. You can customize all parts of the processing including data load/store, column transformation, and aggregation. Java functions are also more efficient because they are implemented in the same language as Pig and because additional interfaces are supported such as the Algebraic Interface and the Accumulator Interface.

Limited support is provided for Python and JavaScript functions. These functions are new, still evolving, additions to the system. Currently only the basic interface is supported; load/store functions are not supported. Furthermore, JavaScript is provided as an experimental feature because it did not go through the same amount of testing as Java or Python. At runtime note that Pig will automatically detect the usage of a scripting UDF in the Pig script and will automatically ship the corresponding scripting jar, either Jython or Rhino, to the backend.
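
As a rough illustration of the basic scripting interface described above, a Python (Jython) UDF might look like the sketch below. The function name `to_upper` and the schema string are illustrative and not taken from the quoted docs. The fallback definition of `outputSchema` is an assumption added only so the file can also be run outside Pig, where that decorator is not predefined.

```python
# Assumption: when Pig runs this file through Jython, it supplies the
# `outputSchema` decorator itself. The fallback below is a hypothetical
# stand-in so the same file can be exercised in a plain Python interpreter.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            # Record the declared schema on the function for inspection.
            func.output_schema = schema
            return func
        return wrap


@outputSchema("word:chararray")
def to_upper(word):
    # Pig passes null fields as None, so guard against them explicitly.
    if word is None:
        return None
    return word.upper()
```

In a Pig script, such a file would typically be registered with something like `REGISTER 'udfs.py' USING jython AS myfuncs;` and then called as `myfuncs.to_upper(name)`. The file name, alias, and field name here are hypothetical.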

537 questions
5 votes, 2 answers

Redshift Python UDFs Varchar limits

I have successfully created a Python UDF that accepts a varchar value from a table and extracts a substring of that value based on a regex. The max size of that varchar column in the DDL is set to be 20000 bytes, and in some occasions the UDF…
and_apo
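
The kind of regex-based extraction this question describes can be sketched in plain Python, the language in which Redshift Python UDF bodies are written. The function name, pattern, and sample inputs are hypothetical, and the sketch does not address the varchar size limits the question asks about.

```python
import re


def extract_substring(value, pattern):
    # Hypothetical helper: return the first capture group of `pattern`
    # found in `value`, or the whole match if the pattern has no groups.
    if value is None:
        return None
    match = re.search(pattern, value)
    if match is None:
        return None
    return match.group(1) if match.groups() else match.group(0)
```

For example, `extract_substring("order id=12345 shipped", r"id=(\d+)")` returns `"12345"`, and a value with no match returns `None`.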
4 votes, 2 answers

Aerospike multiple filter query?

Reading the documentation did not help me much. 1) As I understand it, multiple filters cannot be used in a single query. If so, how do I write a query like the following with the Aerospike Java client API: SELECT * FROM TABLE_NAME WHERE COLUMN1 = 1 AND COLUMN2 =…
Azat Nugusbayev
4 votes, 1 answer

Register UDF with descriptions of arguments using excel addin

I have an add-in with a UDF getRegExResult. I want to add a function description and argument descriptions to this function, so that when a user installs the add-in, closes and reopens Excel a few times, and goes to the "Insert Function" dialog box, he will be able to…
kolcinx
4 votes, 1 answer

Spark UDF Null handling

I'm struggling to handle null values in a UDF that operates on a DataFrame (which originates from a Hive table) consisting of a struct of floats. The DataFrame (points) has the following schema: root |-- point: struct (nullable = true) | |-- x:…
Raphael Roth
4 votes, 1 answer

spark UDF Java Error: Method col([class java.util.ArrayList]) does not exist

I have a python dict as: fileClass = {'a1' : ['a','b','c','d'], 'b1':['a','e','d'], 'c1': ['a','c','d','f','g']} and a list of tuples as: C = [('a','b'), ('c','d'),('e')] I want to finally create a spark dataframe as: Name (a,b) (c,d) (e) a1 2…
pipal
4 votes, 2 answers

Define spark udf by reflection on a String

I am trying to define a UDF in Spark (2.0) from a string containing a Scala function definition. Here is the snippet: val universe: scala.reflect.runtime.universe.type = scala.reflect.runtime.universe import universe._ import…
sourabh
4 votes, 0 answers

ERROR optimizer.ConstantPropagateProcFactory when querying an UDF

I get the following error output in hive when querying my Generic UDF: ERROR optimizer.ConstantPropagateProcFactory: Unable to evaluate org.apache.hadoop.hive.ql.udf.generic.GenericUDFMap@554286a4. Return value unrecoginizable. I get the error in…
Smicker
4 votes, 2 answers

Why is my original array being altered?

Based on the ColdFusion documentation... "Arrays are passed to user-defined functions by value, so the function gets a new copy of the array data, and the array in the calling page is unchanged by the function." So I'm working on a little practice…
mts1701
4 votes, 1 answer

VBA UDF changes values on ALL sheets. How to limit to one?

I have made a UDF that works on a single sheet. The problem occurs with multiple sheets. If I have the formula on multiple sheets, and if I (re-)load it on one sheet, it changes the output in ALL other sheets, too. Why does this happen? I am not…
sandboxj
4 votes, 2 answers

Spark UDF exception when accessing broadcast variable

I'm having difficulty accessing a scala.collection.immutable.Map from inside a spark UDF. I'm broadcasting the map val browserLangMap = sc.broadcast (Source.fromFile(browserLangFilePath).getLines.map(_.split(,)).map(e =>…
Cheeko
4 votes, 3 answers

hive output consists of these 2 warnings at the end. How do I suppress these 2 warnings

Hive query output that is using UDFs consists of these 2 warnings at the end. How do I suppress these 2 warnings. Please note that the 2 warnings come right after the output as part of output. WARN: The method class…
user3441798
4 votes, 1 answer

How to get schema of input in exec function in Pig UDF

I wonder how I can get the schema of the input in the exec() function when I build a UDF in Pig Latin. I can get the schema from the outputSchema() function, but it looks like the result can't be leveraged by backend functions. Any hints will be highly appreciated!
Mercury
3 votes, 0 answers

Excel function as 'Link_Location' in HYPERLINK formula is called on writing

I'm using XLWINGS to add some functionality to Excel. I want to be able to use a specific audio player with specific codecs to play audio files when the hyperlink in the excel cell is clicked (not when the hyperlink formula is written to its…
user3535074
3 votes, 0 answers

Why Cassandra UDF performance is worse than java code

I’ve a use case where I need to fetch all the records from Cassandra for a given time range and divide it into 30 chunks then further aggregate each chunk, for example let us suppose I’m fetching 60 records for a time range of 30 minutes. Now I need…
Vikas Singh
3 votes, 2 answers

Aerospike: How to perform IN query on PK

How do I perform (SQL-like) IN queries in Aerospike? Do we need a UDF for this? Something like this: Select * from ns.set where PK in (1,2,3) If this requires a UDF, how do I go about it, as the UDF is executed on a key: EXECUTE…
Sandeep B