Questions tagged [udf]

A user-defined function (UDF) is a function provided by the user of a program or environment, in a context where the usual assumption is that functions are built into the program or environment. Although the term is widely known in Hadoop components such as Hive and Pig, it is also used in other contexts, such as programming languages and some DBMSs.

From the docs:

Introduction

Pig provides extensive support for user defined functions (UDFs) as a way to specify custom processing. Pig UDFs can currently be implemented in three languages: Java, Python, and JavaScript.

The most extensive support is provided for Java functions. You can customize all parts of the processing including data load/store, column transformation, and aggregation. Java functions are also more efficient because they are implemented in the same language as Pig and because additional interfaces are supported such as the Algebraic Interface and the Accumulator Interface.

Limited support is provided for Python and JavaScript functions. These functions are new, still evolving, additions to the system. Currently only the basic interface is supported; load/store functions are not supported. Furthermore, JavaScript is provided as an experimental feature because it did not go through the same amount of testing as Java or Python. At runtime note that Pig will automatically detect the usage of a scripting UDF in the Pig script and will automatically ship the corresponding scripting jar, either Jython or Rhino, to the backend.
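As a rough illustration of the scripting UDFs described above, here is a minimal sketch of a Python (Jython) UDF. The function name `to_upper`, the schema string, and the fallback decorator are assumptions made for this example; inside Pig the `outputSchema` decorator is supplied by the runtime, and the stand-in exists only so the file can also be run with plain Python.

```python
# Sketch of a Pig scripting UDF written in Python (executed by Jython inside Pig).
# Assumption: when Pig runs this file, `outputSchema` is already defined by the
# runtime. The fallback below is a stand-in for running the file outside Pig.
try:
    outputSchema  # noqa: F821 - provided by Pig when the script is registered
except NameError:
    def outputSchema(schema):
        """Stand-in decorator: record the schema string, return the function unchanged."""
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("word_upper:chararray")
def to_upper(word):
    """Hypothetical example UDF: upper-case a chararray, passing nulls through."""
    if word is None:
        return None
    return word.upper()
```

Assuming the file is saved as `my_udfs.py` (a hypothetical name), a Pig script could register it with `REGISTER 'my_udfs.py' USING jython AS my_udfs;` and call it as `my_udfs.to_upper(name)` inside a `FOREACH ... GENERATE`.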

537 questions
0
votes
3 answers

Trying to get Excel UDF custom cell formatting to return either nothing '(empty cell)' or 'formatted user input'

I'm trying to use a custom UDF to format a long list of Excel phone numbers I have, as well as any future phone number entries I add to this Excel file. Aside: the UDF itself just formats phone numbers, cleanly, so that no…
0
votes
1 answer

How can I pass a variable of type datetime from Pig to python UDF

My Python UDF code is below; born is a datetime variable from Pig. I tried treating it as a string object, but that gave an error, and treating it as a datetime object also gave an error: from datetime import date @outputSchema("age_key:chararray") def…
pratiklodha
0
votes
1 answer

What format does data input have to be for Python's json.loads?

I'm trying to use json.loads to parse data in a Redshift database table. I've stripped out the function to test in a Python script and am having trouble understanding what's happening. The code I'm using is: import json j="'['Bars', 'American…
simplycoding
0
votes
1 answer

Merge multiple columns in a Spark DataFrame [Java]

How to combine multiple columns (say 3) from a DataFrame in a single column (in a new DataFrame) where each row becomes a Spark DenseVector? Similar to this thread but in Java and with a few tweaks mentioned below. I tried using a UDF like…
Rajko
0
votes
1 answer

[Q]:Syntax error in creating UDF in mysql

Hey guys, sorry for this noob question. I am trying to learn MySQL but I am stuck at creating a UDF. Can someone please explain why this is a syntax error and what the possible fix is? Thanks. create FUNCTION temtotalgrades (@p_Studid int, @p_year…
noobme
0
votes
2 answers

PIG UDF to convert tuple to multiple tuple output

I am new to Pig and I am trying to create a UDF which gets a tuple and returns multiple tuples based on a delimiter. So I have written one UDF to read the below data file 2012/01/01 Name1 Category1|Category2|Category3 2012/01/01 Name2…
Arpan
0
votes
1 answer

UDF in pyspark SQL Context sending data as columns

I have written a udf in pyspark like below: df1 = df.where(point_inside_polygon(latitide,longitude,polygonArr)) df1 and df are spark dataframes The function is given below: def point_inside_polygon(x,y,poly): latt = float(x) long = float(y) if…
thenakulchawla
0
votes
2 answers

nagging #value! error in excel udf

I am new to VB and Excel, but I have to develop a custom UDF for Excel. I have read and tried to alter my code below many times with the suggestions on this forum, to no avail. What am I missing? This code is for working out a Julian date in…
ssn
0
votes
1 answer

How to pass two-dimensional array to User defined functions?

Using UDF implies that each factor c1, c2, c3 must be passed by parameter independently. Is there any flexible solution, e.g. how to pass a sequence of these factors to UDF? val myFunction = udf { (userBias: Float, productBias: Float,…
Klue
0
votes
1 answer

Can you create a BigQuery UDF that generates lists of tables, instead of operating on rows?

I'm looking at the user-defined function docs for BigQuery, but I need to define a function to simplify the process of querying multiple tables. I have people who have to do stuff like this: SELECT * FROM…
Sniggerfardimungus
0
votes
1 answer

Shortest Pig script that will use Accumulator

I'm adding an Accumulator implementation to a Pig UDF, and I want to test it. What is the shortest and simplest Pig script that will use the accumulator? For simplicity's sake, assume that it will load a file with N integers, where N >…
Eyal
0
votes
2 answers

hive udf execution via shell script

I have a Hive UDF that works well in the Hive terminal. What I want is to execute it via a shell script. In the Hive terminal I am able to execute the following commands: use mashery_db; add jar…
Mohit Rane
0
votes
1 answer

UDF not updating when rows inserted

I'm pretty new to UDFs and I'm not entirely sure how they function. My function returns correct information so long as no new rows are inserted. It's as if headRng gets saved to memory when first used and doesn't get updated even if a new row is…
click here
0
votes
1 answer

Java UDF on Hadoop input parameter -- call from Pig on Hadoop

If I have the following data structure (a relation) in Pig and I want to pass it to a Java UDF, wondering what should be the related Java data type of the input parameter? (student relation is a bag, schema is ID as int, a tuple contains an interest…
Lin Ma
0
votes
2 answers

Unable to pass pig tuple to python UDF

I have master.txt, which has 10K records, so each line of it will be a tuple, and the whole of it needs to be passed to a Python UDF. Since it has multiple records, on storing p2preportmap I get the following error. Please help. Error is as…
Amit