Questions tagged [udf]

A user-defined function (UDF) is a function provided by the user of a program or environment, in a context where the usual assumption is that functions are built into the program or environment. Although the term is widely known in Hadoop components such as Hive and Pig, it is also used in other contexts, such as programming languages and some DBMSs.

From the Apache Pig documentation:

Introduction

Pig provides extensive support for user defined functions (UDFs) as a way to specify custom processing. Pig UDFs can currently be implemented in three languages: Java, Python, and JavaScript.
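To make "custom processing" concrete, here is a minimal sketch of the shape of a Java eval function: one input row in, one value out, with `null` returned when there is no usable input. It is written against a hypothetical `SimpleEvalFunc` interface, with a `List<Object>` standing in for a Pig tuple, so that it compiles without the Pig jar. A real Pig UDF would extend `org.apache.pig.EvalFunc` and receive an `org.apache.pig.data.Tuple` instead.

```java
import java.util.List;
import java.util.Locale;

class Main {
    public static void main(String[] args) {
        Upper upper = new Upper();
        System.out.println(upper.exec(List.of("hello"))); // HELLO
        System.out.println(upper.exec(List.of()));        // null
    }
}

// Hypothetical stand-in for Pig's EvalFunc<T>; List<Object> plays the role of a Tuple.
interface SimpleEvalFunc<T> {
    T exec(List<Object> input);
}

// Upper-cases the first field of the input row and returns null when the
// row is missing or empty, the conventional way a UDF signals "no value".
class Upper implements SimpleEvalFunc<String> {
    @Override
    public String exec(List<Object> input) {
        if (input == null || input.isEmpty() || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().toUpperCase(Locale.ROOT);
    }
}
```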

The most extensive support is provided for Java functions. You can customize all parts of the processing, including data load/store, column transformation, and aggregation. Java functions are also more efficient because they are implemented in the same language as Pig and because additional interfaces are supported, such as the Algebraic Interface and the Accumulator Interface.
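The Algebraic Interface lets a function be split into initial, intermediate and final stages, so partial results can be computed early (for example in the combiner) and merged later. The sketch below shows that decomposition for an average in plain Java. The method names `initial`, `intermediate` and `finalValue` are hypothetical; in Pig itself, each stage is a separate `EvalFunc` class whose name is returned by `getInitial()`, `getIntermed()` and `getFinal()`.

```java
import java.util.List;

class Main {
    public static void main(String[] args) {
        // Two hypothetical chunks of the same column, processed independently.
        List<double[]> chunk1 = List.of(initial(2.0), initial(4.0));
        List<double[]> chunk2 = List.of(initial(6.0));
        // Each chunk is combined first; the combined partials are merged afterwards.
        double[] merged = intermediate(List.of(intermediate(chunk1), intermediate(chunk2)));
        System.out.println(finalValue(merged)); // 4.0
    }

    // Initial stage: turn one input value into a (sum, count) partial.
    static double[] initial(double value) {
        return new double[] {value, 1};
    }

    // Intermediate stage: merge any number of partials; safe to apply repeatedly.
    static double[] intermediate(List<double[]> partials) {
        double sum = 0;
        double count = 0;
        for (double[] p : partials) {
            sum += p[0];
            count += p[1];
        }
        return new double[] {sum, count};
    }

    // Final stage: compute the average from the fully merged partial.
    static double finalValue(double[] partial) {
        return partial[1] == 0 ? Double.NaN : partial[0] / partial[1];
    }
}
```

Because the intermediate stage only adds sums and counts, it gives the same answer no matter how the input is split into chunks, which is the property that makes a function algebraic.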

Limited support is provided for Python and JavaScript functions. These functions are new, still-evolving additions to the system. Currently only the basic interface is supported; load/store functions are not supported. Furthermore, JavaScript is provided as an experimental feature because it did not go through the same amount of testing as Java or Python. Note that at runtime Pig automatically detects the use of a scripting UDF in the Pig script and automatically ships the corresponding scripting jar, either Jython or Rhino, to the backend.

537 questions
0 votes, 1 answer

Create an Excel user-defined function (UDF) that can sum mixed numbers and text

Data example in Excel: COL A B C D F..... 1 SL..... 2 SL8 AL4 CD3 CN5 CD4 AL8. I am summing conditionally, based on the letter identifier within the cell. The UDF is entered…
mechengr02
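The summing logic this question describes can be sketched independently of Excel, where the UDF itself would be written in VBA. The Java sketch below assumes each cell holds a letter identifier followed by an integer, such as `AL4`; the method name `sumById` is hypothetical. Cells that do not fit that pattern, like a bare `SL`, are skipped rather than causing an error.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main {
    // Assumed cell format: letters followed by digits, e.g. "AL4" -> ("AL", 4).
    private static final Pattern CELL = Pattern.compile("([A-Za-z]+)(\\d+)");

    // Sums the numeric part of every cell whose letter prefix equals `id`.
    static int sumById(List<String> cells, String id) {
        int total = 0;
        for (String cell : cells) {
            if (cell == null) {
                continue;
            }
            Matcher m = CELL.matcher(cell.trim());
            if (m.matches() && m.group(1).equals(id)) {
                total += Integer.parseInt(m.group(2));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> row = List.of("SL8", "AL4", "CD3", "CN5", "CD4", "AL8");
        System.out.println(sumById(row, "AL")); // 12
        System.out.println(sumById(row, "CD")); // 7
    }
}
```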
0 votes, 1 answer

Pig Latin join by field

I have a Pig Latin-related problem: I have this data below (in one row): A = LOAD 'records' AS (f1:chararray, f2:chararray,f3:chararray, f4:chararray,f5:chararray, f6:chararray); DUMP A; (FITKA,FINVA,FINVU,FEEVA,FETKA,FINVA) Now I have another…
0 votes, 1 answer

How to create an output schema with nested bags in Pig

I am trying out Pig UDFs and have been reading about them. While the online content was helpful, I am still not sure I understand how to create a complex output schema with nested bags. Please help. The requirement is as follows. Say for…
user1652054
0 votes, 1 answer

Un-nesting nested tuples to single terms

I have written a UDF (extends EvalFunc) whose output is tuples with inner (nested) tuples. For example, the dump looks…
Stefanos13
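One way to un-nest the output is to flatten it before returning it from the UDF. The sketch below does this recursively in plain Java, assuming `List` values stand in for Pig tuples; the helper `flatten` is hypothetical, not a Pig API. Within the Pig script itself, the `FLATTEN` operator is the usual way to un-nest tuples and bags.

```java
import java.util.ArrayList;
import java.util.List;

class Main {
    // Recursively copies every non-list field of `tuple` into `out`,
    // so ((a, b), (c, (d))) becomes (a, b, c, d).
    static void flatten(List<?> tuple, List<Object> out) {
        for (Object field : tuple) {
            if (field instanceof List<?>) {
                flatten((List<?>) field, out);
            } else {
                out.add(field);
            }
        }
    }

    // Convenience overload that returns a new flat list.
    static List<Object> flatten(List<?> tuple) {
        List<Object> out = new ArrayList<>();
        flatten(tuple, out);
        return out;
    }

    public static void main(String[] args) {
        List<Object> nested = List.of(List.of("a", "b"), List.of("c", List.of("d")));
        System.out.println(flatten(nested)); // [a, b, c, d]
    }
}
```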
0 votes, 2 answers

Issues with a UDF

I have a UDF that accepts a bag as input and converts it to a map. Each key of the map is a distinct element of the bag, and each value is that element's count. But it's failing the JUnit tests.
user12331
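The counting step at the heart of such a UDF can be tested in isolation from Pig. This sketch assumes the bag's contents have already been extracted into a `List<String>` (Pig map keys are chararrays, hence the `String` keys); `countElements` is a hypothetical name. A `LinkedHashMap` keeps a predictable iteration order, although asserting with `Map.equals` in a test avoids depending on order at all.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Main {
    // Maps each distinct element to the number of times it occurs,
    // keeping first-seen order so printed results are stable.
    static Map<String, Integer> countElements(List<String> bag) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String element : bag) {
            counts.merge(element, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countElements(List.of("x", "y", "x", "z", "x"))); // {x=3, y=1, z=1}
    }
}
```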
0 votes, 3 answers

Presence of "in" in Pig's UDF causes problems

I was trying my first UDF in Pig and wrote the following function: package com.pig.in.action.assignments.udf; import org.apache.pig.EvalFunc; import org.apache.pig.PigWarning; import org.apache.pig.data.Tuple; import java.io.IOException; public…
sgsi
0 votes, 1 answer

How to pass an external property into a Hive UDF

I am writing a Hive UDF in which I have to call a REST API and return an array of strings. I have written the function with a hardcoded REST API URL. But now, to make the endpoint configurable, I want to take the host property out and put it in a…
mohit_d
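A common pattern is to read the host from configuration rather than hard-coding it. The sketch below uses `java.util.Properties`; the key `rest.api.host` and the helper `endpoint` are hypothetical, and how the properties file reaches the Hive job (for example, bundled in the UDF jar or added with `ADD FILE`) is an assumption to verify for your setup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class Main {
    // Hypothetical property key; the question does not show its real name.
    static final String HOST_KEY = "rest.api.host";

    // Builds the endpoint URL from configuration instead of a hard-coded string,
    // failing fast if the property was not provided.
    static String endpoint(Properties conf, String path) {
        String host = conf.getProperty(HOST_KEY);
        if (host == null || host.isBlank()) {
            throw new IllegalStateException("Missing property: " + HOST_KEY);
        }
        return "http://" + host + path;
    }

    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        // In practice this would be loaded from a file shipped alongside the UDF.
        conf.load(new StringReader("rest.api.host=example.com:8080\n"));
        System.out.println(endpoint(conf, "/lookup")); // http://example.com:8080/lookup
    }
}
```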
0 votes, 1 answer

How do I mark one row and store its particular value in Hive using a standard query or a UDF?

I need to write a query in Hive or define a function that does the following: The dataset: Student || Time || ComuputerPool ------------------------------------- A || 9:15AM || Pool1.Machine2 ------------------------------------- …
Dilshad Abduwali
0 votes, 1 answer

Passing a query as a parameter to a UDF

I would like to pass a scalar-valued select query as a parameter to a function like so: select * from dbo.ftLatestOrderLines(select max(id) from [orders]) The db server throws this error: Msg 156, Level 15, State 1, Line 3 Incorrect syntax near the…
TonyP
0 votes, 2 answers

Split characters inside Pig field

I have a text input with a '|' separator, such as 0.0000|25000| |BM|BM901002500109999998|SZ, which I split using PigStorage: A = LOAD '/user/hue/data.txt' using PigStorage('|'); Now I need to split the field BM901002500109999998 into…
Abhi
0 votes, 2 answers

Not able to split the numeric value from the string

I'm creating a UDF for an area-conversion program in Java. I have the following data: 230Sq.feet 110Sq.yards 8Acres 123Sq.Ft I want to split the above data like this: 230 Sq.feet 990 Sq.feet 344 Sq.feet 123 Sq.feet I tried the following…
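Splitting the leading number from the unit is a good fit for a regular expression. This Java sketch assumes every value starts with a number (optionally with a decimal part) immediately followed by the unit text; `split` is a hypothetical helper name. Converting the units is a separate step: for instance, 1 square yard is 9 square feet, which is how `110Sq.yards` becomes `990 Sq.feet` in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main {
    // Leading number (optionally with decimals) followed by the unit text.
    private static final Pattern AREA = Pattern.compile("^\\s*(\\d+(?:\\.\\d+)?)\\s*(.+?)\\s*$");

    // Returns {number, unit}, e.g. "110Sq.yards" -> {"110", "Sq.yards"}.
    static String[] split(String raw) {
        Matcher m = AREA.matcher(raw);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized area: " + raw);
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        for (String s : new String[] {"230Sq.feet", "110Sq.yards", "8Acres", "123Sq.Ft"}) {
            String[] parts = split(s);
            System.out.println(parts[0] + " " + parts[1]);
        }
    }
}
```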
0 votes, 1 answer

Native Impala UDF (C++) randomly returns NULL for the same inputs in the same table across multiple invocations in the same query

I have a native Impala UDF (C++) with two functions. Both functions are complementary to each other. String myUDF(BigInt) BigInt myUDFReverso(String) myUDF("myInput") gives some output which when myUDFReverso(myUDF("myInput")) should give back…
Suvarna Pattayil
0 votes, 1 answer

Using Hive UDF in Impala gives erroneous results in Impala 1.2.4

I have two Hive UDFs in Java which work perfectly well in Hive. Both functions are complementary to each other. String myUDF(BigInt) BigInt myUDFReverso(String) myUDF("myInput") gives some output which when myUDFReverso(myUDF("myInput")) should…
Suvarna Pattayil
0 votes, 2 answers

Creating an Excel "master formula"

We have certain key metrics we would like to calculate on over 100 data entry forms. The metric calculations change from time to time. We would like to define one way to calculate the metric and have all data entry forms use that single definition.…
TheRizza
0 votes, 1 answer

R code: using a UDF with multiple arguments in the apply function

My UDF: testfn = function(x1, x2, x3){ if(x1 > 0){y = x1 + x2 + x3} if(x1 < 0){y = x1 - x2 - x3} return(y) } My Sample Test set: test = cbind(rep(1,3),c(2,4,6),c(1,2,3)) Running of apply: apply(test, 1, testfn, x1 = test[1], x2 = test[2], x3 =…