Questions tagged [udf]

A user-defined function (UDF) is a function provided by the user of a program or environment, in a context where the usual assumption is that functions are built into the program or environment. Although the term is widely known in Hadoop components such as Hive and Pig, it is also used in other contexts such as programming languages and some DBMSs.
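
To make the definition concrete, here is a minimal sketch of a UDF in a DBMS, using Python's standard `sqlite3` module purely as an illustration. The function name `reverse_text` and the table `words` are hypothetical and chosen only for this example; SQLite has no such built-in, which is exactly why the user supplies it.

```python
import sqlite3


def reverse_text(value):
    """Hypothetical UDF: reverse a string, passing NULL through unchanged."""
    if value is None:
        return None
    return value[::-1]


# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# Register the Python function under the SQL name "reverse_text", taking 1 argument.
conn.create_function("reverse_text", 1, reverse_text)

conn.execute("CREATE TABLE words (w TEXT)")
conn.executemany("INSERT INTO words VALUES (?)", [("hive",), ("pig",), (None,)])

# The UDF is now callable from SQL like any built-in function.
rows = conn.execute(
    "SELECT w, reverse_text(w) FROM words ORDER BY rowid"
).fetchall()
print(rows)
```

Other systems (Hive, Pig, BigQuery, Redshift, Excel) follow the same idea, but each has its own registration mechanism and language support.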

From the docs:

Introduction

Pig provides extensive support for user defined functions (UDFs) as a way to specify custom processing. Pig UDFs can currently be implemented in three languages: Java, Python, and JavaScript.

The most extensive support is provided for Java functions. You can customize all parts of the processing including data load/store, column transformation, and aggregation. Java functions are also more efficient because they are implemented in the same language as Pig and because additional interfaces are supported such as the Algebraic Interface and the Accumulator Interface.

Limited support is provided for Python and JavaScript functions. These functions are new, still evolving, additions to the system. Currently only the basic interface is supported; load/store functions are not supported. Furthermore, JavaScript is provided as an experimental feature because it did not go through the same amount of testing as Java or Python. At runtime note that Pig will automatically detect the usage of a scripting UDF in the Pig script and will automatically ship the corresponding scripting jar, either Jython or Rhino, to the backend.
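
As a rough illustration of the "basic interface" mentioned above, a Python (Jython) UDF for Pig might look like the sketch below. The function name `to_upper` and its schema are hypothetical. The `outputSchema` decorator is normally supplied by Pig's Jython engine when the script is registered; the fallback definition here is an assumption added only so the file can also be imported and tested in plain Python.

```python
# Assumption: when Pig runs this file through its Jython engine, the
# outputSchema decorator is already defined. The fallback below only
# exists so the module can be exercised outside Pig.
if "outputSchema" not in globals():
    def outputSchema(schema):
        def wrap(func):
            # Record the declared schema on the function for inspection.
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("upper:chararray")
def to_upper(word):
    """Hypothetical column-transformation UDF: upper-case a chararray."""
    if word is None:
        # Pig passes nulls as None; return None to keep the field null.
        return None
    return word.upper()
```

In a Pig script, such a file would typically be registered with something like `REGISTER 'udfs.py' USING jython AS myfuncs;` and called as `myfuncs.to_upper(name)`, where the file name `udfs.py`, the alias `myfuncs`, and the field `name` are placeholders.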

537 questions
0 votes, 1 answer

Hive UDF to fetch value from distributed cache not working with outer queries

We have written a Hive UDF in Java to fetch a value from a file added to the distributed cache, which works perfectly from a select query like: Query 1. select country_key, MyFunction(country_key,"/data/MyData.txt") as capital from tablename; But not…
Som
0 votes, 1 answer

BigQuery JavaScript UDF fails with "Resources Exceeded"

This question is probably another instance of "BigQuery UDF memory exceeded error on multiple rows but works fine on single row", but it was suggested that I post it as a question instead of an answer. I'm using JavaScript to parse logfiles into a…
Lloyd Tabb
0 votes, 0 answers

Cannot execute SQL from inside Hive UDF via JDBC driver

This is my first post! I've been searching for solutions to this to no avail for weeks (on and off)... I have a Java Hive UDF where I want to run SQL against a Hive table to map data in memory for later use. I have the connection information…
Matt
0 votes, 1 answer

Aerospike: Lua UDF always returns an empty result even if the UDF returns the stream without any filtering, etc.

I cannot understand why aggregateQuery always returns an empty result. I tried to test in aql and got the same problem: 0 rows in set. The indexes are all there. aql> show…
0 votes, 1 answer

Pig: Multiple UDFs in one class

I want to define multiple Pig UDFs. Each of them will extract a different part of the data. In my case the data are JSON documents that have a complex structure including many nested JSON objects. The problem is that for now I have created a…
nikosdi
0 votes, 1 answer

Declare table variable in a UDF to enter table name as a parameter

I am working on a query and created a function to fetch results from the outcomes table using the following code. CREATE FUNCTION dbo.Shippad (@tbl NVARCHAR(30)) RETURNS TABLE AS RETURN SELECT LEFT(ship, Charindex(' ', ship) - 1) + ' ' …
PURWU
0 votes, 1 answer

Hive: to_map function not working

I have the data below in a Hive table: select pid, year, catches from fielding_s where pid = 'zobribe01' group by id; zobribe01 2006 [{"p1":52,"p2":50,"p3":1322,"p4":86}] zobribe01 2007 [{"p1":30,"p2":26,"p3":674,"p4":37}] …
0 votes, 0 answers

How to write a Java program to create a Hive UDF which performs a MINUS operation?

Can anybody give me more insight into how to write a Java program to create a UDF which performs a MINUS operation, as MINUS is not supported in Hive? This can be achieved with a left outer join. How do I start writing this UDF as a Java program?
Vishwas V
0 votes, 1 answer

xlwings (0.7.0) importing UDF error

Hello, my problem is that when I try to import a UDF in Excel 2013 I receive the error message that can be seen in the picture. I have installed xlwings and it works except for the UDF importing. Note that I used "xlwings quickstart myproject" so there is…
elwindly
0 votes, 1 answer

How can I take a list of values, perform multiple "cleaning" operations on them and place them elsewhere in the workbook?

I just spent most of the day trying to figure out how to do this, and the most I've gotten is one or two of the operations I wanted done, and then I can't get the rest to work. I will preface by saying that I currently have an "Intermediate"…
5il3nc3r
0 votes, 0 answers

Hive UDF which takes a parameter as input

Can we write a Hive UDF which takes a 'String-value' as input? Any example or link will be helpful. I have visited the link below: http://blog.matthewrathbone.com/2013/08/10/guide-to-writing-hive-udfs.html I need some more help. package…
Mohit Rane
0 votes, 1 answer

Hive Macros/UDFs - Parallel/combined/single interpreter

I would like to create a Hive extension (macro / UDF / gateway / proxy / facade or something else) which can a) create/modify DB tables and b) process data. The issue here is that for b) parallel processing as common practice for UDFs is…
0 votes, 1 answer

Reading Pig schema/header to understand the order of fields in a tuple

Is there a way to access the .pig_schema or .pig_header value from within a Pig Java UDF, so that I know which field name is being parsed? I work on PigStorage output generated by a different process, and it keeps changing rapidly. I want to make as…
rahulbmv
0 votes, 1 answer

UDFs in Redshift: possible to reference a UDF within another?

Is it possible to nest UDFs within each other? The following is code for computing confidence intervals in A/B tests. Of course, I could write a huge function that does it all in one, but I am wondering whether there is a better way to achieve this goal. set search_path to…
ekta
0 votes, 2 answers

Hive for bag of words (word count for each word in the dictionary)

I have a table with this structure: user_id | message_id | content 1 | 1 | "I like cats" 1 | 1 | "I like dogs" And a list of valid words in dictionary.txt (or an external Hive table), for…
Uri Goren