1

How create a UDF which take a String return multiple Strings ? The UDF so far I have seen could only give one output. How to get multiple feilds as output from a UDF ?

Simplest would be implementation of name -> FirstName, LastName. Not looking for alternate solution to split names, but looking for API / UDF which would help implement such needs .

Lets Say nameSplitteris my UDF

Select age,nameSplitter(name) as firstName,LastName from myTable;

InPut

    ****Input****
------------------------
    Age | Name
------------------------
    24  | John Smit
    13  | Sheldon Cooper
-------------------------

OutPut

    ****Out put ****
-----------------------------------
    Age | First Name  | Last Name
-----------------------------------
    24  | John        | Smit
    13  | Sheldon     | Cooper
-----------------------------------
Community
  • 1
  • 1
user2458922
  • 1,691
  • 1
  • 17
  • 37

1 Answers1

0

Use split() function, it splits strinng around regexp pattern and returns an array:

select age, 
       NameSplitted[0] as FirstName,
       NameSplitted[1] as LastName
  from  
    (
    select age, 
           split(Name,' +') as NameSplitted 
     from myTable
    )s;

Or just select age, split(Name,' +')[0] FirstName, split(Name,' +')[0] LastName from myTable;

Pattern ' +' means one or more spaces.

Also if you have three words names or even longer and you want to split only first word as a name and everything else as last name, or using more complex rule, you can use regexp_extract function like in this example:

hive> select regexp_extract('Johannes Chrysostomus Wolfgangus Theophilus Mozart', '^(.*?)(?: +)(.*)$', 1);
OK
Johannes
Time taken: 1.144 seconds, Fetched: 1 row(s)
hive> select regexp_extract('Johannes Chrysostomus Wolfgangus Theophilus Mozart', '^(.*?)(?: +)(.*)$', 2);
OK
Chrysostomus Wolfgangus Theophilus Mozart
Time taken: 0.692 seconds, Fetched: 1 row(s)

Pattern here means: the beginning of the string '^', first capturing group consisting of any number of characters (.*?), non-capturing group consisting of any number of spaces (?: +), last capturing group consisting of any number of characters greedy (.*), and $ means the end of the string

leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Sir, How to have a UDF, which returns two feilds is the question, not alternate way to do this. – user2458922 Oct 31 '18 at 15:56
  • @user2458922 Sorry, I got you in a wrong way. Maybe this code would be useful and you can start implementing your own UDF: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSplit.java - this is split() UDF code, you can modify the code and build your own UDF. Try to implement your own UDF and ask questions about implementation. – leftjoin Oct 31 '18 at 16:18