
I want to write a Hive UDF which takes a variable number of parameters (of different types) and outputs them as a JSON blob (mapping column names to column values).

Select userId, myudf(col2, col3) from TABLE 2; // the output of udf should be {"col2":50, "col3":"Y" }

Select userId, myudf(col2, col3, col4) from TABLE 1; // the output of udf should be {"col2":"s", "col3":5, "col4":"Y"}

Select userId, myudf(col2, col3, col4, col6, col7) from TABLE 3; //the output of udf should be {"col2":"M", "col3":"A", "col4":2.5, "col6":"D", "col7":99 }

Each table has different columns with different types (userId is common in all of them). I am ok to pass column names separately, if that helps: myudf("col2", col2, "col3", col3). Any idea would be greatly appreciated.

Anil Padia
  • Isn't the argument for a Hive UDF just a Tuple? If so, you could just do `tuple.get(n)` for each positional argument – OneCricketeer Jan 27 '16 at 21:20
  • You can try passing the concatenated row as param and split it inside UDF to create JSON – Abhi Jan 27 '16 at 22:11
  • @cricket_007, yes, it is just a tuple. The problem is the type information for the parameters (assuming I pass the column names, since the UDF will not give me that information). To convert to JSON, I need to know the types of the params, as everything gets passed to the UDF as a generic Object. I could pass some type info as well (as one more argument per column), but I am looking for a better solution. – Anil Padia Jan 28 '16 at 17:07

1 Answer


You should use the GenericUDF class (instead of the plain UDF class).

Mark Grover has written a good blog article about this: http://mark.thegrovers.ca/tech-blog/how-to-write-a-hive-udf

Here is the associated source code: https://github.com/markgrover/hive-translate/blob/master/src/main/java/org/mgrover/hive/translate/GenericUDFTranslate.java
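To sketch the idea: a GenericUDF receives an `ObjectInspector[]` in `initialize()`, which tells you the type of each argument, so you can decide whether a value should be emitted as a bare JSON number/boolean or as a quoted string. The snippet below shows only the JSON-assembly step, written as a standalone class (the class and method names are hypothetical) so it compiles without Hive on the classpath; it assumes column names are passed alternately with their values, as in `myudf("col2", col2, "col3", col3)`:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the JSON-building step a GenericUDF's evaluate()
// would perform. In a real UDF this logic lives in a class extending
// org.apache.hadoop.hive.ql.udf.generic.GenericUDF, and the type checks
// below are replaced by ObjectInspector category checks from initialize().
public class JsonBlobBuilder {

    // Builds {"col2":50, "col3":"Y"} from ("col2", 50, "col3", "Y").
    public static String toJson(Object... nameValuePairs) {
        if (nameValuePairs.length % 2 != 0) {
            throw new IllegalArgumentException("Expected (name, value) pairs");
        }
        List<String> entries = new ArrayList<>();
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            String name = String.valueOf(nameValuePairs[i]);
            Object value = nameValuePairs[i + 1];
            // Numbers and booleans are emitted bare; everything else is quoted.
            String json = (value instanceof Number || value instanceof Boolean)
                    ? value.toString()
                    : "\"" + value + "\"";
            entries.add("\"" + name + "\":" + json);
        }
        return "{" + String.join(", ", entries) + "}";
    }

    public static void main(String[] args) {
        System.out.println(toJson("col2", 50, "col3", "Y"));
        // prints {"col2":50, "col3":"Y"}
    }
}
```

In the real UDF, `initialize()` would also validate that every odd-positioned argument is a string constant (the column name) and return a string ObjectInspector for the JSON result.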

Jérôme B