We have written a Hive UDF in Java that looks up values in a file added to the distributed cache. It works as expected in a plain SELECT query:
Query 1.
select country_key, MyFunction(country_key,"/data/MyData.txt") as capital from tablename;
However, it does not work when we create a table from the query's output (CREATE TABLE AS SELECT):
Query 2.
create table new_table
as
select country_key, MyFunction(country_key,"/data/MyData.txt") as capital from tablename;
It does not even work when the UDF call sits in a subquery and is read from an outer SELECT:
Query 3.
select t.capital from
(
select country_key, MyFunction(country_key,"/data/MyData.txt") as capital from tablename
) t;
Below is my UDF class with its evaluate method:
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.hive.ql.exec.UDF;

public class CountryMap extends UDF {
    private Map<Integer, String> countryMap = null;

    public String evaluate(Integer keyCol, String mapFile) {
        if (countryMap == null) {
            countryMap = new HashMap<Integer, String>();
            // read comma delimited data from mapFile and, for each line,
            // countryMap.put(key, value);
        }
        if (countryMap.containsKey(keyCol)) {
            return countryMap.get(keyCol);
        }
        return "NA";
    }
}
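The file-reading step is only described by a comment in evaluate above. A minimal, self-contained sketch of that step follows, assuming each line of MyData.txt holds an integer key, a comma, and the value. The class name `CountryFileParser` and its `parse` method are hypothetical, not part of the original UDF.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class CountryFileParser {

    // Hypothetical helper: parses lines of the form "key,value" (integer key)
    // into a map, splitting on the first comma only. Lines without a comma
    // (e.g. blank lines) are skipped; a non-integer key throws NumberFormatException.
    public static Map<Integer, String> parse(Reader source) {
        Map<Integer, String> result = new HashMap<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                int comma = line.indexOf(',');
                if (comma < 0) {
                    continue;
                }
                Integer key = Integer.valueOf(line.substring(0, comma).trim());
                String value = line.substring(comma + 1).trim();
                result.put(key, value);
            }
        } catch (IOException e) {
            // Rethrow unchecked so evaluate() does not need to declare IOException.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> m = parse(new StringReader("1,Paris\n2, Rome\n\n"));
        System.out.println(m.size() + " " + m.get(2)); // prints "2 Rome"
    }
}
```

Inside evaluate, this could be called as `parse(new FileReader(mapFile))`. An I/O error then propagates as an exception instead of leaving the map silently empty.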
We add the JAR and the data file and create the temporary function in Hive like this:
ADD JAR /data/CountryMap-with-dependencies.jar;
ADD FILE /data/MyData.txt;
CREATE TEMPORARY FUNCTION MyFunction as 'CountryMap';
When I run query 1, I get the expected values from the map, but queries 2 and 3 return 'NA'. When I returned countryMap.size() instead of 'NA' for queries 2 and 3, it was zero.
I am puzzled why the outer SELECT and the CREATE TABLE cannot fetch values from countryMap, and why the map is empty in those cases.
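One way to narrow this down is to make the UDF report which file it actually opened, instead of silently ending up with an empty map. The sketch below uses a hypothetical helper `resolveMapFile`. It first tries the path exactly as passed to the UDF. It then tries only the file name relative to the current working directory. That second candidate rests on an assumption: for queries that launch map or reduce tasks, resources added with ADD FILE are typically made available in the task's working directory under their base name. If neither candidate is readable, the helper throws.

```java
import java.io.File;
import java.io.FileNotFoundException;

public class MapFileResolver {

    // Hypothetical diagnostic helper: returns the first readable candidate.
    //  1. the path exactly as passed to the UDF, e.g. "/data/MyData.txt";
    //  2. only the file name ("MyData.txt"), relative to the current working
    //     directory, where ADD FILE resources are assumed to be localized
    //     for map/reduce tasks.
    public static File resolveMapFile(String mapFile) throws FileNotFoundException {
        File asGiven = new File(mapFile);
        if (asGiven.isFile() && asGiven.canRead()) {
            return asGiven;
        }
        File byName = new File(asGiven.getName());
        if (byName.isFile() && byName.canRead()) {
            return byName;
        }
        // Fail loudly so the task log shows why the map would otherwise be empty.
        throw new FileNotFoundException("Neither " + asGiven.getAbsolutePath()
                + " nor " + byName.getAbsolutePath() + " is readable (cwd: "
                + System.getProperty("user.dir") + ")");
    }

    public static void main(String[] args) {
        try {
            System.out.println("Using " + resolveMapFile("/data/MyData.txt"));
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Inside evaluate, the file returned by `resolveMapFile(mapFile)` would be opened in place of `mapFile`. If neither candidate exists, the exception message appears in the failing task's log rather than every row quietly becoming 'NA'.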