
We have written a Hive UDF in Java that looks up a value from a file added to the distributed cache. It works perfectly in a plain select query such as:

Query 1.

select country_key, MyFunction(country_key,"/data/MyData.txt") as capital from tablename;

But it does not work when we try to create a table from its output, like this:

Query 2.

create table new_table
as
select country_key, MyFunction(country_key,"/data/MyData.txt") as capital from tablename;

It does not even work when wrapped in an outer select, like this:

Query 3.

select t.capital from 
(
select country_key, MyFunction(country_key,"/data/MyData.txt") as capital from tablename
) t;

Below is my UDF class with its evaluate function:

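import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.hive.ql.exec.UDF;

public class CountryMap extends UDF {

    // Lookup table, built lazily on the first call to evaluate()
    private Map<Integer, String> countryMap = null;

    public String evaluate(Integer keyCol, String mapFile) throws IOException {
        if (countryMap == null) {
            countryMap = new HashMap<Integer, String>();
            // Read comma-delimited data from mapFile and build the hash map
            // (assumes one "key,value" pair per line)
            try (BufferedReader reader = new BufferedReader(new FileReader(mapFile))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split(",", 2);
                    if (parts.length < 2) continue; // skip malformed lines
                    countryMap.put(Integer.valueOf(parts[0].trim()), parts[1].trim());
                }
            }
        }

        if (countryMap.containsKey(keyCol)) {
            return countryMap.get(keyCol);
        }
        return "NA";
    }
}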

I add the jar and the file, and create the temporary function in Hive like this:

ADD JAR /data/CountryMap-with-dependencies.jar;
ADD FILE /data/MyData.txt;
CREATE TEMPORARY FUNCTION MyFunction as 'CountryMap';

When I run query 1, I get the expected value from the map, but when I run queries 2 and 3, I get 'NA'. When I returned the map's size() instead of 'NA' for queries 2 and 3, it was zero.

I am puzzled why the outer select or the create table cannot fetch values from countryMap, and why the size of the map becomes zero.

Som

1 Answer


Which version of Hive are you using? In versions before 0.14.0 you had to run set hive.cache.expr.evaluation = false; to work around a bug.
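
If that applies to your setup, the workaround might look like the session below. This is only a sketch: it reuses the jar, file, table, and function names from the question, and it assumes the property is set in the same session before the query runs. Whether it resolves queries 2 and 3 depends on the Hive version in use.

```sql
-- Assumption: Hive older than 0.14.0; turn off expression result caching for this session
SET hive.cache.expr.evaluation=false;

-- Register the UDF and its lookup file as in the question
ADD JAR /data/CountryMap-with-dependencies.jar;
ADD FILE /data/MyData.txt;
CREATE TEMPORARY FUNCTION MyFunction AS 'CountryMap';

-- Re-run the failing statement (query 2 from the question)
CREATE TABLE new_table
AS
SELECT country_key, MyFunction(country_key, "/data/MyData.txt") AS capital FROM tablename;
```

Running `SET hive.cache.expr.evaluation;` without a value prints the current setting, which is a quick way to confirm the change took effect before re-running the query.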

tsnee