4

I have two arrays of string in Hive like

{'value1','value2','value3'}
{'value1', 'value2'}

I want to merge arrays without duplicates, result:

{'value1','value2','value3'}

How I can do it in hive?

Adriano Foschi
  • 648
  • 1
  • 8
  • 23

2 Answers2

6

A native solution could be that:

SELECT id, collect_set(item)
FROM table
LATERAL VIEW explode(list) lTable AS item
GROUP BY id;

Firstly explode with lateralview, and next group by and remove duplicates with collect_set.

Adriano Foschi
  • 648
  • 1
  • 8
  • 23
5

You will need a UDF for this. Klout has a bunch of opensource HivUDFS under the package brickhouse. Here is the github link. They have a bunch of UDF's that exactly serves your purpose. Download,build and add the JAR. Here is an example

CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';
CREATE TEMPORARY FUNCTION combine_unique AS 'brickhouse.udf.collect.CombineUniqueUDAF';

select combine_unique(combine(array('a','b','c'), array('b','c','d'))) from reqtable;

OK
["d","b","c","a"]
sayan dasgupta
  • 1,084
  • 6
  • 15
  • 1
    There is also 'intersect_array', which give the actual elements in common ( 'b','c') Sayan, Thanks so much for spreading the word about Brickhouse !!! I hope you find it useful. – Jerome Banks Feb 05 '14 at 22:03
  • Thanks for the fantastic package. It has some really great functions. I tried with combine_unique. It was not accepting 2 arrays. Hence this solution. The wiki needs to be modified – sayan dasgupta Feb 05 '14 at 22:32