1

For example, the intersection

select intersect(array("A","B"), array("B","C"))

should return

["B"]

and the union

 select union(array("A","B"), array("B","C"))

should return

["A","B","C"]

What's the best way to do this in Hive? I have checked the Hive documentation but cannot find anything relevant.

Osiris

2 Answers

5

The solution to your problem is here. Go to the GitHub link; it hosts a lot of UDFs created by Klout. Download the source, build the JAR, and add the JAR to Hive. Example:

 CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';
 CREATE TEMPORARY FUNCTION combine_unique AS 'brickhouse.udf.collect.CombineUniqueUDAF';

select combine_unique(combine(array('a','b','c'), array('b','c','d'))) from reqtable;

OK
["d","b","c","a"]
Kishore
  • 3
    The correct function found in the link above should be [brickhouse.udf.collect.ArrayIntersectUDF](https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/udf/collect/ArrayIntersectUDF.java) which can be used as `intersect_array(array1, array2, ...)` and [brickhouse.udf.collect.ArrayUnionUDF](https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/udf/collect/ArrayUnionUDF.java) as `array_union(array1, array2, ...)` – Christoph Körner Nov 03 '16 at 09:16
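
Combining the registration steps from the answer with the class names from the comment, a minimal sketch could look like the following. The JAR path is hypothetical, and the function aliases are just the names suggested in the comment; any alias can be chosen at registration time. Results are deliberately not shown, since element order in the returned arrays may vary.

```sql
-- Hypothetical path: point this at wherever the built Brickhouse JAR lives.
ADD JAR /path/to/brickhouse.jar;

-- Class names are taken from the comment above; the aliases are a free choice.
CREATE TEMPORARY FUNCTION intersect_array AS 'brickhouse.udf.collect.ArrayIntersectUDF';
CREATE TEMPORARY FUNCTION array_union AS 'brickhouse.udf.collect.ArrayUnionUDF';

-- Intersection of the two arrays from the question.
SELECT intersect_array(array('A','B'), array('B','C'));

-- Union of the same two arrays.
SELECT array_union(array('A','B'), array('B','C'));
```

If your Hive version rejects a `SELECT` without a `FROM` clause, select from any existing table instead, as the answer does with `reqtable`.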
0
array_intersect(array1,array2,...) 

and

array_union(array1, array2, ...)
Ahmed Sbai