0

Let's say I have data like:

id value
A  X
A  Y
A  Z
B  X
C  X
C  Y
C  W

And want to find all id that have both X and Y values or X and W. These are stored in a second table:

value1 value2
X      Y
X      W

And return:

id value1 value2
A  X      Y
C  X      Y
C  X      W

This needs to work at scale, where the first table has 100M lines and the second table thousands of rows. I will run in Impala or Hive.

sstan
  • 35,425
  • 6
  • 48
  • 66
ADJ
  • 4,892
  • 10
  • 50
  • 83

1 Answers1

1

You can do this using group by, having and some extra logic:

select id
from data d cross join
     table2 t2
group by id, t2.value1, t2.value2
having sum(case when d.value = t2.value1 then 1 else 0 end) > 0 and
       sum(case when d.value = t2.value2 then 1 else 0 end) > 0;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786