Let's say I have data like:
id value
A X
A Y
A Z
B X
C X
C Y
C W
I want to find every id that has both X and Y, or both X and W. These value pairs are stored in a second table:
value1 value2
X Y
X W
The result should be:
id value1 value2
A X Y
C X Y
C X W
This needs to work at scale: the first table has about 100M rows and the second table has thousands of rows. I will run this in Impala or Hive.
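
To make the intended logic concrete, here is a minimal sketch on the sample data above. It uses Python's built-in sqlite3 module only so the query can be run locally. The table names `data` and `pairs` and the function name `find_matching_ids` are made up for illustration. The SELECT uses only inner equi-joins, so I would expect it to carry over to Impala or Hive, but I have not checked how it behaves at 100M rows.

```python
import sqlite3

# Sample rows copied from the question. Table names "data" and "pairs"
# are hypothetical; the question does not name the tables.
DATA = [("A", "X"), ("A", "Y"), ("A", "Z"), ("B", "X"),
        ("C", "X"), ("C", "Y"), ("C", "W")]
PAIRS = [("X", "Y"), ("X", "W")]

# For each pair, join the data table once on value1 and once on value2,
# requiring both matches to belong to the same id.
QUERY = """
    SELECT DISTINCT d1.id, p.value1, p.value2
    FROM pairs p
    JOIN data d1 ON d1.value = p.value1
    JOIN data d2 ON d2.value = p.value2 AND d2.id = d1.id
    ORDER BY d1.id, p.value1, p.value2
"""

def find_matching_ids(data_rows, pair_rows):
    """Return (id, value1, value2) for every id that has both values of a pair."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE data (id TEXT, value TEXT)")
        conn.execute("CREATE TABLE pairs (value1 TEXT, value2 TEXT)")
        conn.executemany("INSERT INTO data VALUES (?, ?)", data_rows)
        conn.executemany("INSERT INTO pairs VALUES (?, ?)", pair_rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    for row in find_matching_ids(DATA, PAIRS):
        print(*row)
```

The DISTINCT is there only in case the first table contains duplicate (id, value) rows. The row order differs from the listing above, which does not matter for my purposes.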