11

I have two tables A and B that both have a column id. I wish to obtain ids from A that are not present in B. The obvious way is:

SELECT id FROM A WHERE id NOT IN (SELECT id FROM B)

Unfortunately, Hive doesn't support in, exists or subqueries. Is there a way to achieve the above using joins?

I thought of the following

SELECT A.id FROM A,B WHERE A.id<>B.id

But it seems like this will return the entirety of A, since there always exists an id in B that is not equal to any id in A.

elexhobby
  • 2,588
  • 5
  • 24
  • 33

4 Answers4

26

You can do the same with a LEFT OUTER JOIN in Hive:

SELECT A.id
FROM A
LEFT OUTER JOIN B
ON (B.id = A.id)
WHERE B.id IS null
Charles Menguy
  • 40,830
  • 17
  • 95
  • 117
  • sorry for being impolite, how if I want to left outer join C and expected of id based on A? I already try LEFT OUTER JOIN C on (C.id = in (A.id=B.id)) but it failed. – thecardcaptor Jan 13 '21 at 14:28
6

Hive seems to support IN, NOT IN, EXIST and NOT EXISTS from 0.13.

select count(*)
from flight a
where not exists(select b.tailnum from plane b where b.tailnum = a.tailnum);

The subqueries in EXIST and NOT EXISTS should have correlated predicates (like b.tailnum = a.tailnum in above sample) For more, refer Hive Wiki > Subqueries in the WHERE Clause

Sangmoon Oh
  • 81
  • 1
  • 2
3

Should you ever want to do an IN as so:

SELECT id FROM A WHERE id IN (SELECT id FROM B)

Hive has this covered with a LEFT SEMI JOIN:

SELECT a.key, a.val
FROM a LEFT SEMI JOIN b on (a.key = b.key)
fdrijver
  • 91
  • 1
  • 1
  • 6
1

if you can using spark sql you can use left anti join.

ex: SELECT A.id FROM A left anti join B on a.id=b.id

ravi teja
  • 11
  • 1