I have 2 tables, "transaksi" and "buku". "transaksi" has around ~250k rows, and buku has around ~170k rows. Both tables have column called "k999a", and both tables use no indexes. Now I check these 2 statements.
Statement 1:
explain select k999a from transaksi where k999a not in (select k999a from buku);
Statement 1 outputs:
Seq Scan on transaksi (cost=0.00..721109017.46 rows=125426 width=9)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=0.00..5321.60 rows=171040 width=8)
-> Seq Scan on buku (cost=0.00..3797.40 rows=171040 width=8)
Statement 2:
explain select k999a from transaksi where k999a in (select k999a from buku);
Statement 2 outputs:
Hash Semi Join (cost=6604.40..22664.82 rows=250853 width=9)
Hash Cond: (transaksi.k999a = buku.k999a)
-> Seq Scan on transaksi (cost=0.00..6356.53 rows=250853 width=9)
-> Hash (cost=3797.40..3797.40 rows=171040 width=8)
-> Seq Scan on buku (cost=0.00..3797.40 rows=171040 width=8)
Why in the NOT IN query, postgresql does loop join, making the query takes a long time?
PS: postgresql version 9.6.1 on windows 10