We have a production POSTGRES 9.6 db with about 100M records (LOYALTY) and new table (< few thousands as of now) "INFO"
Base Table (written in django)
class Loyalty(models.Model):
customer = models.ForeignKey(Customer, db_index=True)
order = models.ForeignKey(Order, null=True) # i.e. no index!
class Info(models.Model):
loyalty_adjustment = models.OneToOneField(Loyalty)
order_number = models.CharField(max_length=50, db_index=True)
...
QUERY 1:
explain (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
SELECT
*
FROM
"loyalty"
LEFT OUTER JOIN "info" ON ("loyalty"."id" = "info"."loyalty_adjustment_id")
WHERE ("info"."order_number" = '21072621527905'
OR "loyalty"."order_id" = 694781500)
LIMIT 21
This yields a slow query: (full scan)
Limit (cost=19.23..120.18 rows=21 width=220) (actual time=53814.148..77814.842 rows=1 loops=1)"
-> Hash Left Join (cost=19.23..2858123.35 rows=594498 width=220) (actual time=53814.147..77814.840 rows=1 loops=1)"
Hash Cond: (loyalty.id = info.info_id)"
Filter: (((info.order_number)::text = '21072621527905'::text) OR (loyalty.order_id = 694781500))"
Rows Removed by Filter: 118934642"
-> Seq Scan on loyalty (cost=0.00..2412225.44 rows=118899344 width=50) (actual time=1.001..59578.218 rows=118934643 loops=1)"
-> Hash (cost=14.10..14.10 rows=410 width=170) (actual time=0.508..0.508 rows=4 loops=1)"
Buckets: 1024 Batches: 1 Memory Usage: 9kB"
-> Seq Scan on info (cost=0.00..14.10 rows=410 width=170) (actual time=0.500..0.500 rows=4 loops=1)"
Planning time: 1.185 ms"
Execution time: 77814.890 ms"
Separating the query into 2 without OR clause makes it much much faster < 1 sec
QUERY 2:
explain (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
SELECT
*
FROM
"loyalty"
LEFT OUTER JOIN "info" ON ("loyalty"."id" = "info"."loyalty_adjustment_id")
WHERE
"info"."order_number" = '21072620001657'
LIMIT 21
AND
QUERY 3:
explain (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
SELECT
*
FROM
"loyalty"
LEFT OUTER JOIN "info" ON ("loyalty"."id" = "info"."loyalty_adjustment_id")
WHERE ("info"."order_number" = '21072620001657'
AND "loyalty"."order_id" = 4967472)
LIMIT 21
Why is having OR clause makes it much much slower than two separate query with union? Does it have to do anything with having condition on the BOTH table?
Why is it running INDEX scan on QUERY 3? Because LOYALTY table does not have ORDER as its index
QUERY 2, even when it does INDEX scan, why is it not doing anything better than full index scan if the condition does state what the index value is (which is a single order_number?)