We have a db (9.6) that contains measurement data. The relevant query regards 3 tables:
- aufnehmer (i.e. sensor), 5e+2 entries,
- zeitpunkt (i.e. point in time), 4e+6 entries
- wert (i.e. value), 6e+8 entries
aufnehmer : zeitpunkt = m : n with wert as the mapping table. All relevant columns are indexed.
The following query
select count(*) from wert w
inner join aufnehmer a on w.aufnehmer_id = a.id
inner join zeitpunkt z on z.id = w.zeitpunkt_id
where a.id = 12749
and z.zeitpunkt <= ('2018-05-07')::timestamp without time zone
and z.zeitpunkt >= ('2018-05-01')::timestamp without time zone;
Aggregate (cost=3429124.66..3429124.67 rows=1 width=8) (actual time=66.252..66.252 rows=1 loops=1)
-> Nested Loop (cost=571.52..3429084.29 rows=16149 width=0) (actual time=19.051..65.406 rows=15942 loops=1)
-> Index Only Scan using idx_aufnehmer_id on aufnehmer a (cost=0.28..8.29 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=1)
Index Cond: (id = 12749)
Heap Fetches: 1
-> Nested Loop (cost=571.24..3428914.50 rows=16149 width=4) (actual time=19.040..64.502 rows=15942 loops=1)
-> Bitmap Heap Scan on zeitpunkt z (cost=570.67..22710.60 rows=26755 width=4) (actual time=1.551..3.407 rows=24566 loops=1)
Recheck Cond: ((zeitpunkt <= '2018-05-07 00:00:00'::timestamp without time zone) AND (zeitpunkt >= '2018-05-01 00:00:00'::timestamp without time zone))
Heap Blocks: exact=135
-> Bitmap Index Scan on idx_zeitpunkt_zeitpunkt_order_desc (cost=0.00..563.98 rows=26755 width=0) (actual time=1.527..1.527 rows=24566 loops=1)
Index Cond: ((zeitpunkt <= '2018-05-07 00:00:00'::timestamp without time zone) AND (zeitpunkt >= '2018-05-01 00:00:00'::timestamp without time zone))
-> Index Only Scan using uq1_wert on wert w (cost=0.57..126.94 rows=37 width=8) (actual time=0.002..0.002 rows=1 loops=24566)
Index Cond: ((aufnehmer_id = 12749) AND (zeitpunkt_id = z.id))
Heap Fetches: 15942
Planning time: 0.399 ms
Execution time: 66.339 ms
and takes about a second. When the end date is augmented by one day and the query is changed to:
... --same as above
and z.zeitpunkt <= ('2018-05-08')::timestamp without time zone
and z.zeitpunkt >= ('2018-05-01')::timestamp without time zone;
Aggregate (cost=3711151.24..3711151.25 rows=1 width=8) (actual time=35601.351..35601.351 rows=1 loops=1)
-> Nested Loop (cost=66264.74..3711104.14 rows=18840 width=0) (actual time=35348.705..35600.192 rows=17612 loops=1)
-> Index Only Scan using idx_aufnehmer_id on aufnehmer a (cost=0.28..8.29 rows=1 width=4) (actual time=0.007..0.010 rows=1 loops=1)
Index Cond: (id = 12749)
Heap Fetches: 1
-> Hash Join (cost=66264.47..3710907.45 rows=18840 width=4) (actual time=35348.693..35598.183 rows=17612 loops=1)
Hash Cond: (w.zeitpunkt_id = z.id)
-> Bitmap Heap Scan on wert w (cost=43133.18..3678947.46 rows=2304078 width=8) (actual time=912.086..35145.680 rows=2334815 loops=1)
Recheck Cond: (aufnehmer_id = 12749)
Rows Removed by Index Recheck: 205191912
Heap Blocks: exact=504397 lossy=1316875
-> Bitmap Index Scan on idx_wert_aufnehmer_id (cost=0.00..42557.16 rows=2304078 width=0) (actual time=744.144..744.144 rows=2334815 loops=1)
Index Cond: (aufnehmer_id = 12749)
-> Hash (cost=22741.12..22741.12 rows=31214 width=4) (actual time=8.909..8.909 rows=27675 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 1229kB
-> Bitmap Heap Scan on zeitpunkt z (cost=664.37..22741.12 rows=31214 width=4) (actual time=1.822..5.600 rows=27675 loops=1)
Recheck Cond: ((zeitpunkt <= '2018-05-08 00:00:00'::timestamp without time zone) AND (zeitpunkt >= '2018-05-01 00:00:00'::timestamp without time zone))
Heap Blocks: exact=152
-> Bitmap Index Scan on idx_zeitpunkt_zeitpunkt_order_desc (cost=0.00..656.57 rows=31214 width=0) (actual time=1.798..1.798 rows=27675 loops=1)
Index Cond: ((zeitpunkt <= '2018-05-08 00:00:00'::timestamp without time zone) AND (zeitpunkt >= '2018-05-01 00:00:00'::timestamp without time zone))
Planning time: 0.404 ms
Execution time: 35608.286 ms
and the execution takes about 1000 times longer.
So it seems that the query planner switches to joining aufnehmer and wert first which takes much longer than joining zeitpunkt and wert first.
Any idea if it can be forced to keep the first execution plan? We have already increased work_mem but it did not make any difference.