PostgreSQL using slow index when no data is available

Question

I have the following table structure:

CREATE TABLE openiot.medidas_analizadores 
(
    id int4 NOT NULL,
    datetime timestamptz NOT NULL,
    (21 other data columns...)
    e_ap_tot float8 NULL,

    CONSTRAINT medidas_analizadores_pk 
        PRIMARY KEY (id, datetime),
    CONSTRAINT medidas_analizadores_fk 
        FOREIGN KEY (id) REFERENCES openiot.analizadores(id) 
             ON UPDATE CASCADE ON DELETE RESTRICT
);

CREATE INDEX medidas_analizadores_datetime_idx 
    ON openiot.medidas_analizadores USING btree (datetime);

CREATE INDEX medidas_analizadores_id_idx 
    ON openiot.medidas_analizadores USING btree (id, datetime DESC);

This table has a lot of rows, but they are not equally distributed for each device (the 'id' column relates to a specific device id)

select id, COUNT(*)
from openiot.medidas_analizadores ma
group by 1
order by 1 DESC

27  475644
26  802809
25  1421407
24  1274758
23  1273051
20  820805
19  713100
17  498619
16  1448898
14  1948369
13  1442262
12  182
11  765829
10  512966
9   1465224
8   348758
7   1921702
6   1140484
5   1245609
4   1700527
3   1830286
2   1682362
1   1894573

I need a query to get the last measurements for a specific device. I use the following:

select *
from openiot.medidas_analizadores ma
where id = 10
order by datetime desc
limit 1

This is usually quite fast, for example:

EXPLAIN analyze verbose
select *
from openiot.medidas_analizadores ma
where id = 10
order by datetime desc
limit 1

Limit  (cost=0.56..3.10 rows=1 width=88) (actual time=0.278..0.279 rows=1 loops=1)
  Output: id, datetime, v_ab, v_bc, v_ca, v_a, v_b, v_c, i_a, i_b, i_c, i_n, freq, cos, pac, qac, sac, e_act_tot, e_ap_tot
  ->  Index Scan using medidas_analizadores_id_idx on openiot.medidas_analizadores ma  (cost=0.56..1201060.04 rows=472677 width=88) (actual time=0.277..0.277 rows=1 loops=1)
        Output: id, datetime, v_ab, v_bc, v_ca, v_a, v_b, v_c, i_a, i_b, i_c, i_n, freq, cos, pac, qac, sac, e_act_tot, e_ap_tot
        Index Cond: (ma.id = 10)
Planning Time: 0.856 ms
Execution Time: 0.315 ms

However if I try the same query for a device which doesn't have data (or has very old data) the query is veery slow.

EXPLAIN analyze verbose
select *
from openiot.medidas_analizadores ma
where id = 11
order by datetime desc
limit 1

Limit  (cost=0.44..2.13 rows=1 width=88) (actual time=94443.983..94443.985 rows=1 loops=1)
  Output: id, datetime, v_ab, v_bc, v_ca, v_a, v_b, v_c, i_a, i_b, i_c, i_n, freq, cos, pac, qac, sac, e_act_tot, e_ap_tot
  ->  Index Scan Backward using medidas_analizadores_datetime_idx on openiot.medidas_analizadores ma  (cost=0.44..1307932.23 rows=774460 width=88) (actual time=94443.981..94443.981 rows=1 loops=1)
        Output: id, datetime, v_ab, v_bc, v_ca, v_a, v_b, v_c, i_a, i_b, i_c, i_n, freq, cos, pac, qac, sac, e_act_tot, e_ap_tot
        Filter: (ma.id = 11)
        Rows Removed by Filter: 10309316
Planning Time: 0.175 ms
Execution Time: 94444.033 ms

Why is this the case, and how to solve this? I have tried different indexes without success so far, and I think this problem is related to the selectivity of the 'id' column and could be solve using statistics (https://www.postgresql.org/docs/current/sql-createstatistics.html) but I am not sure how to apply them.

"doesn't have data (or has very old data) " Those are different cases. Which one applies here? — jjanes, Jul 11 '22 at 16:32
It *should* be doing the right thing automatically. What plan do you get if you drop the single-column index on datetime? — jjanes, Jul 11 '22 at 16:36
Can you temporarily drop the index on (datetime) and get the plan of the query again? The question is... why is the optimizer choosing the bad index, if other plan(s) can have lower cost? Trying this out will tell us how the "second best plan" looks like. — The Impaler, Jul 11 '22 at 17:13
Did you update the stats after adding rows? It seems the historgrams can be off. — The Impaler, Jul 11 '22 at 17:17
@TheImpaler you mean ANALYZE the table? I did that, but no changes. — José D., Jul 11 '22 at 20:08
@TheImpaler dropping the index on datetime does fix it, but I still need that index for other queries. — José D., Jul 11 '22 at 20:16
"dropping the index on datetime does fix it" But we need to see the plan for it — jjanes, Jul 12 '22 at 02:23
There is clearly some data for it, as it did actually return one row. We don't know how many more there are, as it stopped early due to the LIMIT after that one. — jjanes, Jul 12 '22 at 02:30

PostgreSQL using slow index when no data is available

0 Answers0