
I have a problem with PostgreSQL 15.2 not using a partition-wise join.

I have 2 tables range-partitioned by the same key: date_id. The partition ranges of both tables line up, except that one of them (date_dim) has many more partitions.

Table DDL:

CREATE TABLE date_dim
(
    date_id date NOT NULL,
    month_key integer NOT NULL,
    the_month smallint NOT NULL,
    quarter_key smallint,
    the_quarter smallint NOT NULL,
    the_year smallint NOT NULL,
    day_in_month smallint NOT NULL,
    day_in_year smallint NOT NULL,
    week_in_year smallint NOT NULL,
    day_pl text COLLATE pg_catalog."default" NOT NULL,
    day_en text COLLATE pg_catalog."default" NOT NULL,
    month_pl text COLLATE pg_catalog."default" NOT NULL,
    month_en text COLLATE pg_catalog."default" NOT NULL,
    week_number smallint NOT NULL DEFAULT 0,
    week_of_the_month smallint,
    is_holiday smallint,
    CONSTRAINT date_dim_pk PRIMARY KEY (date_id)
) PARTITION BY RANGE (date_id);

CREATE INDEX date_dim_date_id_idx ON date_dim USING btree (date_id ASC NULLS LAST);
CREATE INDEX date_dim_month_key_idx ON date_dim USING btree (month_key ASC NULLS LAST);

CREATE TABLE status_change_fact
(
    status_change_dim_id bigint,
    status_change_id integer,
    date_id date,
    project_dim_id bigint,
    customer_dim_id bigint,
    is_talk smallint,
    downloaded_unique text COLLATE pg_catalog."default",
    contacted_unique text COLLATE pg_catalog."default",
    successes_unique text COLLATE pg_catalog."default",
    closed_unique text COLLATE pg_catalog."default",
    downloaded_nonunique smallint,
    contacted_nonunique smallint,
    successes_nonunique smallint,
    closed_nonunique smallint,
    dw_source_id integer,
    count smallint DEFAULT 1,
    call_junk_dim_id bigint,
    closed_customer smallint
) PARTITION BY RANGE (date_id);

CREATE INDEX status_change_fact_date_id_project_dim_id_idx ON status_change_fact USING btree (date_id ASC NULLS LAST, project_dim_id ASC NULLS LAST);
CREATE INDEX status_change_fact_status_change_dim_id_date_id_idx ON status_change_fact USING btree (status_change_dim_id ASC NULLS LAST, date_id ASC NULLS LAST);

Most important partitions:

CREATE TABLE date_dim_202302 PARTITION OF date_dim FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');
CREATE TABLE status_change_fact_202302 PARTITION OF status_change_fact FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');

A simple query joining the two tables on the partition key, with a filter on date_id that uses the current_date function:

explain analyze
select *
FROM date_dim dd
left JOIN status_change_fact sf
    on dd.date_id = sf.date_id 
where dd.date_id BETWEEN 
    ((current_date + (interval '1' day * floor(-3))) - ( EXTRACT( DAY FROM (current_date + (interval '1' day * floor(-3))) ) * INTERVAL '1 DAY' ) + INTERVAL '1 DAY') AND 
    ((current_date + (interval '1' day * floor(-3))) - ( EXTRACT( DAY FROM (current_date + (interval '1' day * floor(-3))) ) * INTERVAL '1 DAY' ) + INTERVAL '1 DAY' + INTERVAL '1 MONTH' - INTERVAL '1 DAY');

The WHERE filter selects the range between the first and the last day of the month that was 3 days ago (i.e. the whole of February 2023 on the day of execution).
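For illustration only (the BI tool's SQL cannot be changed), the two generated bounds should be equivalent to a simpler expression built on date_trunc:

```sql
-- Sketch of what the generated BETWEEN bounds compute, assuming the
-- intent described above (month containing current_date - 3):
SELECT
    -- first day of the month that contained (current_date - 3)
    date_trunc('month', current_date - 3)::date AS month_start,
    -- last day of that same month
    (date_trunc('month', current_date - 3)
        + interval '1 month - 1 day')::date     AS month_end;
```

Subtracting EXTRACT(DAY FROM d) days from d and adding one day lands on the first of d's month, which is exactly what date_trunc('month', d) returns.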

Explain results:

Merge Left Join  (cost=73.95..2419690.04 rows=93583886 width=283) (actual time=10767.681..12547.594 rows=3226419 loops=1)
  Merge Cond: (dd.date_id = sf.date_id)
  ->  Append  (cost=68.93..1238.08 rows=698 width=156) (actual time=0.029..0.052 rows=28 loops=1)
        Subplans Removed: 348
        ->  Index Scan using date_dim_202302_date_id_idx on date_dim_202302 dd_1  (cost=0.20..3.54 rows=2 width=156) (actual time=0.022..0.040 rows=28 loops=1)
              Index Cond: ((date_id >= (((CURRENT_DATE + '-3 days'::interval) - ((EXTRACT(day FROM (CURRENT_DATE + '-3 days'::interval)))::double precision * '1 day'::interval)) + '1 day'::interval)) AND (date_id <= (((((CURRENT_DATE + '-3 days'::interval) - ((EXTRACT(day FROM (CURRENT_DATE + '-3 days'::interval)))::double precision * '1 day'::interval)) + '1 day'::interval) + '1 mon'::interval) - '1 day'::interval)))
  ->  Materialize  (cost=5.02..1081729.10 rows=26814867 width=127) (actual time=0.012..10158.205 rows=26814487 loops=1)
        ->  Append  (cost=5.02..1014691.93 rows=26814867 width=127) (actual time=0.010..7349.431 rows=26814487 loops=1)
              ->  Index Scan using status_change_fact_202201_date_id_project_dim_id_idx on status_change_fact_202201 sf_1  (cost=0.29..1950.21 rows=56988 width=130) (actual time=0.009..8.610 rows=56988 loops=1)
              ->  Index Scan using status_change_fact_202202_date_id_project_dim_id_idx on status_change_fact_202202 sf_2  (cost=0.28..301.32 rows=8117 width=130) (actual time=0.009..1.732 rows=8117 loops=1)
              ->  Index Scan using status_change_fact_202203_date_id_project_dim_id_idx on status_change_fact_202203 sf_3  (cost=0.28..223.19 rows=6143 width=130) (actual time=0.012..1.479 rows=6143 loops=1)
              ->  Index Scan using status_change_fact_202204_date_id_project_dim_id_idx on status_change_fact_202204 sf_4  (cost=0.29..658.95 rows=19063 width=130) (actual time=0.008..3.723 rows=19063 loops=1)
              ->  Index Scan using status_change_fact_202205_date_id_project_dim_id_idx on status_change_fact_202205 sf_5  (cost=0.29..356.18 rows=9941 width=130) (actual time=0.010..1.219 rows=9941 loops=1)
              ->  Index Scan using status_change_fact_202206_date_id_project_dim_id_idx on status_change_fact_202206 sf_6  (cost=0.29..1036.62 rows=30427 width=130) (actual time=0.006..3.503 rows=30427 loops=1)
              ->  Index Scan using status_change_fact_202207_date_id_project_dim_id_idx on status_change_fact_202207 sf_7  (cost=0.29..1034.62 rows=30809 width=130) (actual time=0.009..3.304 rows=30809 loops=1)
              ->  Index Scan using status_change_fact_202208_date_id_project_dim_id_idx on status_change_fact_202208 sf_8  (cost=0.28..193.90 rows=5356 width=130) (actual time=0.011..0.665 rows=5356 loops=1)
              ->  Index Scan using status_change_fact_202209_date_id_project_dim_id_idx on status_change_fact_202209 sf_9  (cost=0.43..148748.78 rows=4520518 width=130) (actual time=0.012..991.617 rows=4520518 loops=1)
              ->  Index Scan using status_change_fact_202210_date_id_project_dim_id_idx on status_change_fact_202210 sf_10  (cost=0.43..135721.97 rows=4112554 width=130) (actual time=0.015..1171.379 rows=4112554 loops=1)
              ->  Index Scan using status_change_fact_202211_date_id_project_dim_id_idx on status_change_fact_202211 sf_11  (cost=0.43..213329.82 rows=6588362 width=120) (actual time=0.012..1326.248 rows=6588362 loops=1)
              ->  Index Scan using status_change_fact_202212_date_id_project_dim_id_idx on status_change_fact_202212 sf_12  (cost=0.43..143825.93 rows=4350148 width=128) (actual time=0.010..1113.317 rows=4350148 loops=1)
              ->  Index Scan using status_change_fact_202301_date_id_project_dim_id_idx on status_change_fact_202301 sf_13  (cost=0.43..127917.02 rows=3849646 width=130) (actual time=0.011..885.968 rows=3849646 loops=1)
              ->  Index Scan using status_change_fact_202302_date_id_project_dim_id_idx on status_change_fact_202302 sf_14  (cost=0.43..105301.15 rows=3226415 width=130) (actual time=0.010..327.165 rows=3226415 loops=1)
              ->  Index Scan using status_change_fact_202303_date_id_project_dim_id_idx on status_change_fact_202303 sf_15  (cost=0.15..17.95 rows=380 width=186) (actual time=0.006..0.006 rows=0 loops=1)
Planning Time: 36.252 ms
Execution Time: 12674.066 ms

As you can see, PG uses an Index Scan with no filter to read all status_change_fact partitions, but applies partition pruning to date_dim and reads only the partition for February 2023.

Moreover, PG does not even use an index scan with a filter on date_id, which would limit the scan to a single partition of status_change_fact.

When the current_date function is replaced by a literal date:

explain analyze
select *
FROM date_dim dd
left JOIN status_change_fact sf
    on dd.date_id = sf.date_id 
where dd.date_id BETWEEN 
    (('2023-03-01'::date + (interval '1' day * floor(-3))) - ( EXTRACT( DAY FROM ('2023-03-01'::date + (interval '1' day * floor(-3))) ) * INTERVAL '1 DAY' ) + INTERVAL '1 DAY') AND 
    (('2023-03-01'::date + (interval '1' day * floor(-3))) - ( EXTRACT( DAY FROM ('2023-03-01'::date + (interval '1' day * floor(-3))) ) * INTERVAL '1 DAY' ) + INTERVAL '1 DAY' + INTERVAL '1 MONTH' - INTERVAL '1 DAY');

both tables are correctly pruned and partition-wise joined, and the single remaining partition of status_change_fact is even scanned with an index condition on date_id:

Nested Loop Left Join  (cost=0.58..38401.46 rows=14666 width=283) (actual time=0.043..1070.607 rows=3226419 loops=1)
  ->  Index Scan using date_dim_202302_date_id_idx on date_dim_202302 dd  (cost=0.15..3.49 rows=2 width=156) (actual time=0.021..0.034 rows=28 loops=1)
        Index Cond: ((date_id >= '2023-02-01 00:00:00'::timestamp without time zone) AND (date_id <= '2023-02-28 00:00:00'::timestamp without time zone))
  ->  Index Scan using status_change_fact_202302_date_id_project_dim_id_idx on status_change_fact_202302 sf  (cost=0.43..17854.65 rows=134434 width=130) (actual time=0.017..12.107 rows=115229 loops=28)
        Index Cond: (date_id = dd.date_id)
Planning Time: 0.695 ms
Execution Time: 1162.486 ms

The difference in execution time is also clearly visible (12 s vs. 1 s).

I know that the current_date function depends on the user's time zone, but according to the documentation (https://www.postgresql.org/docs/current/xfunc-volatility.html) it is marked stable, so its value can be used for pruning, and it evidently is, at least for date_dim. Since PostgreSQL knows that only one partition of date_dim is read, and that both tables are partitioned by the same ranges, it could use that fact to prune status_change_fact too.

My question is: why does PostgreSQL prune only date_dim when current_date is used, while it prunes and partition-wise joins both tables when a literal is used, and how can we convince PostgreSQL to do the same with current_date? The query is generated by a BI tool that we cannot modify, so current_date has to stay.

Shr
  • Why did you partition date_dim? 3000 years results in just a million records. – Frank Heikens Mar 01 '23 at 11:04
  • Not sure if it is necessary, but I made an unpartitioned copy of date_dim to test again, and the only change in the plan is that the index scans on status_change_fact become seq scans, so no gain, and it is even worse with the literal date (https://explain.depesz.com/s/ojyn with current_date, https://explain.depesz.com/s/7pAU for the literal, where status_change_fact is no longer pruned). – Shr Mar 01 '23 at 11:37
  • The obvious question: did you configure `enable_partitionwise_join = on`? – Laurenz Albe Mar 01 '23 at 11:49
  • Hi, enable_partitionwise_join was set to on; I forgot to mention it. – Shr Mar 01 '23 at 12:10

0 Answers