
In the example below, I am joining two identical tables on two columns:

create table a (id int,txt text);
create table b (id int,txt text);
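/* fill id and txt with the same generated value 1..40000 (txt is stored as text) */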
insert into a select *,* from generate_series(1,40000);
insert into b select *,* from generate_series(1,40000);
analyze a;
analyze b;
explain analyze 
select * from a inner join b on a.id = b.id and a.txt = b.txt;

In the explain plan you can see that it underestimates the number of rows that come out of the join by ~40,000: it thinks 1 row comes out instead of 40,000. In my real-world example, which this theoretical example is based on, this is a problem, because the grossly incorrect row estimate causes bad execution plans for bigger queries that this join is part of:

Hash Join  (... rows=1 ...) (actual ... rows=40000 ...) 

So clearly the planner does not know that for each row in table a it will find exactly one matching row in table b. Fair enough - how should it know? Two fixes come to mind:

(A) Left Join

Using a left join we can correct the estimates:

explain analyze 
select * from a LEFT join b on a.id = b.id and a.txt = b.txt;

We can see the estimates are correct now:

Hash Left Join  (... rows=40000 ...) (actual ... rows=40000 ...)     

(B) Foreign Key

Using a foreign key we can correct the estimates as well:

CREATE UNIQUE INDEX unq_b ON b USING btree (id,txt);

alter table a add constraint fk_a foreign key (id,txt) references b (id,txt);

explain analyze 
select * from a inner join b on a.id = b.id and a.txt = b.txt;

We can see the estimates are correct now:

Hash Join  (... rows=40000 ...) (actual ... rows=40000 ...)

I don't want to turn the join into a left join, as I cannot guarantee that the query results will be 100% the same as before in all edge cases (a row of a without a match in b would be kept with NULLs instead of being dropped). Nor do I want to introduce the FK, as the program does inserts into the tables in a variety of orders and I would have to change the application.

Can you think of other ways to tell the planner about this special relationship between these two tables? Maybe a particular way of writing the query? Or some kind of statistics object? Any ideas?
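
For illustration, the kind of per-table extended statistics object I experimented with looks like this (the statistics names are made up). Extended statistics describe only a single table, so they did not change the join estimate:

CREATE STATISTICS stat_a (ndistinct, dependencies) ON id, txt FROM a;
CREATE STATISTICS stat_b (ndistinct, dependencies) ON id, txt FROM b;
analyze a;
analyze b;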

TYVM!

This was tested on two versions:

PostgreSQL 12.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
PostgreSQL 14.6, compiled by Visual C++ build 1914, 64-bit

UPDATE - an example of why this misestimation is a problem:

In my real-world example it is problematic that Postgres thinks 1 row comes out of the join when in reality 40,000 rows come out, because it then decides to do a nested loop of that 1 row (which in reality is 40,000 rows) against a full table scan (FTS) of the big table c below (40 million rows) - so 40,000 full scans of a big table:

create table c (id int,txt text);

insert into c select *,* from generate_series(1,40000000);

analyze c;
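
/* disable parallel plans and prevent join reordering, so the effect of the misestimate on the plan is easy to see */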

SET max_parallel_workers_per_gather = 0;

set join_collapse_limit = 1;

explain
with a_b as (
    select a.id a_id,b.id b_id,a.txt a_txt,b.txt b_txt 
    from a inner join b 
    on a.id = b.id and a.txt = b.txt
)
select * from a_b inner join c 
on a_b.a_id = c.id and a_b.b_txt = c.txt and a_b.b_id = c.id and a_b.a_txt = c.id::text;

Which means 40,000 full scans of table c:

QUERY PLAN                                                             |
-----------------------------------------------------------------------+
Nested Loop  (cost=1216.00..921352.51 rows=1 width=30)                 |
  Join Filter: ((a.id = c.id) AND (a.txt = c.txt))                     |
  ->  Hash Join  (cost=1216.00..2132.01 rows=1 width=18)               |
        Hash Cond: ((a.id = b.id) AND (a.txt = b.txt))                 |
        ->  Seq Scan on a  (cost=0.00..616.00 rows=40000 width=9)      |
        ->  Hash  (cost=616.00..616.00 rows=40000 width=9)             |
              ->  Seq Scan on b  (cost=0.00..616.00 rows=40000 width=9)|
  ->  Seq Scan on c  (cost=0.00..916220.48 rows=200001 width=12)       |
        Filter: (txt = (id)::text)                                     |

Interestingly, the left join trick does not even work here; only the FK corrects the estimates and therefore the plan:

/* left join trick not working */

explain
with a_b as (
    select a.id a_id,b.id b_id,a.txt a_txt,b.txt b_txt 
    from a LEFT join b 
    on a.id = b.id and a.txt = b.txt
)
select * from a_b inner join c 
on a_b.a_id = c.id and a_b.b_txt = c.txt and a_b.b_id = c.id and a_b.a_txt = c.id::text;

/*QUERY PLAN                                                           |
-----------------------------------------------------------------------+
Nested Loop  (cost=1216.00..921352.51 rows=1 width=30)                 |
  Join Filter: ((a.id = c.id) AND (a.txt = c.txt))                     |
  ->  Hash Join  (cost=1216.00..2132.01 rows=1 width=18)               |
        Hash Cond: ((a.id = b.id) AND (a.txt = b.txt))                 |
        ->  Seq Scan on a  (cost=0.00..616.00 rows=40000 width=9)      |
        ->  Hash  (cost=616.00..616.00 rows=40000 width=9)             |
              ->  Seq Scan on b  (cost=0.00..616.00 rows=40000 width=9)|
  ->  Seq Scan on c  (cost=0.00..916220.48 rows=200001 width=12)       |
        Filter: (txt = (id)::text)                                     |*/

/* with the FK the plan is correct */ 

CREATE UNIQUE INDEX unq_b ON b USING btree (id,txt);
alter table a add constraint fk_a foreign key (id,txt) references b (id,txt); 
        
explain
with a_b as (
    select a.id a_id,b.id b_id,a.txt a_txt,b.txt b_txt 
    from a join b 
    on a.id = b.id and a.txt = b.txt
)
select * from a_b inner join c 
on a_b.a_id = c.id and a_b.b_txt = c.txt and a_b.b_id = c.id and a_b.a_txt = c.id::text;

/*QUERY PLAN                                                                   |
-----------------------------------------------------------------------------+
Hash Join  (cost=2642.00..920362.50 rows=1 width=30)                         |
  Hash Cond: ((c.id = a.id) AND (c.txt = a.txt))                             |
  ->  Seq Scan on c  (cost=0.00..916220.48 rows=200001 width=12)             |
        Filter: (txt = (id)::text)                                           |
  ->  Hash  (cost=2042.00..2042.00 rows=40000 width=18)                      |
        ->  Hash Join  (cost=1216.00..2042.00 rows=40000 width=18)           |
              Hash Cond: ((a.id = b.id) AND (a.txt = b.txt))                 |
              ->  Seq Scan on a  (cost=0.00..616.00 rows=40000 width=9)      |
              ->  Hash  (cost=616.00..616.00 rows=40000 width=9)             |
                    ->  Seq Scan on b  (cost=0.00..616.00 rows=40000 width=9)|*/

Screenshot of the execution plan of the real-world example that this sample is based on (green arrows show the problem). Note that the real-world example has the 1:1 problem twice in a row (2 FKs would solve it there):


Peter
  • You can create an index on a table without also creating a foreign key constraint. Indexes are the usual way of addressing this sort of optimization problem. Have you tried that? – O. Jones Nov 11 '22 at 10:54
  • If there really is a foreign key dependency between the two tables, that should be ensured through a foreign key constraint. In my opinion the code should be fixed to be able to deal with the FK. If that really isn't possible, then you could create the FK as deferrable to work around the bugs in the code (see the sketch below the comments). – Nov 11 '22 at 10:58
  • @O.Jones TY. Yes, I have tried various indexes and statistics, but none worked. – Peter Nov 11 '22 at 11:01
  • I did a fiddle [here](https://dbfiddle.uk/Uhq6rC9W), with several different permutations of indexes, constraints, and dbms versions. For what it's worth the estimated cost and actual time don't change much, even though the estimated row count for the top-level hash join is sometimes 1 and sometimes 40000. It can be surprisingly difficult to work out performance problems in big queries by trying to do so in smaller queries. You might show us your bigger query. – O. Jones Nov 11 '22 at 11:07
  • @O.Jones Thank you for your feedback. I do not feel comfortable sharing the original query, but I have adapted my question to include the part of the problem where the wrong estimation leads to 40,000 FTS. – Peter Nov 11 '22 at 11:44
  • Please add a `PRIMARY KEY` constraint to the `id` columns, and try again. – The Impaler Nov 11 '22 at 14:41
  • @TheImpaler Unfortunately neither `alter table a add constraint pk_a_1 primary key (id); alter table b add constraint pk_b_1 primary key (id);` nor `alter table a add constraint pk_a_1 primary key (id,txt); alter table b add constraint pk_b_1 primary key (id,txt);` corrects the estimate, but thank you for the suggestion. – Peter Nov 14 '22 at 08:20
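
For reference, the deferrable foreign key suggested in the comments could look like the following sketch (with DEFERRABLE INITIALLY DEFERRED the constraint is only checked at commit, so the insert order within a transaction no longer matters):

CREATE UNIQUE INDEX unq_b ON b USING btree (id,txt);
alter table a add constraint fk_a foreign key (id,txt) references b (id,txt) DEFERRABLE INITIALLY DEFERRED;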

1 Answer


There are no cross-table statistics in PostgreSQL, so you won't be able to fix that bad estimate. If that is part of a bigger query and the misestimate causes a problem, you could split the query in two parts: first calculate the subquery with the bad estimate and populate a temporary table with it, then ANALYZE that temporary table to make sure the estimates are correct, then use the temporary table with the rest of your query.
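
A minimal sketch of that approach with the tables from the question (the temporary table name a_b_tmp is made up):

/* materialize the badly estimated join first */
create temporary table a_b_tmp as
select a.id a_id, b.id b_id, a.txt a_txt, b.txt b_txt
from a inner join b on a.id = b.id and a.txt = b.txt;

/* gather statistics so the planner knows the real row count */
analyze a_b_tmp;

/* then run the rest of the query against the temporary table */
explain
select * from a_b_tmp inner join c
on a_b_tmp.a_id = c.id and a_b_tmp.b_txt = c.txt and a_b_tmp.b_id = c.id and a_b_tmp.a_txt = c.id::text;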

Laurenz Albe
  • The funny thing is how random/flip-flop this seems to be. When I join on one column it estimates 40,000 rows to come out: `select * from a inner join b on a.txt = b.txt` When I join on two columns, it estimates 1 row to come out, as shown above. And when I use concat, it estimates 8,000,000 rows to come out: `select * from a inner join b on concat(a.id,a.txt) = concat(b.id,b.txt)` – Peter Nov 15 '22 at 12:31
  • That's because PostgreSQL treats all join conditions as if they were statistically independent. So it multiplies the selectivity: if it thinks that each join condition would filter out nine tenths of the cross product to leave only one tenth, it will estimate that both conditions together will only leave one hundredth of the cross product. – Laurenz Albe Nov 15 '22 at 16:08
  • TYVM @laurenz-albe! Doing the calculation for what you are saying: when it estimates 40k rows for a.id = b.id it says "I expect 0.0025% of the cross product (40k*40k = 1.6 billion)". And when it estimates 40k rows for a.txt = b.txt, it again estimates 0.0025%. So with both conditions in place at the same time it does 0.000025*0.000025 = 0.000000000625, and 0.000000000625*1.6 billion = 1. Funny side note: for a.id::text = b.id::text it expects 8 million rows to come out instead of the 40k for a.id = b.id. – Peter Nov 16 '22 at 10:18
  • You can see heuristics at play. – Laurenz Albe Nov 16 '22 at 11:23