When joining two tables via a composite (two column) primary key, I get bad cardinality estimates in the query plan. Example:
CREATE TABLE t1 AS SELECT x, x*2 AS x2 FROM generate_series(0, 1000) AS x;
ALTER TABLE t1 ADD PRIMARY KEY(x, x2);
ANALYZE t1;
CREATE TABLE t2 AS SELECT x, x*2 AS x2 FROM generate_series(0, 1000) AS x;
ALTER TABLE t2 ADD FOREIGN KEY (x, x2) REFERENCES t1(x,x2);
ANALYZE t2;
EXPLAIN ANALYZE
SELECT *
FROM t1 JOIN t2 USING (x, x2)
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Hash Join (cost=30.02..52.55 rows=1 width=8) (actual time=0.660..1.551 rows=1001 loops=1)
Hash Cond: ((t1.x = t2.x) AND (t1.x2 = t2.x2))
-> Seq Scan on t1 (cost=0.00..15.01 rows=1001 width=8) (actual time=0.021..0.260 rows=1001 loops=1)
-> Hash (cost=15.01..15.01 rows=1001 width=8) (actual time=0.620..0.620 rows=1001 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 40kB
-> Seq Scan on t2 (cost=0.00..15.01 rows=1001 width=8) (actual time=0.019..0.230 rows=1001 loops=1)
Total runtime: 1.679 ms
The plan expects one returned row, but in fact 1001 rows are returned. This is not a problem in simple queries, but it leads to very slow query plans when doing complex queries. How can I help the query optimizer to do better?