PostgreSQL version: 9.3.13
Consider the following tables, index and data:
CREATE TABLE orders (
order_id bigint,
status smallint,
owner int,
CONSTRAINT orders_pkey PRIMARY KEY (order_id)
)
CREATE INDEX owner_index ON orders
USING btree
(owner) WHERE status > 0;
CREATE TABLE orders_appendix (
order_id bigint,
note text
)
Data
orders:
(IDs, 0, 1337) * 1000000 rows
(IDs, 10, 1337) * 1000 rows
(IDs, 10, 777) * 1000 rows
orders_appendix:
- one row for each order
My problem is:
select * from orders where owner=1337 and status>0
The query planner estimated the number of row to be 1000000, but actual number of row is 1000.
In a more complicated following query:
SELECT note FROM orders JOIN orders_appendix using (order_id)
WHERE owner=1337 AND status>0
Instead of using inner join (which is preferable for small number of rows), it picks bitmap join + a full table scan on orders_appendix, which is very slow.
If the condition is "owner=777", it will choose the preferable inner join instead.
I believe it is bacause of the statistics, as AFAIK postgres can only collect and consider stats for each column independently.
However, if I...
CREATE INDEX onwer_abs ON orders (abs(owner)) where status>0;
Now, a slightly changed query...
SELECT note FROM orders JOIN rders_appendix using (order_id)
WHERE abs(owner)=1337 AND status>0
will results in the inner join that I wanted.
Is there a better solution? Perhaps "statistics on partial index"?