Seems NOT IN Reduces the Performance of a Postgresql Query

Question

I have attached here one query analyze the result, https://explain.depesz.com/s/x9BN

This is the query

EXPLAIN ANALYZE 
     SELECT 
          branches.id as branch_id, 
          date(products.created_at + interval '1 hours 0 minutes') as date, 
          SUM(total_amount) as totalamount 
     FROM "products" 
     INNER JOIN "branches" ON "branches"."id" = "products"."branch_id" 
     WHERE (products.order_status_id NOT IN (10, 5, 50)) 
     AND (products.company_id = 190) 
     AND (
          products.created_at 
          BETWEEN '2019-01-30 23:00:00.000000' 
          AND '2019-12-30 23:00:00.000000'
     ) 
     GROUP BY branches.id, date;

where I can see more time takes in the usage of NOT IN().

Is there any other function we can use to increase the performance.

And what is the cardinality/selectivity of `products.order_status_id` ? Is it a foreign key to some status-table? — wildplasser, Feb 02 '20 at 15:02

score 3 · Answer 1 · answered Feb 02 '20 at 13:46

3

Your time is spent visiting the many table blocks that contain the rows you search for. Perhaps many of the blocks are not cached, and the storage isn't fast.

This query will probably never be fast as lightening, but you can try these two things:

Use a multi-column index:

CREATE INDEX ON products (company_id, creazed_at);

If that is not fast enough, rewrite the table so that the blocks are physically arranged in index order:
```
CLUSTER products USING idx_products_company_and_branch;
```
Then the required rows are concentrated in fewer table blocks, so reading them should be faster.

Note that CLUSTER blocks all access to the table while it is running, and that it has to be repeated from time to time, since the ordering degrades as the table is modified.

answered Feb 02 '20 at 13:46

Laurenz Albe

209,280
17
206
263

Also any idea of using this WHERE (products.order_status_id = ANY('{1,2,3,4,6,7,8,11,12}'::integer[]) instead of NOT IN, @Laurenz Albe – Developer Feb 02 '20 at 17:20
`= ANY` is identical to `IN`, in fact the parser translates the latter to the former. – Laurenz Albe Feb 02 '20 at 17:34
I got this https://stackoverflow.com/questions/14987321/postgresql-in-operator-with-subquery-poor-performance, will this help in any way @ Laurenz Albe ? – Developer Feb 02 '20 at 17:57
1

No, you don't have a subquery. Did you read my answer and try the things suggested there? – Laurenz Albe Feb 02 '20 at 18:06
Thanks for the suggestion, but that table already have some index, also millions of records that have so if I add more index, that will affect the write also, so let me think about that, thanks again @Laurenz Albe – Developer Feb 02 '20 at 18:27
There is of course the eternal trade-off between query performance and DML speed. – Laurenz Albe Feb 02 '20 at 18:49

Gordon Linoff · Answer 2 · 2020-02-03T19:10:06.647

2

I doubt not in with constants is cause your performance issue directly. If you see an issue, it is because the NOT IN is changing the query plan.

In your case, you have some partially helpful indexes that Postgres is trying to use. I suspect it is coming up with the wrong query plan because the statistics are out-of-date.

For this query:

SELECT b.id as branch_id, 
       date(p.created_at + interval '1 hours 0 minutes') as date, 
       SUM(total_amount) as totalamount 
 FROM "products" p JOIN
      "branches" b
      ON b."id" = p."branch_id" 
 WHERE p.order_status_id NOT IN (10, 5, 50) AND 
       p.company_id = 190 AND 
       p.created_at >= '2019-01-30 23:00:00.000000' AND
       p.created_at < '2019-12-30 23:00:00.000000'
 GROUP BY b.id, date;

Note that I tweaked the date comparisons, because the between includes both end points.

I would recommend the following indexes:

products(company_id, created_at, order_status_id)
branches(id).

edited Feb 03 '20 at 19:10

answered Feb 02 '20 at 13:45

Gordon Linoff

1,242,037
58
646
786

1

The third column of your first index will not be used for the scan, it makes the index bigger (=slower), and the condition filters only few rows. To your second index: indexes don't help with hash joins. But maybe you have something else in mind that I don't see. – Laurenz Albe Feb 02 '20 at 13:50
I prefer not to use the index because products is the main table that has millions of records, so already have index for order_status_id, company_id, but yes not for multi column. – Developer Feb 02 '20 at 17:13
Also any idea of using this WHERE (products.order_status_id = ANY('{1,2,3,4,6,7,8,11,12}'::integer[]) instead of NOT IN, @Gordon Linoff – Developer Feb 02 '20 at 17:21
@Developer . . . You can try it. However, I think the issue is the effect of the index on the execution plan, so I don't think an alternative way of phrasing the same filtering conditions will change the execution plan. – Gordon Linoff Feb 03 '20 at 19:11

Seems NOT IN Reduces the Performance of a Postgresql Query

2 Answers2