1

I have table in postgres database. After the row reach 2 mio rows, the query become slower. This is my query

SELECT
c.source,
c.destination,
c.product_id,
sum(c.weight),
count(c.weight),
c.owner_id 
FROM stock c 
GROUP BY c.source, c.destination, c.product_id, c.owner_id;

I already add index

CREATE INDEX stock_custom_idx ON public.stock USING btree (source, destination, product_id, owner_id)

The query is very slow, so I do explain analyze, but the index not called.

So, how to optimize this query, because its take too long time and not return data?

  • This looks really strange to me, because the exact index you added should really be helping the query. Are you sure you ran `EXPLAIN` _after_ adding this index and not at some other point? – Tim Biegeleisen Jun 14 '21 at 09:26
  • @TimBiegeleisen sure, I already check it – Brosky Samson Putra halim Jun 14 '21 at 09:28
  • What was the size of the table when you ran `EXPLAIN`? Note that for fairly small table sizes, Postgres may choose to not use the index. The big issue here would be if you have a very large table plus the index, but Postgres is still choosing not to use that index. – Tim Biegeleisen Jun 14 '21 at 09:29
  • @TimBiegeleisen It has 2 mio rows, with 457MB of data – Brosky Samson Putra halim Jun 14 '21 at 09:33
  • 1
    Please **[edit]** your question and add the [execution plan](https://www.postgresql.org/docs/current/static/using-explain.html) generated using **`explain (analyze, buffers, format text)`** (_not_ just a "simple" explain) as [formatted text](http://stackoverflow.com/help/formatting) and make sure you preserve the indention of the plan. Paste the text, then put `\`\`\`` on the line before the plan and on a line after the plan. Please also include complete `create index` statements for all indexes as well. –  Jun 14 '21 at 10:38
  • 2
    Your query needs to read and group **all** rows of the table - no index is going to speed that up. –  Jun 14 '21 at 10:39
  • @a_horse_with_no_name An index might give speed-up because it avoids sorting at execution time, might it not? Just give the index all information needed for the query, such that the optimizer decides for an index only scan. Am I wrong? – Islingre Jun 14 '21 at 13:26
  • @Islingre: the query uses a `GROUP BY` not an `order by` as stated in the question's title. Grouping is rarely done using sorting these days. And unless the table contains a **lot** more columns, the index only scan won't really be that much faster –  Jun 14 '21 at 13:28
  • @a_horse_with_no_name Can you pls hint me how grouping is done instead? And why this would be faster than an index only scan, where you can compute the complete output with just once scanning the index? In my mind this should be the fastest, but I would be happy to learn more – Islingre Jun 14 '21 at 13:40
  • @a_horse_with_no_name But v10 isn't really "these days". Disk-based hash aggregation wasn't introduced until v13. Of course without seeing the plan, we don't know is going on. Maybe it would all fit in memory, but since he mentions a degradation at a certain size, I would say it doesn't. – jjanes Jun 14 '21 at 15:20
  • @jjanes: but even v10 typically uses hashing for a GROUP BY, not sorting - at least if `work_mem` is big enough. I guess `work_mem` is simply too small to allow in-memory processing of all rows. But without seeing the execution plan this is only guesswork –  Jun 14 '21 at 15:24

1 Answers1

1

Try this index:

CREATE INDEX better_index ON public.stock USING btree
    (source, destination, product_id, owner_id, weight);

If you do not include the weight, this information still needs to be fetched from the table, thus you have a full table scan.

With the new index, you should have an index only scan. Also, the query planner can make use of the sorting order of the index for the grouping (just as it could have done with your index).

In newer versions of PostgreSQL, there would also exist the INCLUDE clause, where you can "add" columns to an index without this having any impact on the sorting order (the data is there, but this part of the data is not sorted). This would make the index yet another bit more performant for your query, I guess.

Islingre
  • 2,030
  • 6
  • 18