
I have this query (in postgresql):

SELECT "table_1".* FROM "table_1"
INNER JOIN "join_table"
  ON "table_1"."id" = "join_table"."table_1_id"
WHERE "join_table"."table_2_id" = 650727
ORDER BY table_1.created_at DESC
LIMIT 1

Which returns 1 result, but is taking ~250-300 ms to execute

There are btree indexes on table_1.created_at, as well as join_table.table_1_id and join_table.table_2_id

When I ONLY remove the LIMIT 1 from the query, the execution time drops down to ~13ms. This specific query currently only returns one result (without the LIMIT), but there are others with a different value in the WHERE that may return more (which is why the LIMIT is necessary).

Why is adding a LIMIT to a query that is already only returning a single result bloating the execution time so much?

Here is the explain plan with the LIMIT 1 (these are always hard for me to fully understand...): http://explain.depesz.com/s/rOy

And here is the explain plan without LIMIT 1: http://explain.depesz.com/s/q3d7

Additionally, if I keep the LIMIT 1, but change the order by to ASC, the query goes down to 13 ms also. And if I change the LIMIT to LIMIT 20 (But keep the ORDER BY DESC) it only takes 22ms... wtf!?

So it has something to do with the combination of ORDER BY DESC, and LIMIT 1 (Exactly 1)

nzifnab
  • It sounds like you might have an index on the column you are ordering by; when your ORDER BY matches the index direction it is fast, and when it is the opposite of the index the rows have to be sorted in memory before the results can be returned. Can you list what indexes you have on your tables? – May 19 '15 at 19:31
  • Did you look at the other questions about performance problems with LIMIT queries in Postgres on Stack Overflow? There are a lot of them; maybe those help. – Juru May 19 '15 at 19:33
  • Do you have 3 indices or 2 with one "composite index" that contains "join_table.table_1_id" as well as "join_table.table_2_id"? With a composite index the join-filtering could be handled entirely by that index. E.g.: create index join_table_ix1 on join_table (table_2_id, table_1_id); – Thomas Darimont May 19 '15 at 20:28

2 Answers


Ok, this is a pretty classic case.

Whenever you use LIMIT (or the like, such as FETCH FIRST ... ROWS ONLY), the optimizer tries to plan the query so that fetching only the first row(s) is as fast as possible. That means the optimizer prefers execution plans where the first cost value is low, not the second one shown in the execution plan. Remember: the two cost values shown by PostgreSQL (e.g., cost=48.150..6,416.240) are the startup cost (48.150) and the total execution cost (6,416.240).
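For reference, both cost values appear on every node of the plan when you run EXPLAIN on the query from the question; the Limit node at the top is the one the optimizer uses to compare candidate plans:

EXPLAIN
SELECT "table_1".* FROM "table_1"
INNER JOIN "join_table"
  ON "table_1"."id" = "join_table"."table_1_id"
WHERE "join_table"."table_2_id" = 650727
ORDER BY table_1.created_at DESC
LIMIT 1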

The "problem" here is that you have an index which supports your ORDER BY clause. So, PostgreSQL thinks that it can just go through this index (in reverse order due to the DESC modifier in your query) and check for each row in the other table whether it satisfies the other WHERE clause or not. The problem is that the optimizer has no way of knowing whether that will be one of the very first rows or rather one at the end (according to the ORDER BY). The optimizer does an arbitrary guess an believes the matching row will be more towards the begin than the end. This optimistic estimate is then used to calculate the cost value which turns out to be too optimistic so that PostgreSQL finally settles down on a bad execution plan.

When you change the ORDER BY ... DESC to ORDER BY ... ASC, the optimizer makes the same arbitrary but optimistic estimate, which just happens to be more correct in that case, hence you get a better execution time.

However, from the optimizer's perspective, the root cause is that it estimates that 2,491 rows will match the WHERE clause table_2_id = 650727 (shown anonymized as tango in the linked plan). If the optimizer correctly estimated that this hits only a few rows, the problem would likely not occur.

The WHERE clause is sufficiently trivial that a good estimate should be no problem. So the main question is: how good are the statistics on that table?
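You can check what the planner currently believes about that column with the standard pg_stats view (the table and column names here are taken from the query in the question):

SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'join_table'
  AND attname = 'table_2_id'

If 650727 does not appear in most_common_vals, the planner has to fall back to an average-based estimate, which is one way to end up with a guess like 2,491 rows.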

There are several ways to cope with this problem:

  • Update your statistics (ANALYZE) and see if that alone helps.
  • Increase the number of most common values stored for that column (ALTER TABLE ... SET STATISTICS). This also increases the sample size used to gather the statistics, which means ANALYZE takes longer but yields more accurate results (see the sketch after this list).
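A minimal sketch of both steps, assuming the badly estimated column is join_table.table_2_id (the one from the WHERE clause) and that a statistics target of 1000 is acceptable (the default comes from default_statistics_target, usually 100):

-- refresh the statistics for both tables involved in the query
ANALYZE table_1;
ANALYZE join_table;

-- store more most-common values / histogram buckets for the filtered column
ALTER TABLE join_table ALTER COLUMN table_2_id SET STATISTICS 1000;
-- re-run ANALYZE so the larger sample is actually collected
ANALYZE join_table;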

In theory, this should be enough to fix that problem. However, other options are:

  • If you don't need the index on created_at for other reasons (like other queries), get rid of it.
  • Re-write the query so that the bad execution plan is no longer an option. In particular, it would be great if you could write the query so that the ORDER BY clause uses the same table as the WHERE clause: if you are lucky, you might have a column in join_table that has the same order as table_1.created_at, so that it makes no difference which one you order by (see the sketch after this list). However, be careful, this is easy to get wrong (e.g., sequential numbers filled from sequences can have outliers).
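As a hedged sketch of such a rewrite, under the assumption (verify it first!) that join_table.table_1_id grows in the same order as table_1.created_at:

SELECT "table_1".* FROM "table_1"
INNER JOIN "join_table"
  ON "table_1"."id" = "join_table"."table_1_id"
WHERE "join_table"."table_2_id" = 650727
-- assumption: table_1_id correlates with table_1.created_at
ORDER BY "join_table"."table_1_id" DESC
LIMIT 1

This way both the filter and the sort can be satisfied from join_table alone, ideally via a composite index on (table_2_id, table_1_id) as suggested in the comments.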
Markus Winand

Although you are only adding LIMIT 1, any change to a query can affect its execution plan and the indexes used.

To fix your issue, since you say the query performs well when the order is ASC:

It seems the index created on table_1.created_at is ASC. I know that in DB2 you can specify, when creating an index, that it should be bi-directional (ASC/DESC). I guess PostgreSQL has something similar; if not, you can create two indexes on the same field, one with DESC sort order and another with ASC.
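In PostgreSQL the descending variant would look like this (the index name is just illustrative):

-- index sorted descending on created_at; the index name is illustrative
CREATE INDEX index_table_1_created_at_desc ON table_1 (created_at DESC);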

Vicky21