
I am running the following query on a remote Postgres instance, from a local client:

select * from matches_tb1 order by match_id desc limit 10;

matches_tb1 is a foreign table with a unique index on match_id. The query seems to hang forever. When I use EXPLAIN VERBOSE, there is no ORDER BY attached to the "Remote SQL", so I guess the local server did not push the ORDER BY down to the remote server. How can I resolve this?

Attached are the EXPLAIN results:

 explain verbose select match_id from matches_tb1 order by match_id desc limit 10;
                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 Limit  (cost=33972852.96..33972852.98 rows=10 width=8)
   Output: match_id
   ->  Sort  (cost=33972852.96..35261659.79 rows=515522734 width=8)
         Output: match_id
         Sort Key: matches_tb1.match_id DESC
         ->  Foreign Scan on public.matches_tb1  (cost=100.00..22832592.02 rows=515522734 width=8)
               Output: match_id
               Remote SQL: SELECT match_id FROM public.matches_tb1
(8 rows)
  • Please post the entire `EXPLAIN` plan for your query in your question. – Tim Biegeleisen Oct 12 '18 at 03:44
  • attached result of explain verbose – Friends_little_black Oct 12 '18 at 03:52
  • The plan mentions that `match_id` is being used as a sort key, so I don't see anything unexpected there. Are there any other queries you could try, so that we may see if this latency problem exists in other scenarios? – Tim Biegeleisen Oct 12 '18 at 03:55
  • sql_1 = select match_id from matches_tb1 where match_id > 4164287140 order by match_id desc limit 10; sql_2 = select match_id from matches_tb1 where match_id > 416428 order by match_id desc limit 10; I can get the result of sql_1 very fast, but sql_2 hangs forever. If these SQLs are run directly on the remote server (not through postgres_fdw), both finish in a second. – Friends_little_black Oct 12 '18 at 04:14
  • I take back what I said above. It appears that the `match_id` index is _not_ being used. As a result, Postgres is manually sorting the entire 500 million record table, just to find the top 10 records. So, we need to find some way for the index to kick in. Are you sure that `match_id` has a B-tree index on it? – Tim Biegeleisen Oct 12 '18 at 04:18
  • Thanks for ur patience. Yes, I am sure (I looked into it), because if match_id didn't have a B-tree index, sql_2 would not finish in a second on a table with 500 million rows. – Friends_little_black Oct 12 '18 at 04:27
  • It finally "clicked" in my head what is happening. I posted an answer below, and we can go from there regarding what your options here might be. – Tim Biegeleisen Oct 12 '18 at 04:35
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/181719/discussion-between-friends-little-black-and-tim-biegeleisen). – Friends_little_black Oct 12 '18 at 05:13

1 Answer


For the first query in your question:

select * from matches_tb1 order by match_id desc limit 10;

It appears from the EXPLAIN plan that Postgres is not using the B-tree index on match_id. This results in a very long-running query, because the database has to scan the entire 500-million-row table and sort it just to find the top 10 records. As to why Postgres cannot use the index, the problem is select *. When the database reaches a leaf entry of the index, it only finds a value for match_id; because you are selecting every column, it would then also have to look up each row in the table's heap to fetch the values of all the other columns. If the table has low correlation between the index order and the physical row order, the optimizer will likely abandon the index altogether and just do a full scan of the table.
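As a quick check, you could connect directly to the remote server (not through the FDW) and run EXPLAIN on both shapes of the query, to compare whether the index shows up in each plan. This is only a sketch; adding ANALYZE would actually execute the statements, so the select * variant could run for a long time:

explain verbose select match_id from matches_tb1 order by match_id desc limit 10;
explain verbose select * from matches_tb1 order by match_id desc limit 10;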

In contrast, consider one of your other queries which is executing quickly:

select match_id from matches_tb1 where match_id > 4164287140
order by match_id desc limit 10

In this case, the index on match_id can be used, because you are only selecting match_id. In addition, the restriction in the WHERE clause narrows the range of index entries that have to be read, which helps even more.

So the resolution to your problem here is to avoid select * with LIMIT if you want the query to finish quickly. For example, if you only wanted, say, two columns col1 and col2 from your table, you could add those columns to the index so that it covers them. Then the following query should also be fast:

select match_id, col1, col2 from matches_tb1 order by match_id desc limit 10;
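Here col1 and col2 are only placeholder names; the covering index itself would have to be created on the remote server's table, for example (a sketch, assuming PostgreSQL 11+ for the INCLUDE syntax):

create index matches_tb1_match_id_cover_idx on matches_tb1 (match_id) include (col1, col2);
-- on older versions, a plain multicolumn index has a similar effect:
-- create index matches_tb1_match_id_cover_idx on matches_tb1 (match_id, col1, col2);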
Tim Biegeleisen
  • First of all, thanks for ur response. The foreign table has 500 million rows. When I connect directly to the remote server I can get the result in seconds, but when I use the FDW I can't get results (it hangs forever). So I assume the ORDER BY did not get pushed down to the remote server. – Friends_little_black Oct 12 '18 at 03:21
  • Actually, that you are seeing a delay would seem to indicate that there _is_ an ordering step happening for the `LIMIT` clause. – Tim Biegeleisen Oct 12 '18 at 03:23
  • Sorry, I didn't get ur point. I assume my query hangs forever because the local server tries to fetch all the data (500 million rows) from the foreign table and sort it locally. If the ORDER BY were done on the remote server, then I should also see the result in a second. – Friends_little_black Oct 12 '18 at 03:34
  • If you're running the query on the remote, then the whole thing should run on the remote. If you're suggesting that Postgres is returning 500 million records across the network, and then you're doing an `ORDER BY` locally, I would not at all expect this. – Tim Biegeleisen Oct 12 '18 at 03:35
  • The point is I don't want to do the ORDER BY locally; I want the ORDER BY to be performed on the remote server, but from what I saw, the ORDER BY is not done on the remote server. If it were, why would my query hang forever? – Friends_little_black Oct 12 '18 at 03:43
  • Sorry, I think there is a little misunderstanding because u might not have read my comment above carefully. "select match_id from matches_tb1 where match_id > 416428 order by match_id desc limit 10;" This query does not finish quickly; it still hangs forever. – Friends_little_black Oct 12 '18 at 05:24
  • @Friends_little_black My answer doesn't really change. Run `EXPLAIN` on that second query, I'll bet the index also isn't being used. The answer to your question is, the index is not being used, and you have a table with 500 million records, so a scan will hang forever. – Tim Biegeleisen Oct 12 '18 at 05:30
  • Yes, but do u know how to solve this? If I connect directly to the remote server, the match_id index will be used and the query will end in a sec. – Friends_little_black Oct 12 '18 at 06:23