
I'm using postgres_fdw to connect to another PostgreSQL server. The foreign data wrapper is currently configured with a large fetch_size and an extremely large fdw_startup_cost, which is working well for most of my queries. I based these settings partly on https://awide.io/strange-cost-estimation-for-foreign-tables/.

CREATE SERVER the_foreign_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (
  host 'foreign_server_here',
  port '5432',
  dbname 'postgres',
  use_remote_estimate 'false',
  updatable 'false',
  truncatable 'false',
  fetch_size '100000',
  fdw_startup_cost '100000000000000000',
  fdw_tuple_cost '0.10'
);
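
As a quick sanity check (this is just a catalog query, not part of the setup itself), the options currently in effect can be read back from the system catalog:

-- Read back the options set on the foreign server
SELECT srvname, srvoptions
FROM pg_foreign_server
WHERE srvname = 'the_foreign_server';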

Local server is on PostgreSQL 14.3. Foreign server is on PostgreSQL 12.8.

However, one query has me stymied: a left join between two tables that takes a long time to execute when I use a small LIMIT, but is practically instantaneous with a large one. Here's the query, with the column names left out for brevity.

SELECT id.…, sc.…
FROM invoice_detail id
LEFT JOIN supplier_catalog sc ON (id."VENDOR_SERVICE_ID" = sc."VENDOR_SERVICE_ID")
LIMIT 10000;

With LIMIT 10000, the query runs very quickly, executing in about 60 ms, as shown in this EXPLAIN ANALYZE VERBOSE output.
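
For reference, the plans here were captured by putting EXPLAIN ANALYZE VERBOSE in front of the query, with the same elided column list:

EXPLAIN ANALYZE VERBOSE
SELECT id.…, sc.…
FROM invoice_detail id
LEFT JOIN supplier_catalog sc ON (id."VENDOR_SERVICE_ID" = sc."VENDOR_SERVICE_ID")
LIMIT 10000;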

Foreign Scan  (cost=100000000000000000.00..100000000220463968.00 rows=10000 width=167) (actual time=59.088..59.691 rows=10000 loops=1)
  Output: …
  Relations: (docpadmin.invoice_detail id) LEFT JOIN (docpadmin.supplier_catalog sc)
  Remote SQL: SELECT … FROM (docpadmin.invoice_detail r1 LEFT JOIN docpadmin.supplier_catalog r2 ON (((r1."VENDOR_SERVICE_ID" = r2."VENDOR_SERVICE_ID")))) LIMIT 10000::bigint
Query Identifier: -9026301604523228886
Planning Time: 0.201 ms
Execution Time: 62.951 ms

But, if I reduce the selection to LIMIT 100, the query takes nearly 5 seconds to run.

Limit  (cost=100000000000000000.00..100000000002320672.00 rows=100 width=167) (actual time=4822.803..4822.826 rows=100 loops=1)
  Output: …
  ->  Foreign Scan  (cost=100000000000000000.00..100007212950048160.00 rows=310813344 width=167) (actual time=4822.801..4822.815 rows=100 loops=1)
        Output: …
        Relations: (docpadmin.invoice_detail id) LEFT JOIN (docpadmin.supplier_catalog sc)
        Remote SQL: SELECT … FROM (docpadmin.invoice_detail r1 LEFT JOIN docpadmin.supplier_catalog r2 ON (((r1."VENDOR_SERVICE_ID" = r2."VENDOR_SERVICE_ID"))))
Query Identifier: -9026301604523228886
Planning Time: 0.191 ms
Execution Time: 5006.872 ms

If I'm understanding this correctly, PostgreSQL didn't push the LIMIT down to the foreign server and is instead fetching everything and selecting 100 rows locally. How do I discourage it from doing this and let the foreign server do the work, as in the LIMIT 10000 case?

King Chung Huang
  • Add Postgres version information, on both ends, to question. – Adrian Klaver Jul 18 '22 at 21:30
  • Ah, good point. Added! – King Chung Huang Jul 18 '22 at 21:40
  • 1) Why `use_remote_estimate 'false',`? Have you tried with `use_remote_estimate 'true',`? 2) What does `EXPLAIN ANALYZE VERBOSE` show for the actual query produced in each case? – Adrian Klaver Jul 18 '22 at 21:41
  • I've replaced the plans above with the output from `EXPLAIN ANALYZE VERBOSE`. As I suspected, it shows that in the `LIMIT 10000` case, the limit is being sent to the foreign server. But, in the `LIMIT 100` case, it's getting everything and then picking off 100 rows. – King Chung Huang Jul 18 '22 at 22:09
  • I've found setting `use_remote_estimate 'true'` produces worse results for virtually all my queries. For this scenario, both the `LIMIT 100` and `LIMIT 10000` cases end up taking 5 seconds because the LIMIT isn't sent to the foreign server. – King Chung Huang Jul 18 '22 at 22:12
  • Have you changed the foreign server settings? The article you referenced was dealing with older versions of Postgres. There have been quite a few improvements in the newer versions. I suspect using the defaults might help. – Adrian Klaver Jul 18 '22 at 22:12
  • I started with the defaults, and tried lots of variations to get to the settings as shown. That article isn't the only one I used. It's just similar to the settings I currently have. – King Chung Huang Jul 18 '22 at 22:15
  • Hmm, from here [Limit](https://www.postgresql.org/docs/current/sql-select.html#SQL-LIMIT):"...Thus, using different LIMIT/OFFSET values to select different subsets of a query result will give inconsistent results unless you enforce a predictable result ordering with ORDER BY. ...". So you might try adding `ORDER BY id."VENDOR_SERVICE_ID"` – Adrian Klaver Jul 18 '22 at 22:30
  • Did you ANALYZE the foreign tables? – jjanes Jul 18 '22 at 22:57
  • Yes, all the foreign tables were analyzed on the local server ahead of time. – King Chung Huang Jul 19 '22 at 04:30

1 Answer


I've ended up increasing fdw_tuple_cost to 100000000000000000 (an arbitrarily large number), which makes the query planner treat transferring data between servers as extremely costly.

fdw_tuple_cost

This option, which can be specified for a foreign server, is a floating point value that is used as extra cost per-tuple for foreign-table scans on that server. This represents the additional overhead of data transfer between servers. You might increase or decrease this number to reflect higher or lower network delay to the remote server. The default value is 0.01.

With that in place, the local server is no longer pulling extra rows over and selecting from them locally. It's pushing the LIMITs down to the foreign server, which is what I would prefer.
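
For anyone applying this to an existing server definition rather than recreating it, the option can be changed in place with ALTER SERVER. SET assumes fdw_tuple_cost was already declared in the original OPTIONS, as it was here; use ADD otherwise:

ALTER SERVER the_foreign_server
  OPTIONS (SET fdw_tuple_cost '100000000000000000');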

King Chung Huang