I'm using postgres_fdw to connect to another PostgreSQL server. The foreign data wrapper is currently configured with a large fetch_size and an extremely large fdw_startup_cost, and this is working well for most of my queries. I chose these settings partly based on https://awide.io/strange-cost-estimation-for-foreign-tables/.
CREATE SERVER the_foreign_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (
    host 'foreign_server_here',
    port '5432',
    dbname 'postgres',
    use_remote_estimate 'false',
    updatable 'false',
    truncatable 'false',
    fetch_size '100000',
    fdw_startup_cost '100000000000000000',
    fdw_tuple_cost '0.10'
);
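(Just as a sanity check, and in case it helps anyone reproduce this, the options applied to the server object can be inspected from the catalog:

SELECT srvname, srvoptions
FROM pg_foreign_server
WHERE srvname = 'the_foreign_server';

Nothing surprising there; the options are set as shown above.)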
The local server is on PostgreSQL 14.3; the foreign server is on PostgreSQL 12.8.
However, one query has me stymied: a left join between two foreign tables takes a long time to execute when I use a small limit, but is practically instantaneous with a large limit. Here's the query, with the column lists elided for brevity.
SELECT id.…, sc.…
FROM invoice_detail id
LEFT JOIN supplier_catalog sc ON (id."VENDOR_SERVICE_ID" = sc."VENDOR_SERVICE_ID")
LIMIT 10000;
With LIMIT 10000, the query runs very quickly, executing in 60 ms as shown in this EXPLAIN ANALYZE VERBOSE output.
Foreign Scan (cost=100000000000000000.00..100000000220463968.00 rows=10000 width=167) (actual time=59.088..59.691 rows=10000 loops=1)
  Output: …
  Relations: (docpadmin.invoice_detail id) LEFT JOIN (docpadmin.supplier_catalog sc)
  Remote SQL: SELECT … FROM (docpadmin.invoice_detail r1 LEFT JOIN docpadmin.supplier_catalog r2 ON (((r1."VENDOR_SERVICE_ID" = r2."VENDOR_SERVICE_ID")))) LIMIT 10000::bigint
Query Identifier: -9026301604523228886
Planning Time: 0.201 ms
Execution Time: 62.951 ms
But if I reduce it to LIMIT 100, the query takes nearly 5 seconds to run.
Limit (cost=100000000000000000.00..100000000002320672.00 rows=100 width=167) (actual time=4822.803..4822.826 rows=100 loops=1)
  Output: …
  ->  Foreign Scan (cost=100000000000000000.00..100007212950048160.00 rows=310813344 width=167) (actual time=4822.801..4822.815 rows=100 loops=1)
        Output: …
        Relations: (docpadmin.invoice_detail id) LEFT JOIN (docpadmin.supplier_catalog sc)
        Remote SQL: SELECT … FROM (docpadmin.invoice_detail r1 LEFT JOIN docpadmin.supplier_catalog r2 ON (((r1."VENDOR_SERVICE_ID" = r2."VENDOR_SERVICE_ID"))))
Query Identifier: -9026301604523228886
Planning Time: 0.191 ms
Execution Time: 5006.872 ms
If I'm understanding this correctly, PostgreSQL didn't push the LIMIT down to the foreign server in this case and is instead applying the limit locally after fetching rows. How do I discourage it from doing this and let the foreign server do the work, as it does in the LIMIT 10000 case?
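I assume one option would be to turn remote estimates back on, either server-wide or just for these two tables, roughly like below (the local foreign-table names here are taken from the query above, so adjust for schema). But I'd prefer to keep use_remote_estimate off if there's another way to get the LIMIT pushed down.

-- Possible workaround I'd rather avoid: re-enable remote estimates,
-- either for the whole server...
ALTER SERVER the_foreign_server OPTIONS (SET use_remote_estimate 'true');

-- ...or only for the foreign tables involved in this join.
ALTER FOREIGN TABLE invoice_detail OPTIONS (ADD use_remote_estimate 'true');
ALTER FOREIGN TABLE supplier_catalog OPTIONS (ADD use_remote_estimate 'true');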