I'm trying to fetch data from a remote server with JOIN clause which involves only remote tables, but it is very slow because the planner decides to fetch all data from two tables and merge it locally. When I add WHERE clause it fixes the problem and JOIN is executed fully on a remote server.
Problem is reproducible on small example:
-- remote server
create table test
(
id serial
constraint test_pk
primary key,
name text
);
create table test2
(
test_id int
constraint test2_test_id_fk
references test (id),
info text
);
SELECT Query:
SELECT "test".id FROM "test" JOIN "test2" ON "test"."id"="test2".test_id;
Output from EXPLAIN VERBOSE (on empty tables!):
Merge Join (cost=732.29..1388.59 rows=42778 width=4)
Output: test.id
Merge Cond: (test.id = test2.test_id)
-> Sort (cost=366.15..373.46 rows=2925 width=4)
Output: test.id
Sort Key: test.id
-> Foreign Scan on public.test (cost=100.00..197.75 rows=2925 width=4)
Output: test.id
Remote SQL: SELECT id FROM public.test
-> Sort (cost=366.15..373.46 rows=2925 width=4)
Output: test2.test_id
Sort Key: test2.test_id
-> Foreign Scan on public.test2 (cost=100.00..197.75 rows=2925 width=4)
Output: test2.test_id
Remote SQL: SELECT test_id FROM public.test2
After adding WHERE test.id=1
Foreign Scan (cost=100.00..198.75 rows=225 width=4)
Output: test.id
Relations: (public.test) INNER JOIN (public.test2)
Remote SQL: SELECT r1.id FROM (public.test r1 INNER JOIN public.test2 r2 ON (((r2.test_id = 1)) AND ((r1.id = 1))))
I'm using AWS RDS Postgres v10.18 on both sides.
What is going on? How to force execution on a remote server? I didn't find anything with that problem.
Thanks for any help.