I have two tables: a local table debtors and a foreign table debtor_registry. I'm using PostgreSQL v13.
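For context, the foreign table goes through postgres_fdw, roughly like the sketch below. The server name, connection options, and column types are placeholders/approximations; the remote table name public.company_debtor and the columns come from the query plan further down.

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- "registry_server" and the connection options are placeholders.
CREATE SERVER registry_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'registry-host', dbname 'registry', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER registry_server
    OPTIONS (user 'app', password '...');

-- Column types other than settings (jsonb) are approximations.
CREATE FOREIGN TABLE debtor_registry (
    id         uuid,
    company_id uuid,
    settings   jsonb,
    product    text
)
SERVER registry_server
OPTIONS (schema_name 'public', table_name 'company_debtor');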
My problem is that whenever I run the following query, it takes 14 seconds to get 1000 records.
SELECT
    debtors.id,
    debtors.name,
    debtor_registry.settings
FROM debtors
INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
ORDER BY name LIMIT 1000 OFFSET 0;
I was surprised that when I removed the ORDER BY clause from the query, it became much faster, taking only 194 ms for 1000 records.
SELECT
    debtors.id,
    debtors.name,
    debtor_registry.settings
FROM debtors
INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
LIMIT 1000 OFFSET 0;
Another case: if I remove settings (a JSONB field) from the query and keep the ORDER BY clause, it takes only 101 ms to get 1000 records.
SELECT
    debtors.id,
    debtors.name
FROM debtors
INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
ORDER BY name LIMIT 1000 OFFSET 0;
I suspect it might be related to how much data I am trying to fetch.
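A rough way to sanity-check that suspicion could be something like the query below (only a sketch: it scans every remote row, and pg_column_size measures stored size, not transfer size):

-- Approximate total size of the settings values that have to
-- travel from the remote server for a full scan.
SELECT pg_size_pretty(sum(pg_column_size(settings)))
FROM debtor_registry;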
Here is the EXPLAIN ANALYZE VERBOSE result when the settings JSONB field, the ORDER BY name clause, and LIMIT 1000 are all in the query:
Limit  (cost=114722.78..114725.28 rows=1000 width=57) (actual time=13712.125..14002.827 rows=1000 loops=1)
  Output: debtors.id, debtors.name, debtor_registry.settings
  ->  Sort  (cost=114722.78..114725.63 rows=1140 width=57) (actual time=13703.171..13993.617 rows=1000 loops=1)
        Output: debtors.id, debtors.name, debtor_registry.settings
        Sort Key: debtors.name
        Sort Method: external merge  Disk: 82752kB
        ->  Hash Join  (cost=896.60..114664.90 rows=1140 width=57) (actual time=14.889..917.360 rows=10550 loops=1)
              Output: debtors.id, debtors.name, debtor_registry.settings
              Hash Cond: (((debtor_registry.id)::character varying)::text = (debtors.registry_uuid)::text)
              ->  Foreign Scan on public.debtor_registry  (cost=100.00..113832.74 rows=1137 width=48) (actual time=8.845..902.466 rows=10529 loops=1)
                    Output: debtor_registry.id, debtor_registry.company_id, debtor_registry.settings, debtor_registry.product
                    Remote SQL: SELECT id, settings FROM public.company_debtor
              ->  Hash  (cost=664.60..664.60 rows=10560 width=62) (actual time=6.027..6.028 rows=10554 loops=1)
                    Output: debtors.id, debtors.name, debtors.registry_uuid
                    Buckets: 16384  Batches: 1  Memory Usage: 1108kB
                    ->  Seq Scan on public.debtors  (cost=0.00..664.60 rows=10560 width=62) (actual time=0.019..4.726 rows=10560 loops=1)
                          Output: debtors.id, debtors.name, debtors.registry_uuid
Planning Time: 0.098 ms
JIT:
  Functions: 10
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 1.609 ms, Inlining 0.000 ms, Optimization 0.674 ms, Emission 7.991 ms, Total 10.274 ms
Execution Time: 14007.113 ms
How can I make the first query faster without omitting the settings field, the ORDER BY name clause, or the LIMIT 1000?
UPDATE
I also found this similar question, but the answer does not solve my problem, since our sorting is dynamic: we build queries based on the frontend client's request (see the example below).
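To illustrate (the sort column here is only an example), the same query may arrive with a different ORDER BY each time:

-- The frontend chooses the sort column and direction per request,
-- so the ORDER BY target is not known in advance.
SELECT
    debtors.id,
    debtors.name,
    debtor_registry.settings
FROM debtors
INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
ORDER BY debtors.id DESC  -- could equally be name ASC, etc.
LIMIT 1000 OFFSET 0;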
Setting use_remote_estimate to 'true' doesn't help either. :(
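For reference, this is how that option is set ("registry_server" being the placeholder server name from the setup sketch above):

-- Enable remote cost estimates for the whole foreign server...
ALTER SERVER registry_server
    OPTIONS (ADD use_remote_estimate 'true');

-- ...or for just this foreign table.
ALTER FOREIGN TABLE debtor_registry
    OPTIONS (ADD use_remote_estimate 'true');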