
I have a query that runs very slowly (about 40 seconds). It looks like this:

        SELECT
            h3data.h3index,
            sum(h3data.value)
        FROM locations l
        INNER JOIN records r ON r."locationsId" = l.id 
        INNER JOIN values v ON v."recordId" = r.id,
        LATERAL (
            SELECT
                h3index,
                sum(v.value/nullif(v.scaler, 0)) * value as value
            FROM some_stored_procedure_that_returns_table(l."geomId", v."h3Id")
        ) h3data
        WHERE r.year=2015
        AND l."someCondition" IS NULL
        AND l."otherCondition" IS NULL
        AND v."referenceId" = '633cf928-7c4f-41a3-99c5-e8c1bda0b323'
        GROUP by h3data.h3index

EXPLAIN ANALYZE shows that:

HashAggregate  (cost=84339.65..84341.65 rows=200 width=16) (actual time=228718.261..240515.611 rows=1038113 loops=1)
  Group Key: some_stored_procedure_that_returns_table.h3index
  Batches: 69  Memory Usage: 4265kB  Disk Usage: 329856kB
  ->  Nested Loop  (cost=850.26..53564.65 rows=2462000 width=32) (actual time=84.559..170740.988 rows=6596332 loops=1)
        ->  Hash Join  (cost=850.01..4324.40 rows=2462 width=48) (actual time=81.209..652.170 rows=2516 loops=1)
              Hash Cond: (r."locationId" = l.id)
              ->  Hash Join  (cost=692.40..4160.31 rows=2462 width=48) (actual time=40.561..570.577 rows=2516 loops=1)
                    Hash Cond: (v."recordId" = r.id)
                    ->  Seq Scan on values v  (cost=0.00..3396.80 rows=27086 width=48) (actual time=0.018..293.862 rows=27676 loops=1)
                          Filter: ("referenceId" = '633cf928-7c4f-41a3-99c5-e8c1bda0b323'::uuid)
                          Rows Removed by Filter: 83028
                    ->  Hash  (cost=660.95..660.95 rows=2516 width=32) (actual time=40.445..40.465 rows=2516 loops=1)
                          Buckets: 4096  Batches: 1  Memory Usage: 190kB
                          ->  Seq Scan on records r  (cost=0.00..660.95 rows=2516 width=32) (actual time=0.017..22.051 rows=2516 loops=1)
                                Filter: (year = 2015)
                                Rows Removed by Filter: 25160
              ->  Hash  (cost=126.16..126.16 rows=2516 width=32) (actual time=40.621..40.641 rows=2516 loops=1)
                    Buckets: 4096  Batches: 1  Memory Usage: 190kB
                    ->  Seq Scan on locations l  (cost=0.00..126.16 rows=2516 width=32) (actual time=0.025..20.739 rows=2516 loops=1)
                          Filter: (("someCondition" IS NULL) AND ("otherCondition" IS NULL))
        ->  Function Scan on some_stored_procedure_that_returns_table  (cost=0.25..10.25 rows=1000 width=16) (actual time=9.733..29.330 rows=2622 loops=2516)
Planning Time: 2.166 ms
Execution Time: 248250.567 ms
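
One thing I can see in that plan is that the HashAggregate spills to disk (Batches: 69, Disk Usage: 329856kB). As a sketch of how I could check whether work_mem is the limiting factor (256MB below is just an arbitrary test value, not a tuned recommendation):

        -- check the current per-operation memory budget
        SHOW work_mem;

        -- raise it for this session only; 256MB is an arbitrary test value
        SET work_mem = '256MB';

        -- then re-run the aggregating query above with EXPLAIN (ANALYZE, BUFFERS)
        -- and check whether the HashAggregate still reports "Disk Usage"

        -- restore the server default afterwards
        RESET work_mem;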

So this is taking a lot of time. However, if I just remove the aggregation and the GROUP BY, like:

        SELECT
            h3data.h3index,
            h3data.value
        FROM locations l
        INNER JOIN records r ON r."locationsId" = l.id 
        INNER JOIN values v ON v."recordId" = r.id,
        LATERAL (
            SELECT
                h3index,
                sum(v.value/nullif(v.scaler, 0)) * value as value
            FROM some_stored_procedure_that_returns_table(l."geomId", v."h3Id")
        ) h3data
        WHERE r.year=2015
        AND l."someCondition" IS NULL
        AND v."referenceId" = '633cf928-7c4f-41a3-99c5-e8c1bda0b323'
       

This is the result:

 ->  Nested Loop  (cost=850.26..53564.65 rows=2462000 width=32) (actual time=84.559..170740.988 rows=6596332 loops=1)
        ->  Hash Join  (cost=850.01..4324.40 rows=2462 width=48) (actual time=81.209..652.170 rows=2516 loops=1)
              Hash Cond: (r."locationId" = l.id)
              ->  Hash Join  (cost=692.40..4160.31 rows=2462 width=48) (actual time=40.561..570.577 rows=2516 loops=1)
                    Hash Cond: (v."recordId" = r.id)
                    ->  Seq Scan on values v  (cost=0.00..3396.80 rows=27086 width=48) (actual time=0.018..293.862 rows=27676 loops=1)
                          Filter: ("referenceId" = '633cf928-7c4f-41a3-99c5-e8c1bda0b323'::uuid)
                          Rows Removed by Filter: 83028
                    ->  Hash  (cost=660.95..660.95 rows=2516 width=32) (actual time=40.445..40.465 rows=2516 loops=1)
                          Buckets: 4096  Batches: 1  Memory Usage: 190kB
                          ->  Seq Scan on records r  (cost=0.00..660.95 rows=2516 width=32) (actual time=0.017..22.051 rows=2516 loops=1)
                                Filter: (year = 2015)
                                Rows Removed by Filter: 25160
              ->  Hash  (cost=126.16..126.16 rows=2516 width=32) (actual time=40.621..40.641 rows=2516 loops=1)
                    Buckets: 4096  Batches: 1  Memory Usage: 190kB
                    ->  Seq Scan on locations l  (cost=0.00..126.16 rows=2516 width=32) (actual time=0.025..20.739 rows=2516 loops=1)
                          Filter: (("someCondition" IS NULL) AND ("otherCondition" IS NULL))
        ->  Function Scan on some_stored_procedure_that_returns_table  (cost=0.25..10.25 rows=1000 width=16) (actual time=9.733..29.330 rows=2622 loops=2516)
 Planning Time: 4.976 ms
 Execution Time: 220007.237 ms

The execution time decreases a lot and the query runs fast enough (the large times in the plans come from the EXPLAIN ANALYZE instrumentation itself; without it the non-aggregating query returns in about a second). This is the kind of data I am trying to aggregate:

| h3Index         | value                                        |
|-----------------|----------------------------------------------|
| 862d84c27ffffff | 6706189360729522000000000000000000000000000  |
| 862db112fffffff | 24690280185829940000000000000000000000000000 |
| 862da2757ffffff | 6363074936795764000000000000000000000000000  |
| 862db1c77ffffff | 20955525424756833000000000000000000000000000 |
| 862db1ad7ffffff | 2384501631174928000000000000000000000000000  |
| 862d84c1fffffff | 7026257930089419000000000000000000000000000  |
| 862da249fffffff | 1166966013803679400000000000000000000000000  |
| 862da274fffffff | 9853446181273213000000000000000000000000000  |
| 862db1c6fffffff | 15668891331171954000000000000000000000000000 |

These h3Index values, which can come from different tables, are always indexed, and the number of rows that I want to sum up and group by h3Index is a bit more than 26 million.

Can this amount of data make the performance decrease so much just for an aggregation? I know that this is a computationally expensive operation, but can it be this significant? From roughly 1 second to 40.

I think that the main issue is there, and not inside the stored procedures involved in this query. I suspect I'm missing something basic here but can't figure it out.
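
One way I thought of checking that is to materialize the joined rows once and then time the GROUP BY on its own. This is just a sketch: tmp_h3 is a hypothetical temp table name, and the SELECT that fills it uses the same joins and filters as the original query.

        -- materialize the joined rows once
        CREATE TEMP TABLE tmp_h3 AS
        SELECT
            h3data.h3index,
            h3data.value
        FROM locations l
        INNER JOIN records r ON r."locationsId" = l.id
        INNER JOIN values v ON v."recordId" = r.id,
        LATERAL (
            SELECT
                h3index,
                sum(v.value/nullif(v.scaler, 0)) * value as value
            FROM some_stored_procedure_that_returns_table(l."geomId", v."h3Id")
        ) h3data
        WHERE r.year=2015
        AND l."someCondition" IS NULL
        AND l."otherCondition" IS NULL
        AND v."referenceId" = '633cf928-7c4f-41a3-99c5-e8c1bda0b323';

        -- if this GROUP BY alone is also slow, the aggregation really is the
        -- bottleneck; if it is fast, the cost is in producing the joined rows
        EXPLAIN (ANALYZE, BUFFERS)
        SELECT h3index, sum(value)
        FROM tmp_h3
        GROUP BY h3index;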

Any suggestions on what I can do or where I should look?

Thanks in advance

PS: Running postgis/postgis:13-3.1 via Docker / Kubernetes

  • You write "runs very slowly (about 40 seconds)" but the plan says "actual time=236588.655..248466.136". That's a large difference. Which is correct? – Julius Tuskenis May 05 '22 at 05:49
  • Adding explain analyze increases the total execution time in my case, don't know if this is right. With no explain analyze, 40 seconds is the approx time this query needs to return a response – SeekanDestroy May 05 '22 at 05:52
  • ".....MORE" ? You mean LESS ? Because you left a portion of the query plan. work_mem is too small, the database has to sort the data on disk and that makes it slow. Please show us the entire plan and not half of it – Frank Heikens May 05 '22 at 09:28
  • `INNER JOIN records r ON r."locationsId" = l.id INNER JOIN values v ON v."recordId" = sr.id` <<-- these appear to be FK->PK references. Missing keys/indexes? – wildplasser May 05 '22 at 09:35
  • @FrankHeikens thanks for answering! Yeah, you're right, I thought it was better to show just the difference and the "obvious" bottleneck here, as it's pretty much the same for the rest of the plan (depending on my resources at execution time). I updated it to include the whole plan. I'm going to try increasing the mem size as well. Thank you! – SeekanDestroy May 05 '22 at 12:19
  • @wildplasser yes that was a typo, just corrected it. Thanks! – SeekanDestroy May 05 '22 at 12:19
  • Must be me, but I don't see a lot of difference between Time: 248250.567 ms and Time: 220007.237 ms. Yes there is a difference of 28 seconds, but that's just 10%. To improve performance, I would check some_stored_procedure_that_returns_table() to see what that one is doing and how to improve it. – Frank Heikens May 05 '22 at 12:42
  • @FrankHeikens That's what happens if I EXPLAIN ANALYZE the query. As I stated in the original question, the query without the aggregation runs in less than 1 second, and then if I add a sum() and a group by, it takes more than 40s. So the stored procedure itself is running fast enough – SeekanDestroy May 05 '22 at 12:44
  • That is really weird and something I have never seen. And I use EXPLAIN almost every day for at least 10 years. – Frank Heikens May 05 '22 at 12:48
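
On the EXPLAIN ANALYZE overhead mentioned in the comments: PostgreSQL's per-node timing instrumentation has its own cost on plans that loop millions of times, and it can be turned off while still counting rows. A minimal sketch (the SELECT 1 is only a placeholder for the aggregating query above):

        -- TIMING OFF keeps row counts but skips the per-node clock calls,
        -- which usually brings the reported time closer to the real runtime
        EXPLAIN (ANALYZE, TIMING OFF, BUFFERS)
        SELECT 1;  -- placeholder: substitute the aggregating query from the question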

0 Answers