We have a complex query which is dynamically built depending on different options on the customer and runs a query for data. We have functions in Azure which are running these queries to build reporting data every night, we run approx. 30k of these. The queries in isolation are about as fast as I can get them, approximately 100ms but when we are running functions in parallel on the consumption plan in Azure (restricted to a maximum of 5 functions running at the same time) the performance of the queries is dropping off and some are even timing out at 5 minutes, some which are timing out I have tested in isolation and are coming in at under 100ms. There are no writes as this is using a read replica in Azure to load this data.
We are running in Postgres 11.6 on hosted Azure with PgBouncer on a VM. All these queries are going to a read replica which is configured to a 4 vCore Memory Optimized.
What changes can we make to allow more parallel execution of these queries or is scaling up our only option?
I would like to share the EXPLAIN ANALYZE but this is restricted by the business. Please let me know what information would help and I will try to provide as much as possible.
CTE Scan on bravo_zulu romeo (cost=2151.89..2151.94 rows=1 width=204) (actual time=27.756..84.147 rows=36 loops=1)
CTE bravo_zulu
-> Nested Loop (cost=13.84..2151.89 rows=1 width=139) (actual time=27.744..84.009 rows=36 loops=1)
-> Nested Loop (cost=13.42..2151.43 rows=1 width=139) (actual time=26.811..76.983 rows=36 loops=1)
-> Nested Loop (cost=12.86..130.51 rows=1 width=44) (actual time=7.471..19.361 rows=29 loops=1)
-> Nested Loop (cost=4.88..97.73 rows=1 width=24) (actual time=7.410..10.480 rows=24 loops=1)
-> Index Scan using yankee on xray_zulu foxtrot_uniform (cost=0.28..8.29 rows=1 width=8) (actual time=1.339..1.340 rows=1 loops=1)
Index Cond: ("juliet" = 20)
-> Bitmap Heap Scan on golf_delta hotel_six (cost=4.60..89.43 rows=1 width=20) (actual time=6.064..9.123 rows=24 loops=1)
Recheck Cond: ("delta_oscar_hotel" = foxtrot_uniform."lima")
Filter: ("juliet" = ANY ('foxtrot_oscar'::integer[]))
Rows Removed by Filter: 442
Heap Blocks: exact=65
-> Bitmap Index Scan on papa (cost=0.00..4.60 rows=42 width=0) (actual time=0.024..0.024 rows=466 loops=1)
Index Cond: ("delta_oscar_hotel" = foxtrot_uniform."lima")
-> Bitmap Heap Scan on delta_sierra_two bravo_hotel (cost=7.98..32.76 rows=2 width=20) (actual time=0.321..0.363 rows=1 loops=24)
Recheck Cond: ((hotel_six."juliet" = "xray_india") OR (hotel_six."juliet" = "foxtrot_foxtrot"))
Filter: ("hotel_golf" = 23)
Rows Removed by Filter: 10
Heap Blocks: exact=240
-> BitmapOr (cost=7.98..7.98 rows=9 width=0) (actual time=0.066..0.066 rows=0 loops=24)
-> Bitmap Index Scan on delta_sierra_sierra (cost=0.00..3.99 rows=5 width=0) (actual time=0.063..0.063 rows=11 loops=24)
Index Cond: (hotel_six."juliet" = "xray_india")
-> Bitmap Index Scan on xray_sierra (cost=0.00..3.99 rows=4 width=0) (actual time=0.002..0.002 rows=0 loops=24)
Index Cond: (hotel_six."juliet" = "foxtrot_foxtrot")
-> Index Only Scan using echo on xray_papa victor (cost=0.56..2020.44 rows=48 width=102) (actual time=1.606..1.986 rows=1 loops=29)
Index Cond: (("five_lima" = 23) AND ("seven_yankee" = bravo_hotel."november") AND ("charlie_hotel" five_romeo NULL))
Filter: (("three" = 'charlie_romeo'::text) AND (("alpha" = 'golf_bravo'::text) OR ("alpha" = 'delta_echo'::text)) AND ((("alpha" = ANY ('mike_juliet'::text[])) AND ("mike_lima" >= 'xray_whiskey'::date) AND ("mike_lima" <= 'uniform'::date)) OR (("alpha" = ANY ('kilo'::text[])) AND ("quebec_uniform" >= 'xray_whiskey'::date) AND ("quebec_uniform" <= 'uniform'::date)) OR (("alpha" = 'quebec_alpha_quebec'::text) AND ("quebec_uniform" >= 'xray_whiskey'::date) AND ("quebec_uniform" <= 'uniform'::date) AND ("mike_lima" >= 'xray_whiskey'::date) AND ("mike_lima" <= 'uniform'::date))) AND ((("alpha" = ANY ('oscar'::text[])) AND ("seven_india" = ANY ('four'::text[]))) OR (("alpha" = ANY ('quebec_alpha_delta'::text[])) AND ("seven_charlie" = ANY ('four'::text[])))))
Rows Removed by Filter: 1059
Heap Fetches: 0
-> Index Scan using bravo_papa on tango sierra (cost=0.42..0.45 rows=1 width=16) (actual time=0.194..0.194 rows=1 loops=36)
Index Cond: (("bravo_two" = 23) AND ("delta_tango" = six1."delta_oscar_romeo"))
SubPlan
-> Result (cost=0.01..0.02 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=36)
One-Time Filter: ((romeo.zulu = 'golf_bravo'::text) AND (romeo.golf_uniform = 20) AND (romeo.charlie_two = 'charlie_romeo'::text))
SubPlan
-> Result (cost=0.01..0.02 rows=1 width=32) (actual time=0.000..0.000 rows=0 loops=36)
One-Time Filter: ((romeo.zulu = 'delta_echo'::text) AND (romeo.charlie_two = 'charlie_romeo'::text) AND (romeo.golf_uniform = 20))
Planning time: 19.385 ms
Execution time: 84.373 ms
Above is an anonymised execution plan, this same query when running the functions in Azure in parallel timed out.
Table sizes are not large, largest is 8m rows but all others are low 100k.