0

Recently upgraded Aurora Postgres from 13.4 to 14.8. We observe there is a change in plan for many SQL's and causing major performance issue. Is there any change in optimizer in v14.8.?

SQL Query

select
    row_number() over (
    order by MIN(mpfdh.MODULE_SEQNO) asc,
    MIN(mpfdh.OPERATION_SEQNO) asc ) as sequence,
    mpfdh.pd_id as PdId ,
    case
        when SUM(mpfdh.MANDATORY_FLAG) > 0 then true
        else false
    end as SomePDsMandatory ,
    case
        when SUM(mpfdh.MANDATORY_FLAG) = COUNT(mpfdh.pd_id) then true
        else false
    end as AllPDsMandatory ,
    case
        when COUNT(*) = COUNT(mpfdh.pd_id) then true
        else false
    end as UsedByAllMainPDs ,
    BOOL_OR( case when mpfdh.DEPARTMENT = 'TST' and mpfdh.MANDATORY_FLAG = 1 then true else false end) as IsTestPD,
    STRING_AGG(p.MAINPD_ID, '; ' order by p.MAINPD_ID ) as MainPDs
from
    pdwh_ll.LOT_STATE ls
join pdwh_ll.PRODUCT p on
    p.PRODSPEC_ID = ls.PRODSPEC_ID
join pdwh_ll.MAINPD_FLOW_HIST mpfh on
    ( mpfh.mainpd_id = p.mainpd_id
    and mpfh.valid_to > current_date )
join pdwh_ll.MAINPD_FLOW_DET_HIST mpfdh
        using (mainpd_flow_hist_sk)
where
    p.DELETED_FLAG = 'N' /* AND p.STATE != 'Obsolete' */
        and ls.orig_site_name =  'FXXX'
    and p.orig_site_name = 'FXXX'
    and mpfh.orig_site_name = 'FXXX'
    and mpfdh.orig_site_name = 'FXXX'
    and ls.LOT_ID in ('xxxxx.00')
group by
    mpfdh.pd_id
order by
    MIN(mpfdh.MODULE_SEQNO) asc;

V13.4 Plan

WindowAgg  (cost=1882.18..1947.37 rows=2173 width=125)
   ->  Sort  (cost=1882.18..1887.61 rows=2173 width=162)
         Sort Key: (min(mpfdh.module_seqno)), (min(mpfdh.operation_seqno))
         ->  GroupAggregate  (cost=1653.09..1761.74 rows=2173 width=162)
               Group Key: mpfdh.pd_id
               ->  Sort  (cost=1653.09..1658.52 rows=2173 width=53)
                     Sort Key: mpfdh.pd_id
                     ->  Nested Loop  (cost=1.98..1532.64 rows=2173 width=53)
                           ->  Nested Loop  (cost=1.41..23.19 rows=1 width=29)
                                 ->  Nested Loop  (cost=0.98..17.03 rows=1 width=21)
                                       ->  Index Scan using part_lot_state_1_orig_site_name_lot_id_key on part_lot_state_1 ls  (cost=0.57..8.59 rows=1
                                             Index Cond: ((orig_site_name = 'FXXX'::bpchar) AND ((lot_id)::text = 'xxxxx.00'::text))
                                       ->  Index Scan using part_product_1_orig_site_name_prodspec_id_key on part_product_1 p  (cost=0.41..8.44 rows=1
                                             Index Cond: ((orig_site_name = 'FXXX'::bpchar) AND ((prodspec_id)::text = (ls.prodspec_id)::text))
                                             Filter: ((deleted_flag)::text = 'N'::text)
                                 ->  Index Scan using part_mainpd_flow_hist_1_orig_site_name_mainpd_id_valid_from_key on part_mainpd_flow_hist_1 mpfh
                                       Index Cond: ((orig_site_name = 'FXXX'::bpchar) AND ((mainpd_id)::text = (p.mainpd_id)::text))
                                       Filter: (valid_to > CURRENT_DATE)
                           ->  Index Scan using part_mainpd_flow_det_hist_1_orig_site_name_mainpd_flow_hist_key on part_mainpd_flow_det_hist_1 mpfdh  (
                                 Index Cond: ((orig_site_name = 'FXXX'::bpchar) AND (mainpd_flow_hist_sk = mpfh.mainpd_flow_hist_sk))

v14.8 plan

 WindowAgg  (cost=850385061.80..850385067.80 rows=200 width=125)
   ->  Sort  (cost=850385061.80..850385062.30 rows=200 width=162)
         Sort Key: (min(mpfdh.module_seqno)), (min(mpfdh.operation_seqno))
         ->  GroupAggregate  (cost=1.99..850385054.16 rows=200 width=162)
               Group Key: mpfdh.pd_id
               ->  Nested Loop  (cost=1.99..594399180.76 rows=10239434816 width=53)
                     Join Filter: ((mpfh.mainpd_id)::text = (p.mainpd_id)::text)
                     ->  Nested Loop  (cost=1.57..136541848.72 rows=16649356915 width=76)
                           ->  Nested Loop  (cost=1.14..49002313.72 rows=188971760 width=65)
                                 ->  Index Scan using part_mainpd_flow_det_hist_1_pd_id_idx on part_mainpd_flow_det_hist_1 mpfdh  (cost=0.57..46640158.13 rows=188971760 width=40)
                                       Filter: (orig_site_name = 'FXXX'::bpchar)
                                 ->  Materialize  (cost=0.57..8.59 rows=1 width=25)
                                       ->  Index Scan using part_lot_state_1_orig_site_name_lot_id_key on part_lot_state_1 ls  (cost=0.57..8.59 rows=1 width=25)
                                             Index Cond: ((orig_site_name = 'FXXX'::bpchar) AND ((lot_id)::text = 'xxxxx.00'::text))
                           ->  Index Scan using part_mainpd_flow_hist_1_pkey on part_mainpd_flow_hist_1 mpfh  (cost=0.43..0.45 rows=1 width=27)
                                 Index Cond: ((orig_site_name = 'FXXX'::bpchar) AND (mainpd_flow_hist_sk = mpfdh.mainpd_flow_hist_sk))
                                 Filter: (valid_to > CURRENT_DATE)
                     ->  Memoize  (cost=0.42..8.45 rows=1 width=47)
                           Cache Key: ls.prodspec_id
                           Cache Mode: logical
                           ->  Index Scan using part_product_1_orig_site_name_prodspec_id_key on part_product_1 p  (cost=0.41..8.44 rows=1 width=47)
                                 Index Cond: ((orig_site_name = 'FXXX'::bpchar) AND ((prodspec_id)::text = (ls.prodspec_id)::text))
                                 Filter: ((deleted_flag)::text = 'N'::text)

Recently upgraded Aurora Postgres from 13.4 to 14.8. We observe there is a change in plan for many SQL's and causing major performance issue. Is there any change in optimizer in v14.8.?

MOHAN
  • 1
  • 2
    rows=188971760, when did you run analyze? Or auto analyze. And could you please share the results from EXPLAIN(ANALYZE, VERBOSE, BUFFERS, SETTINGS) for this statement? (in plain text, as an update to your question). The DDL for the tables and indexes involved, would be helpful as well. – Frank Heikens Jul 29 '23 at 13:57
  • Team, Above issue has been fixed by collecting additional samplings. – MOHAN Aug 07 '23 at 04:39
  • Changed default_statristics_target from 100 to 250. After that it started to taking a same plan like older version (13.4) – MOHAN Aug 07 '23 at 04:41

0 Answers0