Hi Postgresql 12 gradually spiked to 100% CPU. App team was looking after issue from 60 minutes. 2 different queries were taking high cpu. Tables involved were small so i just ran analyze on all 3 involved tables to fix any underlying statistics issue without first checking the query plan. Just after analyze run, issue got fix.
Now we tried to recreate the issue from backup few hours older than issue time but there query plan is same before and after analyze. We are unable to find RCA.
System Details: vCPU:8 RAM: 61 GB Mode: slave/read replica
Table details resource_type: 174 rows roles: 290026 rows user_role_mapping: 3332118 rows
Query with plan from prod copy machine(for RCA purpose): EXPLAIN (ANALYZE TRUE, VERBOSE TRUE, COSTS TRUE, BUFFERS TRUE, TIMING TRUE) SELECT * FROM user_role_mapping u WHERE u.user_profile_id = '44130444' AND u.service_name = 'GLOBAL' AND u.status = 'ACTIVE' AND EXISTS (SELECT * FROM roles r WHERE r.id=u.role_id AND r.role_name='Owner') AND EXISTS (SELECT * FROM resource_type_dummy r WHERE r.id=u.resource_type_id AND r.type='HOTEL');
EXPLAIN (ANALYZE TRUE, VERBOSE TRUE, COSTS TRUE, BUFFERS TRUE, TIMING TRUE) SELECT * FROM user_role_mapping u WHERE u.user_profile_id = 'x1' AND u.service_name = 'GLOBAL' AND u.status = 'ACTIVE' AND EXISTS (SELECT * FROM roles r WHERE r.id=u.role_id AND r.role_name='Owner') AND EXISTS (SELECT * FROM resource_type r WHERE r.id=u.resource_type_id AND r.type='HOTEL');
QUERY PLAN
Nested Loop (cost=1.00..424.35 rows=1 width=361) (actual time=368.512..369.536 rows=1 loops=1) Output: u.id, u.status, u.created_by, u.created_at, u.modified_by, u.last_modified_at, u.service_name, u.crs_id, u.identity_id, u.role_id, u.resource_id, u.resource_type_id, u.user_profile_id Inner Unique: true Buffers: shared hit=3 read=9 I/O Timings: read=369.368 -> Nested Loop (cost=0.57..415.82 rows=1 width=361) (actual time=281.798..282.821 rows=1 loops=1) Output: u.id, u.status, u.created_by, u.created_at, u.modified_by, u.last_modified_at, u.service_name, u.crs_id, u.identity_id, u.role_id, u.resource_id, u.resource_type_id, u.user_profile_id Join Filter: (u.resource_type_id = r_1.id) Buffers: shared hit=1 read=7 I/O Timings: read=282.696 -> Index Scan using type_index_resource_type on public.resource_type r_1 (cost=0.14..8.16 rows=1 width=16) (actual time=0.054..0.055 rows=1 loops=1) Output: r_1.id, r_1.status, r_1.created_by, r_1.created_at, r_1.modified_by, r_1.last_modified_at, r_1.service_name, r_1.crs_id, r_1.type, r_1.actions Index Cond: ((r_1.type)::text = 'HOTEL'::text) Buffers: shared hit=1 read=1 I/O Timings: read=0.016 -> Index Scan using user_profile_id_index_user_role_mapping on public.user_role_mapping u (cost=0.43..407.19 rows=37 width=361) (actual time=281.739..282.759 rows=1 loops=1) Output: u.id, u.status, u.created_by, u.created_at, u.modified_by, u.last_modified_at, u.service_name, u.crs_id, u.identity_id, u.role_id, u.resource_id, u.resource_type_id, u.user_profile_id Index Cond: ((u.user_profile_id)::text = 'x1'::text) Filter: (((u.service_name)::text = 'GLOBAL'::text) AND ((u.status)::text = 'ACTIVE'::text)) Rows Removed by Filter: 2 Buffers: shared read=6 I/O Timings: read=282.680 -> Index Scan using pk_roles on public.roles r (cost=0.42..8.44 rows=1 width=16) (actual time=86.708..86.708 rows=1 loops=1) Output: r.id, r.status, r.created_by, r.created_at, r.modified_by, r.last_modified_at, r.service_name, r.crs_id, r.role_name, r.team_name, r.access_category, r.crs_team_id, r.actions Index Cond: (r.id = u.role_id) Filter: ((r.role_name)::text = 'Owner'::text) Buffers: shared hit=2 read=2 I/O Timings: read=86.672 Planning Time: 0.439 ms Execution Time: 369.570 ms (30 rows)
EXPLAIN (ANALYZE TRUE, VERBOSE TRUE, COSTS TRUE, BUFFERS TRUE, TIMING TRUE) SELECT CAST(urm.id AS VARCHAR), urm.status, urm.created_by, urm.created_at, CAST(urm.modified_by AS VARCHAR), urm.last_modified_at, urm.service_name, urm.crs_id, urm.identity_id, CAST(urm.role_id AS VARCHAR), urm.resource_id, CAST(urm.resource_type_id AS VARCHAR), urm.user_profile_id, r.role_name role_name, rt.type resource_type_name FROM user_role_mapping urm INNER JOIN (SELECT id, role_name FROM roles r WHERE r.role_name IN ('xa', 'xb', 'xc', 'xd'))r ON urm.role_id = r.id INNER JOIN (SELECT id, rt.type FROM resource_type rt WHERE rt.type = 'experiment')rt ON urm.resource_type_id = rt.id WHERE (urm.user_profile_id IN ('11', '12')) AND urm.service_name = 'CRS' AND urm.status = 'ACTIVE';
Nested Loop (cost=1.00..830.89 rows=1 width=271) (actual time=240.293..240.294 rows=0 loops=1) Output: (urm.id)::character varying, urm.status, urm.created_by, urm.created_at, (urm.modified_by)::character varying, urm.last_modified_at, urm.service_name, urm.crs_id, urm.identity_id, (urm.role_id)::character varying, urm.resource_id, (urm.resource_type_id)::character varying, urm.user_profile_id, r.role_name, rt.type Inner Unique: true Buffers: shared hit=8 read=5 I/O Timings: read=240.145 -> Nested Loop (cost=0.57..822.43 rows=1 width=379) (actual time=240.292..240.293 rows=0 loops=1) Output: urm.id, urm.status, urm.created_by, urm.created_at, urm.modified_by, urm.last_modified_at, urm.service_name, urm.crs_id, urm.identity_id, urm.role_id, urm.resource_id, urm.resource_type_id, urm.user_profile_id, rt.type Join Filter: (urm.resource_type_id = rt.id) Rows Removed by Join Filter: 2 Buffers: shared hit=8 read=5 I/O Timings: read=240.145 -> Index Scan using type_index_resource_type on public.resource_type rt (cost=0.14..8.16 rows=1 width=34) (actual time=0.018..0.019 rows=1 loops=1) Output: rt.id, rt.status, rt.created_by, rt.created_at, rt.modified_by, rt.last_modified_at, rt.service_name, rt.crs_id, rt.type, rt.actions Index Cond: ((rt.type)::text = 'experiment'::text) Buffers: shared hit=2 -> Index Scan using user_profile_id_index_user_role_mapping on public.user_role_mapping urm (cost=0.43..814.21 rows=4 width=361) (actual time=138.694..240.267 rows=2 loops=1) Output: urm.id, urm.status, urm.created_by, urm.created_at, urm.modified_by, urm.last_modified_at, urm.service_name, urm.crs_id, urm.identity_id, urm.role_id, urm.resource_id, urm.resource_type_id, urm.user_profile_id Index Cond: ((urm.user_profile_id)::text = ANY ('{11,12}'::text[])) Filter: (((urm.service_name)::text = 'CRS'::text) AND ((urm.status)::text = 'ACTIVE'::text)) Buffers: shared hit=6 read=5 I/O Timings: read=240.145 -> Index Scan using pk_roles on public.roles r (cost=0.42..8.45 rows=1 width=32) (never executed) Output: r.id, r.status, r.created_by, r.created_at, r.modified_by, r.last_modified_at, r.service_name, r.crs_id, r.role_name, r.team_name, r.access_category, r.crs_team_id, r.actions Index Cond: (r.id = urm.role_id) Filter: ((r.role_name)::text = ANY ('{xa, xb, xc, xd}'::text[])) Planning Time: 0.860 ms Execution Time: 240.358 ms (27 rows)
Any other directions to look?