Background
- In our Presto service, we found that the real time was longer than the sum of the user and sys times. For details, refer to the previous question: "which is G1 young STW time?"
- The Presto service runs in a k8s pod, and we have not found the root cause. We suspect the service may be short of CPU, so we increased the memory quota and modified the JVM config to print STW time and safepoint statistics:
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1
New Problem
- After that, the Presto service encountered many pauses, and the STW records are as follows:
2022-11-10T17:23:14.851+0800: 7689.689: Total time for which application threads were stopped: 0.0026007 seconds, Stopping threads took: 0.0002632 seconds
2022-11-10T17:23:40.160+0800: 7714.999: Total time for which application threads were stopped: 21.8407322 seconds, Stopping threads took: 0.0002557 seconds
2022-11-10T17:23:40.164+0800: 7715.002: Total time for which application threads were stopped: 0.0025454 seconds, Stopping threads took: 0.0004116 seconds
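For context, these long stops can be located by scanning the GC log for the "Total time for which application threads were stopped" entries. Below is a minimal sketch of such a scan; the log path and the 1-second threshold are assumptions, and the line format follows the records above.

```python
import re
import sys

# Matches lines like:
# 2022-11-10T17:23:40.160+0800: 7714.999: Total time for which application threads
# were stopped: 21.8407322 seconds, Stopping threads took: 0.0002557 seconds
STOPPED_RE = re.compile(
    r"^(?P<ts>\S+): (?P<uptime>[\d.]+): Total time for which application threads "
    r"were stopped: (?P<stopped>[\d.]+) seconds, "
    r"Stopping threads took: (?P<stopping>[\d.]+) seconds"
)

def long_stops(log_path, threshold_secs=1.0):
    """Yield (timestamp, stopped, stopping) for every pause longer than the threshold."""
    with open(log_path) as f:
        for line in f:
            m = STOPPED_RE.match(line.strip())
            if m and float(m.group("stopped")) >= threshold_secs:
                yield m.group("ts"), float(m.group("stopped")), float(m.group("stopping"))

if __name__ == "__main__":
    for ts, stopped, stopping in long_stops(sys.argv[1]):
        print(f"{ts} stopped={stopped:.4f}s stopping_threads={stopping:.4f}s")
```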
- The related safepoint statistics are as follows:
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
7693.158: RevokeBias [ 868 0 0 ] [ 0 0 0 2 0 ] 0
- The time for each safepoint phase is extremely short, almost zero, yet the service was stopped for about 21.8 seconds.
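To make the discrepancy concrete, here is a small sketch that sums the per-phase times from the statistics line above. The column layout follows the printed header, and the time columns are in milliseconds; the line is hard-coded here purely for illustration.

```python
# Safepoint statistics line copied from the log above; the bracketed groups follow the header:
# [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop]
line = "7693.158: RevokeBias [ 868 0 0 ] [ 0 0 0 2 0 ] 0"

# Pull out the second bracketed group, which holds the per-phase times in milliseconds.
times_ms = [int(x) for x in line.split("[")[2].split("]")[0].split()]
spin, block, sync, cleanup, vmop = times_ms

print(f"sum of safepoint phases: {sum(times_ms)} ms")   # 2 ms
print(f"application stopped for: {21.8407322:.1f} s")   # ~21.8 s from the STW record
```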
- Has anyone else encountered the same problem? Is it a lack of CPU, or a system-level problem? I have no idea and am waiting for help.