When the Linux OOM killer is invoked, the kernel log usually provides enough information about the memory consumption of the process that triggered it (even if that process is not the one eventually killed). For example, when the snmpd process triggers the OOM killer, its memory state can be found a bit later in the log by its PID (1190):
Jul 18 02:21:26 inm-agg kernel: snmpd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Jul 18 02:21:26 inm-agg kernel: CPU: 3 PID: 1190 Comm: snmpd Kdump: loaded Not tainted 5.4.17-2102.201.3.el8uek.x86_64 #2
...
Jul 18 02:21:26 inm-agg kernel: Tasks state (memory values in pages):
Jul 18 02:21:26 inm-agg kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
...
Jul 18 02:21:26 inm-agg kernel: [ 1190] 0 1190 78491 1761 217088 0 0 snmpd
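To make the lookup by PID less manual, here is a minimal Python sketch that parses one row of the "Tasks state" table and converts the page counts to bytes. The names `TaskRow`, `parse_task_row` and `find_task` are my own, not part of any tool, and `PAGE_SIZE = 4096` is an assumption (the usual x86_64 default; `getconf PAGESIZE` reports the real value). The column order is taken from the header line in the log above and may differ on other kernel versions.

```python
import re
from typing import Iterable, NamedTuple, Optional

PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual x86_64 default


class TaskRow(NamedTuple):
    """One row of the 'Tasks state' table, columns as in the header above."""
    pid: int
    uid: int
    tgid: int
    total_vm: int        # in pages
    rss: int             # in pages
    pgtables_bytes: int
    swapents: int
    oom_score_adj: int   # may be negative
    name: str


# "[ 1190] 0 1190 78491 1761 217088 0 0 snmpd" with any log prefix before it
ROW_RE = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\S.*)$"
)


def parse_task_row(line: str) -> Optional[TaskRow]:
    """Return the parsed row, or None if the line is not a task row."""
    m = ROW_RE.search(line)
    if m is None:
        return None
    *numbers, name = m.groups()
    return TaskRow(*map(int, numbers), name.strip())


def find_task(lines: Iterable[str], pid: int) -> Optional[TaskRow]:
    """Return the first task row whose pid column equals `pid`, if any."""
    for line in lines:
        row = parse_task_row(line)
        if row is not None and row.pid == pid:
            return row
    return None


if __name__ == "__main__":
    log = [
        "Jul 18 02:21:26 inm-agg kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name",
        "Jul 18 02:21:26 inm-agg kernel: [ 1190] 0 1190 78491 1761 217088 0 0 snmpd",
    ]
    row = find_task(log, 1190)
    if row is not None:
        print(row.name, "rss bytes:", row.rss * PAGE_SIZE)
```

In practice the lines would come from something like `journalctl -k -b` piped into the script; the header line and any non-table lines are simply skipped because they do not match the pattern.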
However, when the same happens to a thread of a Java application (OpenJDK 64-Bit Server VM (build 25.372-b07, mixed mode) in my case), the log contains a PID that doesn't correspond to any process. For example, in the following log, an Apache Cassandra input-handling thread, ReadStage-150, has become the OOM trigger:
Jul 16 22:01:45 inm-agg kernel: ReadStage-150 invoked oom-killer: gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
Jul 16 22:01:45 inm-agg kernel: CPU: 11 PID: 1653163 Comm: ReadStage-150 Kdump: loaded Not tainted 5.4.17-2102.201.3.el8uek.x86_64 #2
But the PID=1653163 specified in the message is not mentioned anywhere else:
$ journalctl -k -b -e | grep "1653163" | wc -l
1
and it has nothing in common with the PID of the Java process itself (1652432):
Jul 16 22:01:45 inm-agg kernel: Tasks state (memory values in pages):
Jul 16 22:01:45 inm-agg kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
...
Jul 16 22:01:45 inm-agg kernel: [1652432] 0 1652432 7256008 5839621 49709056 0 0 java
So I wonder:
- Where does the PID in the oom-killer message come from?
- Why is the thread treated separately from its hosting JVM process in this case?
- If the OOM killer were configured to kill the task that triggered the OOM, would it be possible (at least in theory) to kill only the culprit thread rather than the JVM as a whole?