
gcore attaches gdb to a process, walks through most of the virtual memory areas of that process, and dumps them to disk. Does this mean that every piece of anonymous virtual memory must be paged into that process, increasing its RSS, or is the memory paged into the gdb process instead? I assume it will also page in any file-backed memory (which shouldn't increase RSS, although it might increase RAM use through the file cache).

In this example from a Kubernetes environment, RSS jumps from 304368 to 17135624 KiB (gcore run from a worker node debug pod):

# ps auxwww | head -1
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
# ps auxwww | grep 3899524 | grep -v grep
1000650+ 3899524  0.2  0.9 17229416 304368 ?     SLsl Jun13  54:01 /opt/java/openjdk/jre/bin/java [...]
# gcore 3899524
[...]
# ps auxwww | grep 3899524 | grep -v grep
1000650+ 3899524  0.2 53.3 17229416 17135624 ?   SLsl Jun13  54:01 /opt/java/openjdk/jre/bin/java [...]

Could this be related specifically to containers/cgroups?

kgibm

1 Answer


Writing out most of a process's memory to a file, as a core dump, requires reading it. How exactly this is implemented is operating system specific.

On most POSIX systems, a process (or a fork of it) is told to dump core, whether by a debugger like gcore or because the task is dying under certain conditions. This is more efficient; otherwise the memory would have to be copied from one address space to another.

On Linux, man core documents what is dumped. /proc/[pid]/coredump_filter contains a bitmask that allows this to be customized. Note that the default set covers a few flavors of anonymous and private mappings. File-backed mappings are excluded by default, as they can be read back from the file after the task is gone.
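As a sketch of how that bitmask reads, the snippet below decodes a coredump_filter value into the mapping types it selects. The bit meanings come from core(5); the function name and abbreviated labels are my own:

```shell
# Decode a coredump_filter bitmask into the mapping types it dumps.
# Bit meanings per core(5); the kernel default is 0x33.
decode_filter() {
    filter=$1
    bit=0
    for name in anon-private anon-shared file-private file-shared \
                elf-headers hugetlb-private hugetlb-shared dax-private dax-shared; do
        [ $(( filter & (1 << bit) )) -ne 0 ] && echo "bit $bit: dump $name"
        bit=$(( bit + 1 ))
    done
}

decode_filter 0x33
# To change a live process, write a new mask, e.g. to also dump
# file-backed private mappings (set bit 2):
#   echo 0x37 > /proc/<pid>/coredump_filter
```

For 0x33 this reports anonymous private, anonymous shared, ELF headers, and private hugetlb mappings, matching the documented default.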

Notice in the ps output that the resident set size (RSS) approaches the virtual memory size (VSZ) as the dump proceeds. I assume a Java program with 16 GB and change of virtual memory has most of that in its heap, perhaps statically sized for production use. A full dump of that touches every page, even the ones that are still untouched zeros.

This doesn't necessarily mean paging in from swap. That could happen, but the information you provided says nothing about swap.
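One way to see this directly is to watch the Vm* counters in /proc/[pid]/status while the dump runs. A minimal sketch; substitute the target PID (here "self" so it runs anywhere):

```shell
# Snapshot virtual size, resident set, and swapped-out size for a process.
# Replace "self" with the target PID to watch that process instead.
pid=self
grep -E '^(VmSize|VmRSS|VmSwap):' "/proc/$pid/status"
```

Running this in a loop during a gcore would show VmRSS climbing toward VmSize; if any of the memory had been swapped out, VmSwap would shrink as those pages are faulted back in.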

Core dumping a process this large carries a lot of overhead, both in CPU to loop through all the memory and in storage to write it out. Avoid it if possible, in favor of profiling and debugging methods with less overhead.

Linux cgroups do not change much about this; they mostly enable more specific accounting for your capacity planning. The memory usage of a specific container or systemd unit approximates the size of a core dump it could create. See also systemd-cgtop -m.
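For that per-cgroup accounting, a minimal sketch, assuming cgroup v2 mounted at /sys/fs/cgroup (paths differ under v1); the helper name is my own:

```shell
# Report current memory charged to this process's cgroup (cgroup v2).
cgroup_mem_current() {
    # Under cgroup v2, /proc/self/cgroup has a line like "0::/some/path"
    cgpath=$(awk -F: '$1 == "0" {print $3}' /proc/self/cgroup)
    memfile="/sys/fs/cgroup${cgpath}/memory.current"
    if [ -r "$memfile" ]; then
        cat "$memfile"          # usage in bytes
    else
        echo "unavailable"      # cgroup v1, or v2 not mounted here
    fi
}

cgroup_mem_current
```

Inside a Kubernetes pod this reads the container's own memory accounting, which is roughly the amount a full core dump of its processes would have to walk.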

John Mahowald