
Currently I'm working on Linux 4.0.6. After a coredump is triggered, I observe that the generated corefile is created as a sparse file. For example, the `ls` command shows that the size of my corefile is 42 MB, but the `du` command shows that it allocates only 26.3 MB.

My questions regarding this observation:

  1. Why does the Linux kernel create a sparse file for the coredump?
  2. How does it work? Does it depend on the filesystem where the coredump is placed?
  3. Can I configure the system/kernel to prevent the coredump from being written as a sparse file?
  • Stack Overflow is a site for programming and development questions. This question appears to be off-topic because it is not about programming or development. See [What topics can I ask about here](http://stackoverflow.com/help/on-topic) in the Help Center. Perhaps [Super User](http://superuser.com/) or [Unix & Linux Stack Exchange](http://unix.stackexchange.com/) would be a better place to ask. If you feel it's on-topic elsewhere, then [ask for a migration](http://meta.stackoverflow.com/q/254851) – jww Jun 12 '17 at 19:13

1 Answer


You should just think of what a coredump file is: a mere sequential write of the memory of the process. On a modern OS, memory is not a single contiguous byte range but can be made of multiple segments mapped at different addresses. That is why, if you try to read or write at an address outside of a mapped segment, you get a segmentation violation signal (SIGSEGV).

So at dump time, the system writes the segments in ascending address order and simply `lseek`s to the beginning of each new segment, thus building a sparse file.
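
To make that concrete, here is a minimal sketch (not the kernel's actual coredump writer; the file name and sizes are invented) of how a write, an `lseek` past a hole, and another write produce a sparse file:

```c
/* Sketch only: illustrates the write + lseek pattern that yields a sparse
 * file; it is not the kernel's coredump code. File name and sizes are
 * arbitrary. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("sparse_demo.core", O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0) { perror("open"); return 1; }

    char seg[4096] = {0};                    /* stands in for a mapped segment */

    write(fd, seg, sizeof(seg));             /* "segment" at offset 0          */
    lseek(fd, 64L * 1024 * 1024, SEEK_SET);  /* skip a 64 MB unmapped hole     */
    write(fd, seg, sizeof(seg));             /* next "segment" after the hole  */

    close(fd);
    /* ls -l reports ~64 MB (apparent size); du reports only a few KB,
     * because the hole occupies no disk blocks on a sparse-file-capable
     * filesystem. */
    return 0;
}
```

Comparing `ls -l sparse_demo.core` with `du -h sparse_demo.core` afterwards reproduces the kind of discrepancy you observed.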

Now for your questions:

  1. Why does the Linux kernel create a sparse file for the coredump?

The explanation is just above.

  2. How does it work? Does it depend on the filesystem where the coredump is placed?

Not really, unless the underlying file system does not allow sparse files.

  3. Can I configure the system/kernel to prevent the coredump from being written as a sparse file?

IMHO you cannot, and what's more, you do not want to. The `ls` command gives you (roughly) the highest memory address used by the program, while the `du` command gives you the total memory size actually used by the program, because unused addresses are not mapped and do not consume memory (or disk space) at all.
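
For what it's worth, you can read the same difference programmatically through `stat(2)`: `st_size` is the apparent size that `ls -l` reports, while `st_blocks` counts the 512-byte blocks actually allocated, which is what `du` adds up. A minimal sketch, assuming a corefile named `core.12345` (the name is hypothetical):

```c
/* Sketch: print apparent size vs. allocated size of a (hypothetical)
 * corefile, mirroring what ls -l and du report for a sparse file. */
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat sb;
    if (stat("core.12345", &sb) != 0) {   /* hypothetical corefile name */
        perror("stat");
        return 1;
    }
    printf("apparent size (ls -l): %lld bytes\n", (long long)sb.st_size);
    printf("allocated     (du)   : %lld bytes\n", (long long)sb.st_blocks * 512LL);
    return 0;
}
```

For a file with no holes the two figures are (almost) the same.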

Serge Ballesta
  • Thanks @SergeBallesta! But if the coredump size is small, for instance only 3 MB, then the created coredump is not a sparse file. The `ls` and the `du` commands show the same allocation and file size. Do you have any explanation? – ywiyogo Jun 12 '17 at 13:59
  • The explanation seems to suggest the file size is related to the highest mapped address and that's false. The coredump has a list of all areas. It's their count which matters, not their location within the address space. In particular a program which uses smaller mappings also likely uses them fully, so the "real" dump size is closer to the sparse size. The real question is what's up with the interest in sparse files w.r.t coredumps. –  Jun 12 '17 at 14:26
  • A fun fact is that while you can't configure sparse dumps away per se, you can just use dumping to pipes and write the core out however you please in the piped-to-program. But don't do this. –  Jun 12 '17 at 14:27
  • Little clarification wrt `ls` vs `du`: `du` gives the size on disk, which will be smaller for a sparse file. Good reply overall. – bytefire Jun 12 '17 at 14:32
  • "The `ls` command gives you the highest memory address used by the program, while the `du` command gives you the total memory size used by the program, because unused addresses are not mapped and do not consume memory at all." @bytefire is correct. `du` does not give us the total memory size used by the program, since the corefile is only a file, not a binary. – ywiyogo Jun 13 '17 at 14:13
  • @why2: I cannot understand your last comment. At dump time, the system writes down all the mapped segments and skips over the unmapped addresses. Because of the sparse file, those unmapped portions use no space on disk, while the mapped portions do. So the used size reported by `du` is (more or less, because the disk chunk size may be different from the memory page size) the total memory used by the program at the time of the dump. – Serge Ballesta Jun 13 '17 at 14:34
  • Sorry for the confusion @SergeBallesta. Since we use `ls` and `du` referring to a (core) file or the folder where the cores are placed, I wrongly interpreted your sentences. I agree with you that the resulting numbers from both commands represent the total memory addresses (in case of `ls`) or the mapped addresses (in case of `du`) of the crashed program during the time of dump. Thanks! – ywiyogo Jun 14 '17 at 07:19