Memory leak that doesn't crash when OOM, or show up in massif/valgrind

Question

I have an internal C++ application that will indefinitely grow--so much so that we've had to implement logic that actually kills it once the RSS reaches a certain peak size (2.0G) just to maintain some semblance of order. However, this has shown some strange behaviors.

First, I ran the application through Valgrind w/ memcheck, and fixed some random memory leaks here and there. However, the extent of these memory leaks were measured in the 10s of megabytes. This makes sense, as it could be that there's no actual memory leaking--it could just be poor memory management on the application side.

Next, I used Valgrind w/ massif to check to see where the memory is going, and this is where it gets strange. The peak snapshot is 161M--nowhere near the 1.9G+ peaks we see using the RSS field. The largest consumption is where I'd expect--in std::string--but this is not abnormal.

Finally, and this is the most puzzling--before we were aware of this memory leak, I actually was testing this service on AWS, and just for fun, set the number of workers to a high number on a CC2.8XL machine, 44 workers. That's 60.5G of RAM, and no swap. Fast forward a month: I go to look at the host--and low and behold, it's maxed out on RAM--BUT! The processes are still running fine, and are stuck at varying stages of memory usage--almost evenly distributed from 800M to 1.9G. Every once in a while dmesg prints out an Xen error about being unable to allocate memory, but other than that, the processes never die and continue to actively process (i.e., they're not "stuck").

Is there something I'm missing here? It's basically working, but for the life of me, I can't figure out why. What would be a good recommendation on what to look for next? Are there any tools that might help me figure it out?

score 1 · Accepted Answer · answered Mar 19 '14 at 19:01

1

Note that valgrind memcheck only discovers when you "abandon" memory. while(1) vec.push_back(n++); will fill all available memory but not report any leaks. By the sounds of things, you are collecting strings somewhere that take up a lot of space. I have also worked on code that uses a lot of memory but not really leaking it [it's all in various places that valgrind is happy is not a leak!]. Sometimes you can track it down by simply adding some markers to the memory allocations, or some such, to indicate WHERE you are allocating memory.

In std:: functions, there is typically an Allocator argument. If you implement several different pools of memory, you may find where you are allocating memory.

I have also seen cases where I think that the process is having it's memory fragmented, so there are lots of little free spaces in the heap - this can happen if, for example, you create a lot of strings by adding to the size of the string.

answered Mar 19 '14 at 19:01

Mats Petersson

126,704
14
140
227

massif should tell me if that memory is ballooning, though, should it not? I.e., let's say I have some dumb loop that just appends 20 pointers to a long-lived vector, each to a new chunk of 20MB of memory. I would expect massif to show that I just allocated 400MB of memory--do you know if that's true? – Redmumba Mar 19 '14 at 19:20
It should, unless the code is using non-standard allocation (e.g. it's own allocator that uses `mmap` or similar to allocate large memory). It would be fairly easy to prove that with a 10 line piece of code, I'd say... – Mats Petersson Mar 19 '14 at 19:43
The code is not using custom allocators, and I confirmed this through the cachegrind output. I did find that vector resizing was enormous--4.5 billion calls--and may be contributing to memory allocation. – Redmumba Mar 19 '14 at 23:18
So, if you have something that allocates lots of small to medium-sized blocks, lots and lots, it is possible to fill the entire memory with "unfeasibly small blocks". Could be what you are seeing. – Mats Petersson Mar 19 '14 at 23:49
1

In other words, the entire memory is made up of a combination of small used regions, and "free" memory that is very small segments. Using separate heaps for different purposes can resolve this. – Mats Petersson Mar 19 '14 at 23:50
This was exactly what was happening (memory fragmentation). Sorry for not accepting the answer until a year later! Using jemalloc solved this. – Redmumba Mar 17 '15 at 06:42

score 0 · Answer 2 · answered Mar 17 '22 at 20:27

0

If it's a issue of fragmentation, run valgrind massif with the --pages-as-heap=yes option may confirm whether if it's fragmentation.

answered Mar 17 '22 at 20:27

Xman

1
2

1

As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Mar 18 '22 at 04:18

Memory leak that doesn't crash when OOM, or show up in massif/valgrind

2 Answers2