
The Linux glibc allocator seems to be behaving strangely. Hopefully, someone can shed some light on this. Here are the two source files I have:

first.cpp:

#include <unistd.h>
#include <stdlib.h>
#include <list>
#include <vector>

int main() {

  std::list<char*> ptrs;
  for(size_t i = 0; i < 50000; ++i) {
    ptrs.push_back( new char[1024] );
  }
  for(size_t i = 0; i < 50000; ++i) {
    delete[] ptrs.back();
    ptrs.pop_back();
  }

  ptrs.clear();

  sleep(100);

  return 0;
}

second.cpp:

#include <unistd.h>
#include <stdlib.h>
#include <list>

int main() {

  char** ptrs = new char*[50000];
  for(size_t i = 0; i < 50000; ++i) {
    ptrs[i] = new char[1024];
  }
  for(size_t i = 0; i < 50000; ++i) {
    delete[] ptrs[i];
  }
  delete[] ptrs;

  sleep(100);

  return 0;
}

I compile both:

$ g++ -o first first.cpp
$ g++ -o second second.cpp

I run first, and while it's sleeping I look at its resident memory size with ps:

$ ./first&
$ ps aux | grep first
davidw    9393  1.3  0.3  64344 53016 pts/4    S    23:37   0:00 ./first


Then I do the same with second:

$ ./second&
$ ps aux | grep second
davidw    9404  1.0  0.0  12068  1024 pts/4    S    23:38   0:00 ./second

Notice the resident memory size (RSS). For first it is 53016k; for second it is 1024k. For one reason or another, first never released its allocations back to the kernel.

Why does the first program not relinquish memory to the kernel, while the second one does? I understand that the first program uses a linked list, and the linked list probably allocates some of its nodes on the same pages as the data we're freeing. However, those nodes should be freed too, since we pop them all off and then clear the list. Running either program through valgrind reports no memory leaks. What is probably happening is that memory gets fragmented in first.cpp in a way that it doesn't in second.cpp. But if all the memory on a page has been freed, how can that page not be relinquished back to the kernel? What does it take for memory to be relinquished to the kernel? And how can I modify first.cpp (while continuing to put the char*'s in a list) so that the memory is relinquished?

user1418199
  • Use shrink to fit, described [here](http://stackoverflow.com/questions/5834754/stddeque-does-not-release-memory-until-program-exits). In this case, do `std::list<char*>().swap(ptrs)`. – jxh Jun 08 '12 at 06:26
  • I'm afraid there is something else amiss here... Here is my new program: `int main() { { std::list<char*> ptrs; for(size_t i = 0; i < 50000; ++i) { ptrs.push_back( new char[1024] ); } for(size_t i = 0; i < 50000; ++i) { delete[] ptrs.back(); ptrs.pop_back(); } ptrs.clear(); std::list<char*>().swap(ptrs); } sleep(100); return 0; }` Running ps gives the same result: `davidw 9961 0.0 0.3 64344 53016 pts/4 S 00:31 0:00 ./first` – user1418199 Jun 08 '12 at 06:32
  • It was tagged C since you'll get the same problem in C with malloc/free. I was thinking someone programming in C might find this useful in the future. – user1418199 Jun 08 '12 at 06:39
  • Have you verified that your second program actually allocates memory? I recall reading recently about optimising away `malloc`/`free` pairs with no code in between that actually uses the result, and the same logic would apply to `new`/`delete` pairs too. –  Jun 08 '12 at 07:33
  • The usual configuration of Linux doesn't allocate the actual page until it has been accessed (which makes a conforming implementation of C or C++ impossible), but I don't see where that affects his code. He actually touches all of the large block he allocates, and the small blocks are smaller than a page, so the updating of the hidden data by `malloc`/`new` will touch the page they're in. – James Kanze Jun 08 '12 at 07:55
  • @JamesKanze I'm not sure to whom your comment is addressed, but if it's to me, that isn't what I meant. I didn't mean the kernel wouldn't touch the memory pointed to by the result of `malloc`/`new`, I meant that the compiler would sometimes remove calls to `malloc`/`new` and `free`/`delete`. –  Jun 08 '12 at 08:13
  • @hvd It shouldn't, at least not without full program analysis. Calls to `operator new` and `operator delete` are observable behavior in C++. – James Kanze Jun 08 '12 at 09:19
  • @JamesKanze Good point, in C++ it's more complicated to verify that the optimisation is valid (even though, in this case, it is), so it's less likely that the compiler performs it. –  Jun 08 '12 at 10:27
  • Small allocations are pooled. What you don't see are the small list nodes that get allocated. Favor std::vector where you can, set its capacity when you can. – Hans Passant Jun 08 '12 at 10:54
  • The resident memory size is just that. It doesn't tell you how much virtual memory your process has allocated, just how much of it is resident at a given time. You should be looking at VSZ -- the amount of allocated virtual memory. – Kuba hasn't forgotten Monica Jun 08 '12 at 13:03
  • @hvd It's not a compiler optimization, it's a bug in the OS. It's also possible to configure Linux so that it doesn't display this bug, see the docs on `overcommit_memory` and `overcommit_ratio`. – James Kanze Jun 11 '12 at 08:04
  • @JamesKanze I know about that, that was not what I'm talking about. I was talking about the *compiler* removing `malloc`/`free` pairs, so that `int main() { void *p = malloc(100); free(p); }` compiles to the exact same code as `int main() { }`. I thought it would also be performed for C++ `new`/`delete` pairs, but you correctly pointed out that in that case, the optimization isn't generally valid. –  Jun 11 '12 at 08:41
  • @hvd I don't think I've ever seen a compiler which removed `malloc`/`free` pairs. Typically, programmers don't use `malloc`/`free` without good reason, so the compiler couldn't remove them anyway. – James Kanze Jun 11 '12 at 10:30
  • @JamesKanze I have no idea how useful it is, but at least gcc and I think also clang can do this. The idea is what there may have been code between that uses it, but other optimisations have already got rid of those, so only malloc/free remains. –  Jun 11 '12 at 11:11
  • @hvd I suspect that many compilers _could_ do it. I have my doubts as to whether it is worth the effort. (In the case of gcc, the compiler has a somewhat generic technique for annotating functions. It's possible that the optimization of `malloc`/`free` is simply a special case of the more generic technique, and the way the library writers have annotated the functions.) – James Kanze Jun 11 '12 at 12:41

4 Answers


This behaviour is intentional: there is a tunable threshold that glibc uses to decide whether to actually return memory to the system or to cache it for later reuse. In your first program you make lots of small allocations with each push_back; those small allocations do not form a contiguous block and are presumably below the threshold, so they don't get returned to the OS.

Calling malloc_trim(0) after clearing the list should cause glibc to immediately return the top-most region of free memory to the system (at the cost of an sbrk system call the next time memory is needed).
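
As a rough sketch (my own illustration, not part of the original question), first.cpp could add the trim once the list has been emptied; malloc_trim is a glibc extension declared in <malloc.h>:

#include <unistd.h>
#include <malloc.h>   // malloc_trim() is a glibc extension
#include <list>

int main() {
  std::list<char*> ptrs;
  for (size_t i = 0; i < 50000; ++i)
    ptrs.push_back(new char[1024]);
  while (!ptrs.empty()) {
    delete[] ptrs.back();
    ptrs.pop_back();
  }

  malloc_trim(0);  // ask glibc to hand free pages at the top of the heap back to the kernel

  sleep(100);      // inspect RSS with ps while the process sleeps
  return 0;
}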

If you really need to override the default behaviour (which I wouldn't recommend unless profiling reveals it actually helps) then you should probably use strace and/or experiment with mallinfo to see what's actually happening in your program, and maybe use mallopt to adjust the threshold for returning memory to the system.
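
For example, here is a sketch of that kind of experiment (the 64 KiB figure and the counters printed are my own choices; mallopt, mallinfo and M_TRIM_THRESHOLD are glibc extensions from <malloc.h>):

#include <malloc.h>   // mallopt(), mallinfo(), M_TRIM_THRESHOLD (glibc extensions)
#include <cstdio>
#include <list>

static void report(const char* when) {
  struct mallinfo mi = mallinfo();
  // arena    = total bytes the main arena obtained from the kernel
  // fordblks = bytes sitting in free chunks
  // keepcost = releasable bytes at the top of the heap
  std::printf("%s: arena=%d fordblks=%d keepcost=%d\n",
              when, mi.arena, mi.fordblks, mi.keepcost);
}

int main() {
  // Trim automatically whenever more than 64 KiB is free at the top of the
  // heap; the glibc default threshold is considerably larger.
  mallopt(M_TRIM_THRESHOLD, 64 * 1024);

  std::list<char*> ptrs;
  for (size_t i = 0; i < 50000; ++i)
    ptrs.push_back(new char[1024]);
  report("after allocating");

  while (!ptrs.empty()) {
    delete[] ptrs.back();
    ptrs.pop_back();
  }
  report("after freeing");
  return 0;
}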

Jonathan Wakely
  • Regarding malloc_trim: Since glibc 2.8 this function frees memory in all arenas and in all chunks with whole free pages. Before glibc 2.8 this function only freed memory at the top of the heap in the main arena. (Ref: http://man7.org/linux/man-pages/man3/malloc_trim.3.html) – toddwz Mar 11 '20 at 18:33

The allocator keeps the smaller chunks available in case you request them again. It is a simple caching optimization, and not behaviour to be concerned about.

Puppy

Typically, the memory allocated by new will only be returned to the system when the process terminates. In the second case, I suspect that libc is using a special allocator for very large contiguous blocks, which does return the memory, but I'd be very surprised if any of your new char[1024] allocations were returned, and on many Unices, even the large block won't be returned.

James Kanze

(Editing down my answer, since there really isn't any issue here.)

As has been noted, there isn't really an issue here. Jonathan Wakely hits the nail on the head.

When memory utilization is not what I expect it to be on Linux, I usually start my analysis with the mtrace tool and by examining the /proc/self/maps file.

mtrace is used by bracketing your code between two calls, one that starts the trace and one that ends it.

  mtrace();     // start logging allocations to $MALLOC_TRACE (declared in <mcheck.h>)
  {
      // do stuff
  }
  muntrace();   // stop logging

The mtrace calls are only active if the MALLOC_TRACE environment variable is set; it specifies the name of the file that receives the logging output. A command line program, also called mtrace, can then be used to analyze that output for memory leaks.

$ MALLOC_TRACE=mtrace.log ./a.out
$ mtrace ./a.out mtrace.log

The /proc/self/maps file provides a list of memory mapped regions in use by the current program, including anonymous regions. It can help identify regions which are particularly large, and then additional sleuthing is needed to determine what that region is associated with. Below is a simple program to dump the /proc/self/maps file to another file.

#include <fstream>

// Copy /proc/self/maps (the current process's memory mappings) to a file.
void dump_maps (const char *outfilename) {
  std::ifstream inmaps("/proc/self/maps");
  std::ofstream outf(outfilename, std::ios::out|std::ios::trunc);
  outf << inmaps.rdbuf();
}
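
For example (a sketch with hypothetical file names), you can snapshot the maps while the memory is still allocated and again after freeing it, then diff the two files to see which anonymous regions shrank or disappeared:

dump_maps("maps.before.txt");   // while the 50000 blocks are still allocated
// ... delete[] the blocks and clear the list here ...
dump_maps("maps.after.txt");    // then compare: diff maps.before.txt maps.after.txt
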
jxh