
I'm trying to iterate over all the malloc_chunk structures in all arenas. (I'm debugging from a core file, investigating a memory leak and memory corruption.)

As far as I know, each arena has a top_chunk that points to the top chunk of that arena. Each chunk header holds prev_size and size, so based on the code in glibc/malloc/malloc.c I can step from the top chunk to the previous contiguous chunk, and so on, looping over all the chunks in one arena. (That lets me gather statistics on the chunks by size and count, like WinDBG's !heap -stat -h.) Based on prev_size and size I can also check whether a chunk is corrupt.

In the arena (malloc_state) there is a member variable, next, which points to the next arena. With that I can loop over the chunks of every arena.

But I ran into a problem: if the chunk is not allocated, prev_size is invalid, so how do I get the previous malloc_chunk? Or is this approach not correct?


Question Background:

The bug is a memory leak reported on several online data nodes (our project is a distributed storage cluster).

What we did and result:

  1. We used valgrind to try to reproduce the bug in a test cluster, but unfortunately it reported nothing.

  2. I tried to investigate the heap further by analyzing the heap chunks, following the approach I had used before in WinDBG (which has very useful heap commands for digging into memory leaks and memory corruption), but I was blocked by the question I asked above.

  3. We used valgrind-massif to analyze the allocations (it is very detailed and can show which allocation site takes how much memory). Massif gave several clues; we followed them, checked the code, and finally found the leak (a map had grown very large through improper usage, but it is erased in the holder class's destructor, which is why valgrind did not report it).

I'll dig further into the gdb-heap source code to learn more about the glibc malloc structures.

Tongxuan Liu
  • This is either a GDB question or a WinDbg question, but IMHO it cannot be both. From my understanding I'd suggest removing the WinDbg tag ("core dump" and "arena" don't seem like WinDbg terms to me) – Thomas Weller Jun 23 '16 at 10:46
  • yes, it's a gdb question, not a WinDbg question – Tongxuan Liu Jun 23 '16 at 11:07
  • You might be interested in the `gdb-heap` project, which includes Python code that runs in gdb and that knows how to dissect the glibc malloc arenas. – Tom Tromey Jun 23 '16 at 14:00
  • @TomTromey, yes, Tom, I know this project and tried to use it, but unfortunately it hit runtime errors. I'll dig further into the gdb-heap source later. – Tongxuan Liu Jun 24 '16 at 08:02
  • @orbitcowboy, thanks, we have already integrated cppcheck into our project, but it did not find the leak. – Tongxuan Liu Jun 24 '16 at 08:03

3 Answers


The free open source program https://github.com/vmware/chap does what you want here for glibc malloc. Just grab a core (either because the process crashed, or grab a live core by using gcore or the generate-core-file command from within gdb). Then open the core by doing:

chap yourCoreFileName

Once you get to the chap prompt, if you want to iterate through all the chunks, both free and not, you can do any of the following, depending on the verbosity you want, but keeping in mind that an "allocation" in chap does not contain the chunk header, but rather starts at the address returned by malloc.

Try any of the following:

count allocations
summarize allocations
describe allocations
show allocations

If you only care about allocations that are currently in use try any of the following:

count used
summarize used
describe used
show used

If you only care about allocations that are leaked try any of the following:

count leaked
summarize leaked
describe leaked
show leaked

More details are in the documentation available from the GitHub URL mentioned above.

In terms of corruption, chap does some checking at startup and reports many kinds of corruption, although the output may be a bit cryptic at times.

Tim Boddy

First, before digging into the implementation details of malloc, your time may be better spent with a tool like valgrind, or even running with the MALLOC_CHECK_ environment variable set to let the internal heap consistency checking do the work for you.

But, since you asked....

glibc's malloc.c has some helpful comments about looking at the previous chunk.

Some particularly interesting ones are:

/* Note that we cannot even look at prev unless it is not inuse */

And:

If prev_inuse is set for any given chunk, then you CANNOT determine the size of the previous chunk, and might even get a memory addressing fault when trying to do so.

This is just a limitation of the malloc implementation. When a previous chunk is in use, the footer that would store its size is used by the user data of the allocation instead.

While it doesn't help your case, you can check whether a previous chunk is in use by following what the prev_inuse macro does.

#define PREV_INUSE 0x1
#define prev_inuse(p) ((p)->size & PREV_INUSE)

It checks the low-order bit of the current chunk's size. (All chunk sizes are multiples of at least 8, so the low 3 bits can be used for status flags.) That would help you stop your iteration before going off into no-man's land.
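To make the flag handling concrete, here is a minimal sketch of decoding a raw size field as read from a core file. The three flag names and values match the macros in glibc's malloc.c; the `decoded_size` struct and the `decode_size_field` helper are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Flag bits stored in the low bits of a chunk's size field
   (names and values as in glibc's malloc.c). */
#define PREV_INUSE      0x1  /* the previous adjacent chunk is in use */
#define IS_MMAPPED      0x2  /* the chunk was obtained with mmap */
#define NON_MAIN_ARENA  0x4  /* the chunk belongs to a non-main arena */
#define SIZE_BITS (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

/* Hypothetical helper type: a decoded view of one raw size field. */
struct decoded_size {
    size_t size;         /* chunk size in bytes with the flag bits masked off */
    int prev_inuse;      /* 1 if prev_size must NOT be trusted */
    int is_mmapped;
    int non_main_arena;
};

/* Split a raw size field into the real chunk size and its flags. */
static struct decoded_size decode_size_field(size_t raw)
{
    struct decoded_size d;
    d.size = raw & ~(size_t)SIZE_BITS;
    d.prev_inuse = (raw & PREV_INUSE) != 0;
    d.is_mmapped = (raw & IS_MMAPPED) != 0;
    d.non_main_arena = (raw & NON_MAIN_ARENA) != 0;
    /* Masking the three flag bits always leaves a multiple of 8. */
    assert(d.size % 8 == 0);
    return d;
}
```

For example, a raw value of 0x91 decodes to a 0x90-byte chunk whose previous neighbour is in use, so its prev_size field should not be read.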

Unfortunately, you'd still be terminating your loop early, before visiting every chunk.

If you really want to iterate over all chunks, I'd recommend that you start at malloc_state::top and follow the next_chunk until next_chunk points to the top.
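A forward walk of this kind could look roughly like the sketch below. It runs over a synthetic byte buffer rather than real glibc memory, and it assumes you already know the offset of the first chunk in the heap and the offset of the top chunk. `chunk_hdr`, `put_chunk`, and `walk_chunks` are hypothetical names for this illustration, and the header is simplified to the first two fields of malloc_chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PREV_INUSE 0x1
#define SIZE_BITS  0x7  /* PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA */

/* Simplified stand-in for the first two fields of glibc's malloc_chunk. */
struct chunk_hdr {
    size_t prev_size;
    size_t size;      /* chunk size with flag bits in the low 3 bits */
};

/* Test helper: write a chunk header with the given raw size at `off`. */
static void put_chunk(unsigned char *heap, size_t off, size_t raw_size)
{
    struct chunk_hdr h = { 0, raw_size };
    memcpy(heap + off, &h, sizeof h);
}

/* Step from chunk to chunk using only the size field, starting at
   `start` and stopping when the top chunk at `top_off` is reached.
   Returns the number of chunks visited before top, or -1 if a size
   field would move the walk outside the range (likely corruption). */
static long walk_chunks(const unsigned char *heap, size_t heap_len,
                        size_t start, size_t top_off)
{
    long count = 0;
    size_t off = start;

    assert(heap != NULL);
    if (top_off > heap_len)
        return -1;

    while (off != top_off) {
        struct chunk_hdr h;
        size_t sz;

        if (off > top_off || top_off - off < sizeof h)
            return -1;
        memcpy(&h, heap + off, sizeof h);

        sz = h.size & ~(size_t)SIZE_BITS;   /* strip the flag bits */
        if (sz < sizeof h || sz > top_off - off)
            return -1;                      /* size cannot be right */

        off += sz;                          /* same idea as next_chunk */
        count++;
    }
    return count;
}
```

Against a core file, the `memcpy` from the buffer would be replaced by whatever memory read your debugger or script provides. The bounds checks double as a cheap corruption test, since a damaged size field usually sends the walk past the top chunk.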

Sean Cline
  • Thanks Sean, I'm not sure I understand the approach you mentioned: "If you really want to iterate over all chunks, I'd recommend that you start at malloc_state::top and follow the next_chunk until next_chunk points to the top." malloc_state::top is not the first chunk; physically it is the last chunk. I also tried to use valgrind to reproduce the bug, but unfortunately no leak was reported. The bug was reported on one of our data nodes, which runs for several days before memory reaches the limit, so it takes time to reproduce. – Tongxuan Liu Jun 24 '16 at 08:04

Try the pmap <PID> -XX command to track down the memory usage from different aspects.

Madars Vi