I'm using valgrind (v3.10.0) to hunt down a memory leak in a complex application (a heavily modified build of net-snmp) that is being built as part of a bigger software suite. I am sure there is a leak (the memory footprint of the application grows linearly without bound), but valgrind always reports the following upon termination.

==1139== HEAP SUMMARY:
==1139==     in use at exit: 0 bytes in 0 blocks
==1139==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==1139== 
==1139== All heap blocks were freed -- no leaks are possible

The total heap usage cannot be zero -- there are many, many calls to malloc and free throughout the application. Valgrind is still capable of finding "Invalid Write" errors.

The application in question is being compiled, along with other software packages, with a uclibc-gcc toolchain for the MIPS processor (uclibc v0.9.29) to be flashed onto an embedded device running a busybox (v1.17.2) linux shell. I am running valgrind directly on the device. I use the following options when launching Valgrind:

--tool=memcheck --leak-check=full --undef-value-errors=no --trace-children=yes

Basically, Valgrind doesn't detect any heap usage even though I've used the heap. Why might this be? Are any of my assumptions (below) wrong?


What I've Tried

Simple Test Program

I compiled the simple test program (using the same target and toolchain as the application above) from the Valgrind quick-start tutorial, to see if Valgrind would detect the leak. The final output was the same as above: no heap usage.

Linking Issues?

The Valgrind FAQ has the following to say:

If your program is statically linked, most Valgrind tools will only work well if they are able to replace certain functions, such as malloc, with their own versions. By default, statically linked malloc functions are not replaced. A key indicator of this is if Memcheck says "All heap blocks were freed -- no leaks are possible".

The above sounds exactly like my problem, so I checked to see that it's dynamically linked to the C libraries that contained malloc and free. I used the uclibc toolchain's custom ldd executable (I can't use the native linux ldd) and the output included the following lines:

libc.so.0 => not found (0x00000000)
/lib/ld-uClibc.so.0 => /lib/ld-uClibc.so.0 (0x00000000)

(They're not found because I'm running this on the x86 host; the MIPS target device doesn't have an ldd executable.) As I understand it, malloc and free live in one of these libraries, and both appear to be dynamically linked. I also ran readelf and nm on the executable to confirm that the references to malloc and free are undefined (which is characteristic of a dynamically linked executable).

Additionally, I tried launching Valgrind with the --soname-synonyms=somalloc=NONE option as suggested by the FAQ.

LD_PRELOAD support?

As pointed out by commenters and answerers, Valgrind depends upon usage of LD_PRELOAD. It has been suggested that my toolchain doesn't support this feature. In order to confirm that it does, I followed this example to create a simple test library and load it (I replaced rand() with a function that just returns 42). The test worked, so it would seem that my target supports LD_PRELOAD just fine.
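For reference, an LD_PRELOAD check of this kind can be as small as the sketch below. The file name `fake_rand.c`, the library name `libfakerand.so`, and the build commands in the comment are assumptions for illustration, not the exact files used in the test described above; substitute the cross-compiler from your own toolchain for `cc`.

```c
/* fake_rand.c: hypothetical LD_PRELOAD shim that overrides rand().
 *
 * Assumed build (replace cc with your cross-compiler):
 *   cc -shared -fPIC -o libfakerand.so fake_rand.c
 * Assumed use on the target:
 *   LD_PRELOAD=./libfakerand.so ./some_program_that_calls_rand
 */
#include <stdlib.h>

/* Same signature as the C library's rand(). When this library is
 * preloaded, the dynamic loader resolves calls to rand() here first. */
int rand(void)
{
    return 42;
}
```

If a program that prints the result of `rand()` shows 42 on every call with the library preloaded, and other values without it, the dynamic loader on the target honours LD_PRELOAD.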

Elf Data

I'll also include some information from the readelf command which may be useful. Rather than a giant dump, I've trimmed things down to include only what may be relevant.

Dynamic section
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libnetsnmpagent.so.30]
 0x00000001 (NEEDED)                     Shared library: [libnetsnmpmibs.so.30]
 0x00000001 (NEEDED)                     Shared library: [libnetsnmp.so.30]
 0x00000001 (NEEDED)                     Shared library: [libgcc_s.so.1]
 0x00000001 (NEEDED)                     Shared library: [libc.so.0]
 0x0000000f (RPATH)                      Library rpath: [//lib]

Symbol table '.dynsym'
   Num:    Value  Size Type    Bind   Vis      Ndx Name
    27: 00404a40     0 FUNC    GLOBAL DEFAULT  UND free
    97: 00404690     0 FUNC    GLOBAL DEFAULT  UND malloc
Woodrow Barlow
  • did you give the **--trace-children=yes** option? because if you use **exec**, you must pass that option – yakoudbz Sep 30 '14 at 16:24
  • @yakoudbz i wasn't originally using that option, but i am now, and the outcome is unchanged. thank you for the advice. i've edited the post to show which options i'm using. – Woodrow Barlow Sep 30 '14 at 16:30
  • Are you sure the memory used does not grow because of the forks, but because of memory leaks? – yakoudbz Sep 30 '14 at 16:37
  • @yakoudbz that's a possibility, but i don't think the heap usage should be zero regardless. – Woodrow Barlow Sep 30 '14 at 16:38
  • I recall a similar issue where uClibC was not built with LD_PRELOAD support, which Valgrind depends on. Can you test if that's your problem? If so, enabling LD_PRELOAD support when building uClibC should do the trick. – Michael Foukarakis Sep 30 '14 at 17:37
  • @MichaelFoukarakis hmm, i'm not sure we build uclibc during the build process. thank you for the advice, i will look into it and report back. – Woodrow Barlow Sep 30 '14 at 17:55
  • @MichaelFoukarakis after quite a bit of digging, i've determined that we're using an old version of uclibc (0.9.29) in which LD_PRELOAD is permanently enabled and can't be turned off (as best as I can tell). – Woodrow Barlow Oct 02 '14 at 16:21
  • A possible reason might be that the redirection mechanism does not redirect the malloc calls to the valgrind interception, due to the uclibc soname not being the expected name. If that is the case, use --soname-synonyms=somalloc=xxxxxx where xxxxxx is the soname of the uclibc library – phd Oct 02 '14 at 19:55
  • Can you add the output of `ldd $executable` to your question to confirm you aren't statically linking uclibc? – b4hand Oct 03 '14 at 02:55
  • @b4hand I've made an edit to the post. The results left me confused. – Woodrow Barlow Oct 03 '14 at 12:15
  • @phd I've tried setting the soname to "NONE", which, according to [this mailing list post](http://valgrind.10908.n7.nabble.com/malloc-a-new-lib-not-in-uclibc-and-we-could-not-use-valgrind-could-you-give-me-some-advice-td33358.html), should allow substitution with whatever malloc library is used (as best as I can understand). To my knowledge, though, uclibc uses the typical name for standard libraries. – Woodrow Barlow Oct 03 '14 at 12:20
  • apparently you're running on the wrong architecture : http://stackoverflow.com/questions/16807560/ldd-doesnt-work-on-dynamically-linked-binary, please try to install all of the necessary compatibility-libraries and run ldd again – specializt Oct 03 '14 at 12:22
  • Did you try creating a dummy program which you are even more sure creates a leak, just to validate that Valgrind is not able to see that, either? I notice that Valgrind support for MIPS32 is very new, perhaps there are issues. I do expect quality from Valgrind however, so that seems unlikely. – unwind Oct 03 '14 at 12:22
  • @phd in fact, according to the uclibc readme file, uclibc actually appears as a gnu libc library to applications: "there is an unwholesomely huge amount of code out there that depends on the presence of gnu libc header files. we have gnu libc compatible header files. [...] we lie and claim to be gnu libc in order to force these applications to work as their developers intended." – Woodrow Barlow Oct 03 '14 at 12:23
  • @unwind creating a new application in our architecture is not trivial. i have put an intentional leak in the application i'm trying to test, and i have tried testing other applications. i always get a result that says zero heap usage. – Woodrow Barlow Oct 03 '14 at 12:38
  • @WoodrowBarlow OK. This seems gnarly and annoying, I hope you can resolve it. – unwind Oct 03 '14 at 12:41
  • since you seem to ignore resolution-attempts i shall now let you bathe in cluelessness. – specializt Oct 03 '14 at 12:59
  • @specializt I was reading up on the `ldd` command. See my most recent edit, I was typing it as you commented. – Woodrow Barlow Oct 03 '14 at 13:03
  • Use your cross compiled ldd from uclibc, that's what the faq to which you linked says will work. You could still have **a shared library** while statically linking to uclibc. – b4hand Oct 03 '14 at 15:33
  • Also what version of valgrind are you using? MIPS support was added in 3.8.0. – b4hand Oct 03 '14 at 15:35
  • @b4hand I'm using v3.10.0. Will update with results of `ldd` momentarily. – Woodrow Barlow Oct 03 '14 at 16:41
  • @unwind i did end up creating a dummy program. i copied the exact test program from [here](http://valgrind.org/docs/manual/quick-start.html). the valgrind output was the same as i posted in my question (no leaks detected). – Woodrow Barlow Oct 03 '14 at 18:05
  • Could you post the output of valgrind with the `-v --trace-redir=yes` option? – Thomas Oct 14 '14 at 20:13
  • Have you considered trying [Boehm GC](http://en.wikipedia.org/wiki/Boehm_garbage_collector)? It can be used for [leak detection](http://www.hboehm.info/gc/leak.html). – Elliott Frisch Oct 17 '14 at 19:27

3 Answers

First, let's do a real test to see whether something is statically linked.

$ ldd -v /bin/true
    linux-vdso.so.1 =>  (0x00007fffdc502000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0731e11000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f07321ec000)

    Version information:
    /bin/true:
        libc.so.6 (GLIBC_2.3) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.3.4) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.14) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.4) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
    /lib/x86_64-linux-gnu/libc.so.6:
        ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
        ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2

The second line in the output shows it is dynamically linked to libc, which is what contains malloc.

As for what might be going wrong, I can suggest four things:

  1. Perhaps it's not linked to normal libc, but to some other C library (e.g. uclibc) or something else valgrind is not expecting. The above test will show you exactly what it's linked to. In order for valgrind to work, it uses LD_PRELOAD to wrap the malloc() and free() functions (description of general function wrapping here). If your libc substitute doesn't support LD_PRELOAD or (somehow) the C library's malloc() and free() aren't being used at all (with those names), then valgrind is not going to work. Perhaps you could include the link line used when you build your application.

  2. It is leaking, but it's not allocating memory using malloc(). For instance, it might (unlikely) be doing its own calls to brk(), or (more likely) allocating memory with mmap. You can use this to find out (this was a dump of cat itself).

$ cat /proc/PIDNUMBERHERE/maps
00400000-0040b000 r-xp 00000000 08:01 805303                             /bin/cat
0060a000-0060b000 r--p 0000a000 08:01 805303                             /bin/cat
0060b000-0060c000 rw-p 0000b000 08:01 805303                             /bin/cat
02039000-0205a000 rw-p 00000000 00:00 0                                  [heap]
7fbc8f418000-7fbc8f6e4000 r--p 00000000 08:01 1179774                    /usr/lib/locale/locale-archive
7fbc8f6e4000-7fbc8f899000 r-xp 00000000 08:01 1573024                    /lib/x86_64-linux-gnu/libc-2.15.so
7fbc8f899000-7fbc8fa98000 ---p 001b5000 08:01 1573024                    /lib/x86_64-linux-gnu/libc-2.15.so
7fbc8fa98000-7fbc8fa9c000 r--p 001b4000 08:01 1573024                    /lib/x86_64-linux-gnu/libc-2.15.so
7fbc8fa9c000-7fbc8fa9e000 rw-p 001b8000 08:01 1573024                    /lib/x86_64-linux-gnu/libc-2.15.so
7fbc8fa9e000-7fbc8faa3000 rw-p 00000000 00:00 0
7fbc8faa3000-7fbc8fac5000 r-xp 00000000 08:01 1594541                    /lib/x86_64-linux-gnu/ld-2.15.so
7fbc8fca6000-7fbc8fca9000 rw-p 00000000 00:00 0
7fbc8fcc3000-7fbc8fcc5000 rw-p 00000000 00:00 0
7fbc8fcc5000-7fbc8fcc6000 r--p 00022000 08:01 1594541                    /lib/x86_64-linux-gnu/ld-2.15.so
7fbc8fcc6000-7fbc8fcc8000 rw-p 00023000 08:01 1594541                    /lib/x86_64-linux-gnu/ld-2.15.so
7fffe1674000-7fffe1695000 rw-p 00000000 00:00 0                          [stack]
7fffe178d000-7fffe178f000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Note whether the end address of [heap] is actually growing, or whether you are seeing additional mmap entries. Another good indicator of whether valgrind is working is to send a SIGSEGV or similar to the process and see whether you see heap in use on exit.
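To track the [heap] entry over time without reading the dump by eye, a small helper along the following lines could be used. This is a sketch only: `parse_heap_line` and `heap_bytes` are illustrative names, not part of any library, and it assumes the maps format shown in the dump above.

```c
/* heap_extent.c: report the size of the [heap] mapping of a process.
 * Sketch only; the function names are hypothetical. */
#include <stdio.h>
#include <string.h>

/* Parse one line of /proc/<pid>/maps. Returns 1 and stores the address
 * range if the line describes the [heap] mapping, otherwise returns 0. */
int parse_heap_line(const char *line, unsigned long *start, unsigned long *end)
{
    if (strstr(line, "[heap]") == NULL)
        return 0;
    return sscanf(line, "%lx-%lx", start, end) == 2;
}

/* Return the heap size in bytes for the given maps file, 0 if there is
 * no [heap] entry, or -1 if the file cannot be opened. */
long heap_bytes(const char *maps_path)
{
    char line[512];
    unsigned long start, end;
    long size = 0;
    FILE *fp = fopen(maps_path, "r");

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_heap_line(line, &start, &end)) {
            size = (long)(end - start);
            break;
        }
    }
    fclose(fp);
    return size;
}
```

Calling `heap_bytes` on the maps file of the running process at intervals and logging the result shows whether the [heap] region is the one that grows. If it stays flat while the footprint rises, look for new anonymous mappings instead.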

  3. It isn't leaking in the strict sense, but it is leaking to all intents and purposes. For instance, perhaps it has a data structure (like a cache) that grows over time. On exit, the program (correctly) frees all entries in the cache, so nothing is in use on the heap at exit. In this instance, you'll want to know what is growing, which is a harder proposition. I'd use the technique above to kill the program, capture the output, and post-process it. If you see 500 things after 24 hours, 1,000 after 48 hours, and 1,500 after 72 hours, that should give you an indication of what is 'leaking'. However, as Haris points out in the comments, whilst this would result in the memory not being shown as leaks, it doesn't explain the 'total heap usage' being zero, as that line counts every allocation and free made during the run.

  4. Perhaps valgrind is just not working on your platform. What happens if you build a very simple program like the one below and run valgrind on it on your platform? If this isn't working, you need to find out why valgrind is not operating correctly. Note that valgrind support for MIPS is pretty new. Here is an email thread where a developer with MIPS and uclibc discovers valgrind is not reporting any allocations. His solution is to replace NPTL with linuxthreads.

#include <stdio.h>
#include <stdlib.h>
int
main (int argc, char **argv)
{
  void *p = malloc (100);       /* does not leak */
  void *q = malloc (100);       /* leaks */
  free (p);
  exit (0);
}
abligh
  • `It isn't leaking in the strict sense, but it is leaking to all intents and purposes. For instance, perhaps it has datastructure (like a cache), which grows over time.` anything that grows with time (perhaps the growth depends on the activity of the application) should be using dynamic memory allocation, then those allocations and free should have appeared in the valgrind output. correct me if i am wrong. – Haris Oct 04 '14 at 18:34
  • @haris `valgrind` does not show allocations which are freed on exit, only ones which are not freed before exit. – abligh Oct 05 '14 at 08:26
  • i tried with a simple `malloc()` and `free()`, and in heap summary it came `total heap usage: 1 allocs, 1 frees, 4 bytes allocated` – Haris Oct 05 '14 at 08:34
  • @haris - ah - on the 'total heap usage' line - yes you are correct. I will adjust my answer. – abligh Oct 05 '14 at 08:43
  • **1.** the executable is linked dynamically to both libc.so and uclibc.so. i believe, but i'm not 100% sure, that my uclibc build has LD_PRELOAD support... but i've also tried setting the soname-synonyms option for valgrind, and that didn't work. **2.** there are definitely `malloc` calls in the code, so the heap shouldn't be empty even if the leak comes from `mmap` or other allocation methods. **3.** same as previous, the heap oughtn't be empty. – Woodrow Barlow Oct 06 '14 at 14:50
  • **4.** I did build a test program, I built the sample that Valgrind has on their quickstart page. It did not detect any heap usage or leaks. I am currently looking into that email thread you linked, to determine if it still applies to the version of valgrind i'm running and to determine if i can use linuxthreads instead of NPTL. thank you for your exceptionally detailed answer here, it has been very helpful in ruling things out for me and suggesting some new paths to look down. – Woodrow Barlow Oct 06 '14 at 14:53
  • @WoodrowBarlow if you are linked to both `libc.so` and `uclibc.so` that may well be the issue, as `valgrind` is only going to override one `malloc()` implementation, and being linked to two is going to confuse it. I suspect it has the potential to confuse more than `valgrind` though! – abligh Oct 06 '14 at 16:52

(Adding another answer as the question itself has changed substantially after OP awarded the first bounty)

Based on my understanding of your edits, you have now:

  1. Replicated the problem with valgrind's own test program
  2. Confirmed the test program binary is dynamically linked to uclibc
  3. Confirmed LD_PRELOAD is working on your system
  4. Confirmed (if only by using the test program) that this isn't symbol interference from another library

To me, that indicates that valgrind has a bug or is incompatible with your toolchain. I found references that say it should work with your toolchain, so that implies to me there is a bug either way.

I suggest therefore that you report a bug using the mechanism described here. Perhaps leave out the bit about your complicated application, and just point out the simple test program doesn't work. If you haven't already, you might try the users mailing list as described here.

abligh
  • thank you for your answer. i find it more likely that i'm misunderstanding one of my preconceptions than that there is a fundamental bug; nevertheless, i do have an open message on the mailing list, and if that fails to uncover any misconceptions on my part, i shall try filing a bug report. – Woodrow Barlow Oct 13 '14 at 17:41

In order to confirm that the executable is not statically linked, I ran file snmpd

Your problem is most likely not that the binary is statically linked (you now know it is not), but that malloc and free are statically linked into it (perhaps you are using an alternative malloc implementation, such as tcmalloc?).

When you built the simple test case (on which Valgrind worked correctly), you likely didn't use the same link command line (and the same libraries) as your real application does.

In any case, it is trivial to check:

readelf -Ws snmpd | grep ' malloc'

If this shows UND (i.e. undefined), then Valgrind should have no trouble intercepting it. But chances are it shows FUNC GLOBAL DEFAULT ... malloc instead, which means that your snmpd is as good as statically linked as far as valgrind is concerned.

Assuming my guess is correct, relink snmpd with the -Wl,-y,malloc flag. That will tell you which library defines your malloc. Remove it from the link, find and fix the leak, then decide whether having that library is worth the trouble it has caused you.

Employed Russian
  • The output says both "GLOBAL DEFAULT" and "UND". `97: 00404690 0 FUNC GLOBAL DEFAULT UND malloc` and `89: 00404690 0 FUNC GLOBAL DEFAULT UND malloc` – Woodrow Barlow Oct 06 '14 at 12:13
  • follow-up: i ran the `nm` command, and references to malloc and free are marked "U", so they're undefined. – Woodrow Barlow Oct 06 '14 at 14:45
  • @WoodrowBarlow Can you also run `readelf` on the simple test case? The fact that `readelf` shows both non-`0` address and `UND` for `malloc` is surprising to me. – Employed Russian Oct 06 '14 at 14:48
  • Sure, the output is basically the same: `14: 004007f0 0 FUNC GLOBAL DEFAULT UND malloc` and `54: 004007f0 0 FUNC GLOBAL DEFAULT UND malloc`. The strange output is probably because I'm using readelf on an x86 machine, but the executables are MIPS executables. – Woodrow Barlow Oct 06 '14 at 14:56