
At http://goog-perftools.sourceforge.net/doc/tcmalloc.html it is stated: "TCMalloc currently does not return any memory to the system." I presume that means that if I allocate 42 MB and free it, the system won't get it back, but the next time I allocate 47 MB it will take only 5 MB more? My question is: what happens with loaded DLL or .so modules? Do they get their own chunk of memory that is not released until the program exits? I ask because if I want to write run-time updateable software, I must load new DLLs without exiting the program. So my question is: if I use `-ltcmalloc` and I'm constantly loading and unloading DLLs that allocate and free memory, will that cause memory usage to explode? I presume it is a stupid question, but I don't know whether each DLL uses its own memory allocation machinery or whether memory allocation happens at the per-process level.

NoSenseEtAl
  • I would be cautious about using tcmalloc. It has **much** higher memory overhead than a "normal" malloc, and a lot of the claims in the document (like the claimed costs of locks) seem out-of-touch with reality. Unless you're using >2 cores (and probably 8+ cores) and keeping them all loaded with malloc-bound code, I doubt tcmalloc will be worth the costs. (And being malloc-bound is usually indicative of bad code...) – R.. GitHub STOP HELPING ICE May 12 '11 at 12:28
  • It is Google code, so it must be good. Seriously, I doubt that they are lying about the numbers... If you have some high-performance malloc-heavy code you can try it and test it (I have none at the moment). – NoSenseEtAl May 12 '11 at 13:26
  • Their claim that a lock/unlock cycle costs 100ns on a high-end Xeon is rather dubious, being that it takes less than half that on my humble Atom. As for the benchmarks I believe they're overall correct and honest, but possibly irrelevant unless your program has tons of threads (and tons of cores) and is doing nothing but calling `malloc`. – R.. GitHub STOP HELPING ICE May 12 '11 at 14:52
  • Xeon and Atom are totally different architectures... also maybe the number of cores makes lock/unlock slower... Again, I'm not a HW expert, but I understand your point. It's similar to overclocking RAM by 20% and getting a 2% faster PC. Same thing here: not even a magic instant malloc can improve performance much in some cases. – NoSenseEtAl May 12 '11 at 19:50
  • I'd go so far as to say that if the time spent in `malloc` is a bottleneck, you have much bigger design problems you need to address. It probably means your data is spread out across lots of tiny individually-allocated objects, in which case loss of locality, cache overflow, and even swapping to disk are likely to be much bigger performance issues. That's not to say it's not a real-world issue though. Some OO GUI apps (KDE, I believe, included) are notorious for making millions of tiny allocations... – R.. GitHub STOP HELPING ICE May 12 '11 at 20:53

1 Answer


Memory belongs to the process, not to individual DLLs, so freed memory is normally retained by the process-wide allocator until the process exits. This is common to most malloc implementations, not just the one you are asking about.

  • OK, cool, I was mistaken; I thought that `free` in a normal implementation returns memory to the OS. Sorry for the stupid question, but at least I hope that people reading this will learn something. – NoSenseEtAl May 12 '11 at 10:23
  • Most `malloc` implementations will return all large chunks (>~128k) to the system immediately, and will also return large contiguous ranges at the top of the heap. I don't think tcmalloc does either, but it may do the former. – R.. GitHub STOP HELPING ICE May 12 '11 at 12:25