Questions tagged [tcmalloc]

TCMalloc is a malloc library developed by Google. It is faster than the glibc 2.3 malloc (ptmalloc2), which takes approximately 300ns to execute a malloc/free pair on a 2.8GHz P4 (for small objects). TCMalloc takes approximately 50ns for the same operation pair. It also reduces lock contention for multi-threaded programs. For small objects, there is virtually zero contention. Another benefit is space-efficient representation of small objects.

Introduction

TCMalloc (Thread-Caching Malloc) is a memory allocation library developed by Google. It is part of the gperftools (Google Performance Tools) project. Other tools in the same project include a heap checker (for detecting memory leaks), a heap profiler (for collecting memory usage statistics) and a CPU profiler (for collecting CPU usage statistics).

Official Introduction by Sanjay Ghemawat

TCMalloc is faster than the glibc 2.3 malloc (available as a separate library called ptmalloc2) and other mallocs that I have tested. ptmalloc2 takes approximately 300 nanoseconds to execute a malloc/free pair on a 2.8 GHz P4 (for small objects). The TCMalloc implementation takes approximately 50 nanoseconds for the same operation pair. Speed is important for a malloc implementation because if malloc is not fast enough, application writers are inclined to write their own custom free lists on top of malloc. This can lead to extra complexity, and more memory usage unless the application writer is very careful to appropriately size the free lists and scavenge idle objects out of the free list.
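A figure such as "nanoseconds per malloc/free pair" can be estimated with a small timing loop. The following is only a rough sketch, not the benchmark used for the numbers quoted above: the helper name `ns_per_malloc_free_pair` is hypothetical, and the result depends on which allocator the program is linked against, the compiler flags, and the machine.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not part of TCMalloc or gperftools): returns the
// average wall-clock time, in nanoseconds, of one malloc/free pair.
double ns_per_malloc_free_pair(std::size_t object_size, std::size_t iterations) {
    assert(object_size > 0 && iterations > 0);
    using clock = std::chrono::steady_clock;

    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        // The volatile pointer keeps the compiler from optimizing the
        // allocation and deallocation away.
        void* volatile p = std::malloc(object_size);
        std::free(p);
    }
    const auto stop = clock::now();

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Calling `ns_per_malloc_free_pair(16, 10000000)` once in a binary linked against the default allocator and once in a binary linked with `-ltcmalloc` gives a crude comparison. A single-threaded loop like this does not exercise lock contention.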

TCMalloc also reduces lock contention for multi-threaded programs. For small objects, there is virtually zero contention. For large objects, TCMalloc tries to use fine grained and efficient spinlocks. ptmalloc2 also reduces lock contention by using per-thread arenas but there is a big problem with ptmalloc2's use of per-thread arenas. In ptmalloc2 memory can never move from one arena to another. This can lead to huge amounts of wasted space. For example, in one Google application, the first phase would allocate approximately 300MB of memory for its URL canonicalization data structures. When the first phase finished, a second phase would be started in the same address space. If this second phase was assigned a different arena than the one used by the first phase, this phase would not reuse any of the memory left after the first phase and would add another 300MB to the address space. Similar memory blowup problems were also noticed in other applications.

Another benefit of TCMalloc is space-efficient representation of small objects. For example, N 8-byte objects can be allocated while using space approximately 8N * 1.01 bytes. I.e., a one-percent space overhead. ptmalloc2 uses a four-byte header for each object and (I think) rounds up the size to a multiple of 8 bytes and ends up using 16N bytes.
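The arithmetic in the paragraph above can be written out as a short sketch. Both function names are hypothetical, and the formulas only encode the figures quoted above (about one percent overhead for TCMalloc; a four-byte header rounded up to a multiple of 8 bytes for ptmalloc2). They are not measurements of either allocator.

```cpp
#include <cassert>
#include <cstddef>

// Estimated bytes used by TCMalloc for n small objects, assuming the
// roughly 1% space overhead quoted above. Hypothetical helper.
inline double tcmalloc_bytes_estimate(std::size_t n, std::size_t object_size) {
    assert(object_size > 0);
    return static_cast<double>(n * object_size) * 1.01;
}

// Estimated bytes used by ptmalloc2 for n objects, assuming a 4-byte
// per-object header and rounding up to a multiple of 8 bytes, as
// described above. Hypothetical helper.
inline std::size_t ptmalloc2_bytes_estimate(std::size_t n, std::size_t object_size) {
    assert(object_size > 0);
    const std::size_t header = 4;
    const std::size_t with_header = object_size + header;
    const std::size_t rounded = (with_header + 7) / 8 * 8;  // round up to 8
    return n * rounded;
}
```

For N = 1,000,000 objects of 8 bytes each, the first estimate is about 8.08 million bytes and the second is 16 million bytes, matching the 8N * 1.01 and 16N figures in the text.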

98 questions
2
votes
2 answers

What scratch buffer means in glibc?

I found that the code below reports a heap leak when I check it with the tcmalloc heap checker in draconian mode, but the leak is not found with LSan (I assume that internal allocations in glibc are suppressed in LSan) #include #include…
hyuk myeong
2
votes
3 answers

Compiling Python 2.6.6 and need for external packages wxPython, setuptools, etc... in Ubuntu

I compiled Python 2.6.6 with the google-perftools (tcmalloc) library to eliminate some of the memory issues I was having with the default 2.6.5. After getting 2.6.6 going, it seems not to work because, I think, it is having issues with the default 2.6.5 install…
J Spen
2
votes
0 answers

tcmalloc does not override aligned_alloc?

I just dropped in tcmalloc 2.7, but for some reason my new, which calls aligned_alloc, still goes to glibc. I've added the following to the compiler options: -fno-builtin-memalign -fno-builtin-aligned_alloc -fno-builtin-malloc -fno-builtin-calloc…
kreuzerkrieg
2
votes
1 answer

conflict in symbols exposed by tcmalloc and glibc

I was recently debugging a crash in a product and identified the cause to be a conflict in the memory allocation symbols exposed by glibc and tcmalloc. I wrote the following sample code for exposing this issue: #include #include…
Rahul
2
votes
0 answers

Multiple definition error using GFLAGS with tcmalloc

~/common/lib/libglog.a(libglog_la-utilities.o): In function `google::LogMessageVoidify::LogMessageVoidify()': ~/glog/glog-0.3.2/src/utilities.cc:80: multiple definition of…
2
votes
4 answers

Windows tcmalloc replacement with static linking

A C++ project encountered a memory fragmentation problem, and we tried the following: nedmalloc: did not pass the stress test (crashed after 15 hrs), which means it works in most cases but not all. It also used more memory than other…
Jun Wan
2
votes
1 answer

Memory not released by python cherrypy application on linux

I have a long-running process that will fetch 100k rows from the db, generate a web page and then release all the small objects (lists, tuples and dicts). On Windows, the memory is freed after each request. However, on Linux, the memory of the server…
Sad
2
votes
2 answers

Undefined symbols for architecture x86_64: _memalign: TCMalloc

I have made some changes and I am trying to compile google-perf (TCMalloc) on Mac OS X Yosemite 10.10.3. I followed the steps written here: Install gperf. But I am getting the linking error below. ./autogen.sh basically autoreconf -i -> successful no…
eswaat
2
votes
0 answers

program deadlock involving __unregister_atfork & TCMalloc

Consider the following C++ program. I expect that the first thread to invoke exit will terminate the program. This is what happens when I compile it with g++ -g test.cxx -lpthread. However, when I link against TCMalloc (g++ -g test.cxx -lpthread…
Josh Johnson
2
votes
2 answers

Globally use Google's malloc?

I'd like to experiment with Google's tcmalloc on Linux... I have a huge project here, with hundreds of qmake generated Makefile's... I'd like to find a way to get gcc to globally link against tcmalloc (like it does with libc)... Is this possible? Or…
dicroce
1
vote
2 answers

Is it possible to use google tcmalloc to get per thread memory usage

Like the title says I'm interested if I can see per thread memory usage on programs compiled with -ltcmalloc. AFAIK with regular malloc memory is linked to process not to thread, but I'm not sure about tcmalloc.
NoSenseEtAl
1
vote
0 answers

performance problems caused by ptmalloc?

I'm testing std::forward_list sort performance: I generate 1000000 random numbers, insert them into a std::forward_list, and sort 5 times #include #include #include #include #include int main() { …
szh
1
vote
0 answers

How could I free the virtual memory after using malloc and free by tcmalloc?

In my program, I use tcmalloc for memory allocation and release. To give memory back to the system promptly, I call MallocExtension::instance()->ReleaseFreeMemory() after the call completes. However, this only ensures that…
Yale
1
vote
1 answer

Why should I need libprofiler.so.0

I am using Google perf tools and link my app with -lprofiler, but when I run the program I get: error while loading shared libraries: libprofiler.so.0: cannot open shared object file: No such file or directory. On the contrary, I link with -ltcmalloc…
Shawn
1
vote
1 answer

Understanding TCMalloc's "Bytes released to OS (aka unmapped)" stat

I have a process that consumes a lot of memory on startup, but frees most of that memory after the process is bootstrapped. I see the following in the TCMalloc stats printed afterwards: MALLOC: 16635888 ( 15.9 MiB) Bytes in use by…
Nizbel99