Questions tagged [tcmalloc]

TCMalloc is a malloc library developed by Google. It is faster than the glibc 2.3 malloc (ptmalloc2), which takes approximately 300ns to execute a malloc/free pair on a 2.8GHz P4 (for small objects). TCMalloc takes approximately 50ns for the same operation pair. It also reduces lock contention for multi-threaded programs. For small objects, there is virtually zero contention. Another benefit is space-efficient representation of small objects.

Introduction

TCMalloc (Thread-Caching Malloc) is a memory allocation library developed by Google. It is part of the gperftools (Google Performance Tools) project. Other tools in the same project include a heap checker (for detecting memory leaks), a heap profiler (for collecting memory-usage statistics) and a CPU profiler (for collecting CPU-usage statistics).

Official Introduction by Sanjay Ghemawat

TCMalloc is faster than the glibc 2.3 malloc (available as a separate library called ptmalloc2) and other mallocs that I have tested. ptmalloc2 takes approximately 300 nanoseconds to execute a malloc/free pair on a 2.8 GHz P4 (for small objects). The TCMalloc implementation takes approximately 50 nanoseconds for the same operation pair. Speed is important for a malloc implementation because if malloc is not fast enough, application writers are inclined to write their own custom free lists on top of malloc. This can lead to extra complexity and more memory usage, unless the application writer is very careful to size the free lists appropriately and scavenge idle objects out of them.
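To make that burden concrete, here is a minimal C++ sketch of the kind of hand-rolled free list an application might layer on top of a slow malloc. The class `FixedSizeFreeList`, its `max_cached` cap and its `Scavenge()` method are hypothetical illustrations, not part of TCMalloc or gperftools. Note that the owner has to pick the cap and remember to scavenge, or idle blocks sit unused.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical hand-rolled free list for one fixed object size, layered on
// top of malloc. The owner must choose a cap and call Scavenge(); getting
// either wrong wastes memory, which is the burden a fast malloc avoids.
class FixedSizeFreeList {
 private:
  struct Node { Node* next; };  // Idle blocks are linked through their own storage.

 public:
  FixedSizeFreeList(std::size_t object_size, std::size_t max_cached)
      : object_size_(object_size < sizeof(Node) ? sizeof(Node) : object_size),
        max_cached_(max_cached) {}
  ~FixedSizeFreeList() { Scavenge(); }
  FixedSizeFreeList(const FixedSizeFreeList&) = delete;
  FixedSizeFreeList& operator=(const FixedSizeFreeList&) = delete;

  void* Allocate() {
    if (head_ != nullptr) {  // Reuse an idle block if one is cached.
      Node* n = head_;
      head_ = n->next;
      --cached_;
      return n;
    }
    void* p = std::malloc(object_size_);  // Otherwise fall back to malloc.
    if (p == nullptr) throw std::bad_alloc();
    return p;
  }

  void Deallocate(void* p) {
    if (p == nullptr) return;
    if (cached_ >= max_cached_) {  // Cache full: hand the block back to malloc.
      std::free(p);
      return;
    }
    head_ = ::new (p) Node{head_};  // Push onto the idle list.
    ++cached_;
  }

  // Return every idle block to malloc.
  void Scavenge() {
    while (head_ != nullptr) {
      Node* next = head_->next;
      std::free(head_);
      head_ = next;
    }
    cached_ = 0;
  }

  std::size_t cached() const { return cached_; }

 private:
  std::size_t object_size_;
  std::size_t max_cached_;
  std::size_t cached_ = 0;
  Node* head_ = nullptr;
};
```

With a cap of 2, freeing three blocks keeps two on the list and returns the third to malloc; the next `Allocate()` hands back the most recently cached block. Every such policy decision is code the application now has to own.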

TCMalloc also reduces lock contention for multi-threaded programs. For small objects, there is virtually zero contention. For large objects, TCMalloc tries to use fine-grained and efficient spinlocks. ptmalloc2 also reduces lock contention by using per-thread arenas, but there is a big problem with ptmalloc2's use of per-thread arenas: in ptmalloc2, memory can never move from one arena to another. This can lead to huge amounts of wasted space. For example, in one Google application, the first phase would allocate approximately 300MB of memory for its URL canonicalization data structures. When the first phase finished, a second phase would be started in the same address space. If this second phase was assigned a different arena than the one used by the first phase, it would not reuse any of the memory left after the first phase and would add another 300MB to the address space. Similar memory blowup problems were also noticed in other applications.
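The thread-caching idea behind "virtually zero contention" can be sketched in a few dozen lines of C++. This is a deliberately simplified toy under stated assumptions: one fixed 64-byte object size, `std::vector` as the cache, and hypothetical names (`toy::Allocate`, `CentralFreeList`, `kBatch`). TCMalloc's real implementation is considerably more involved. The sketch models two points from the paragraph: the common path touches only thread-local state, and blocks a thread no longer needs flow back to a shared list where any other thread can reuse them, unlike memory stranded in a ptmalloc2 arena.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <new>
#include <vector>

// Toy model of thread caching for a single 64-byte object size.
// NOT TCMalloc's implementation: the real allocator has size classes,
// a page heap and adaptive cache limits, and cannot use std::vector
// (which itself allocates). This only models how blocks flow.
namespace toy {

constexpr std::size_t kObjectSize = 64;
constexpr std::size_t kBatch = 32;  // Blocks moved per trip to the central list.
constexpr std::size_t kMaxThreadCache = 2 * kBatch;

// Shared pool guarded by a mutex; touched only once per batch.
class CentralFreeList {
 public:
  ~CentralFreeList() {
    for (void* p : blocks_) std::free(p);
  }
  // Move n blocks into `out`, creating fresh ones with malloc if the pool is short.
  void Fetch(std::vector<void*>& out, std::size_t n) {
    std::lock_guard<std::mutex> lock(mu_);
    for (; n > 0; --n) {
      if (blocks_.empty()) {
        void* p = std::malloc(kObjectSize);
        if (p == nullptr) throw std::bad_alloc();
        out.push_back(p);
      } else {
        out.push_back(blocks_.back());
        blocks_.pop_back();
      }
    }
  }
  // Move up to n blocks from `in` back into the shared pool.
  void Return(std::vector<void*>& in, std::size_t n) {
    std::lock_guard<std::mutex> lock(mu_);
    for (; n > 0 && !in.empty(); --n) {
      blocks_.push_back(in.back());
      in.pop_back();
    }
  }

 private:
  std::mutex mu_;
  std::vector<void*> blocks_;
};

CentralFreeList& Central() {
  static CentralFreeList central;
  return central;
}

// Per-thread cache; leftovers go back to the shared pool when the thread
// exits, so memory cached by one thread can later be reused by another.
struct ThreadCache {
  std::vector<void*> blocks;
  ~ThreadCache() { Central().Return(blocks, blocks.size()); }
};

thread_local ThreadCache tls_cache;

void* Allocate() {
  if (tls_cache.blocks.empty()) Central().Fetch(tls_cache.blocks, kBatch);  // Slow path: one lock per batch.
  void* p = tls_cache.blocks.back();  // Fast path: thread-local, no lock.
  tls_cache.blocks.pop_back();
  return p;
}

void Deallocate(void* p) {
  tls_cache.blocks.push_back(p);  // Fast path: thread-local, no lock.
  if (tls_cache.blocks.size() > kMaxThreadCache) Central().Return(tls_cache.blocks, kBatch);
}

}  // namespace toy
```

In this toy, freeing a block and allocating again returns the same pointer straight from the thread-local vector without touching the mutex; the lock is taken only when a thread's cache runs empty, grows past `kMaxThreadCache`, or the thread exits.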

Another benefit of TCMalloc is space-efficient representation of small objects. For example, N 8-byte objects can be allocated using approximately 8N * 1.01 bytes, i.e., a one-percent space overhead. ptmalloc2 uses a four-byte header for each object and (I think) rounds the size up to a multiple of 8 bytes, so it ends up using 16N bytes.
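Taking the paragraph's figures at face value (including its hedged assumption that ptmalloc2 adds a 4-byte header and rounds each block up to a multiple of 8 bytes), the arithmetic for N 8-byte objects works out as:

```latex
\begin{aligned}
\text{TCMalloc:}\quad & 8N \times 1.01 = 8.08N \text{ bytes},
  && \frac{8.08N - 8N}{8N} = 0.01 = 1\% \text{ overhead} \\
\text{ptmalloc2:}\quad & \lceil (8 + 4)/8 \rceil \times 8 = 16 \text{ bytes per object} \Rightarrow 16N \text{ bytes},
  && \frac{16N - 8N}{8N} = 1 = 100\% \text{ overhead}
\end{aligned}
```

For a million 8-byte objects that is roughly 8.08 MB for TCMalloc versus 16 MB for ptmalloc2.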

98 questions
0 votes, 1 answer

Override global new/delete and malloc/free with tcmalloc library

I want to override new/delete and malloc/free. I have the tcmalloc library linked into my application. My aim is to add stats. From new I am calling malloc. Below is an example; it's global. void* my_malloc(size_t size, const char *file, int line, const…
eswaat
0 votes, 1 answer

C/C++ using tcmalloc

I've been trying to compile my application using tcmalloc. As recommended in the usage instructions, I appended -ltcmalloc to my compiler flags. After rerunning my application I could not see any performance difference. How can I check if…
aQuip
0 votes, 1 answer

TCMalloc: delete and delete[] operators patching on Win-64

TCMalloc is a great heap manager for multi-threaded use (in my case OpenMP). It was quite easy to get everything up and running with tcmalloc on Linux, Windows and 32-bit, but right now I am completely stuck with Win-64: I use dynamically linked x64…
0 votes, 2 answers

Unexpected Behaviour from tcmalloc

I have been using tcmalloc for a few months in a large project, and so far I must say that I am pretty happy with it, most of all with its heap-profiling features, which allowed me to track down memory leaks and remove them. In the past couple of weeks though…
BaroneAshura
0 votes, 1 answer

64-bit NoBarrier_Store() not implemented on this platform

"64-bit NoBarrier_Store() not implemented on this platform" I use tcmalloc on Win7 with VS2005. There are two threads in my app: one calls malloc(), the other calls free(). tcmalloc prints this message when my app starts. After debugging, I found the following…
0 votes, 1 answer

Why does the tcmalloc error "SbrkSysAllocator failed" happen?

I am using tcmalloc_minimal from google-perftools as the default memory allocator in my C++ program. It prints this message: src/system-alloc.cc:427] SbrkSysAllocator failed. The program keeps running. Does it matter?
Treper
-2 votes, 1 answer

How do I know whether Go allocated a variable on the heap or the stack?

I read the Go FAQ (https://go.dev/doc/faq#stack_or_heap) and want to know when Go allocates a variable on the stack or on the heap, so I wrote code like the following: package main import ( "fmt" ) type Object struct { Field int } func main() { A :=…
-2 votes, 1 answer

Memory never released when using Python classes and numpy

I am not going to post all of the code here, but I will provide a generic example. I have a class with a function that runs and creates a large array of values. By my estimate, this array shouldn't be much bigger than 10MB. Within the…
J Spen