Using the heap in a pthread allocates >100MB of RAM

Question

A simple pthread test case allocates the following RAM as measured by the VIRT column of top:

No pthread/heap usage: 4224 kB
Use pthread but no heap: 14716 kB
Use pthread and heap in main (but not thread): 23632 kB
Use pthread and heap in main and thread: 89168 kB

Why would so much RAM be allocated? valgrind --tool=massif --pages-as-heap=yes shows a peak heap allocation of 127MB.

I read that threads share a heap so why the big jump when using the heap from within the pthread? The code will later target an embedded system with very limited resources so understanding these allocations is quite important.

Compiled with the following (g++ version 5.4.0 on Ubuntu 16.04):

g++ -O2 -pthread thread.cpp

thread.cpp:

#include <iostream>
#include <pthread.h>
#include <unistd.h>

#define USE_THREAD
#define USE_HEAP_IN_MAIN
#define USE_HEAP_IN_THREAD

int *arr_thread;
int *arr_main;

void *use_heap(void *arg){
#ifdef USE_HEAP_IN_THREAD
    arr_thread=new int[10];
    arr_thread[0]=1234;
#endif
    usleep(10000000);
    return NULL;  // a pthread start routine must return a void*
}

int main() {
    pthread_t t1;
#ifdef USE_HEAP_IN_MAIN
    arr_main=new int[10];
    arr_main[0]=5678;
#endif

#ifdef USE_THREAD
    pthread_create(&t1, NULL, &use_heap, NULL);
    pthread_join(t1,NULL);
#else
    usleep(10000000);
#endif

    return 0;
}

Edited to use global pointers so that the effect is still visible when compiling with -O2.

Unless you want `malloc()` to take a lock on every allocation (horrifically slow), you want a `malloc()` implementation where every thread has its own block of memory to distribute to `malloc()` calls in that thread (you still need synchronization occasionally, when an allocation `malloc()`ed in one thread is `free()`d in another). — EOF, May 17 '17 at 19:37
@EOF That makes sense so you would say this is expected behaviour? +10MB to use the heap sounds reasonable but why the +66MB jump to use the heap in the thread? — Chris Fryer, May 17 '17 at 19:52

score 0 · Answer 1 · answered May 20 '17 at 06:11

I read that threads share a heap

Correct.

so why the big jump when using the heap from within the pthread?

The glibc malloc implementation (like many others) uses per-thread arenas so that threads do not all contend for a single heap lock.

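The very first malloc(1) in each new thread creates an entire new arena (usually via mmap). The size of the arena is an implementation detail: a few MB would not be unreasonable, and 64-bit glibc reserves 64 MB of address space per arena, committing pages only as they are actually used.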
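One way to see the separate arenas is to compare the address of an allocation made in the main thread with one made in a new thread. The following is a minimal sketch, assuming Linux with glibc; `alloc_in_thread` and `arena_distance` are hypothetical names introduced here for illustration. With per-thread arenas the two addresses typically land in unrelated regions of the address space, but the exact values depend on the allocator and are not guaranteed.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <pthread.h>

// Hypothetical helper: thread entry point that makes one small allocation
// and hands its address back through the thread's return value.
static void *alloc_in_thread(void *) {
    return std::malloc(16);
}

// Hypothetical helper: allocates 16 bytes in the calling thread and 16 bytes
// in a new thread, prints both addresses, and returns the absolute distance
// between them in bytes. Returns 0 if an allocation or thread creation fails.
std::uintptr_t arena_distance() {
    void *from_main = std::malloc(16);
    void *from_thread = nullptr;
    pthread_t t;
    if (from_main == nullptr ||
        pthread_create(&t, nullptr, alloc_in_thread, nullptr) != 0) {
        std::free(from_main);
        return 0;
    }
    pthread_join(t, &from_thread);
    if (from_thread == nullptr) {
        std::free(from_main);
        return 0;
    }
    const auto a = reinterpret_cast<std::uintptr_t>(from_main);
    const auto b = reinterpret_cast<std::uintptr_t>(from_thread);
    std::printf("main:   %p\nthread: %p\n", from_main, from_thread);
    std::free(from_thread);  // freeing memory allocated by another thread is allowed
    std::free(from_main);
    return a > b ? a - b : b - a;
}
```

Calling `arena_distance()` from `main` (built with `g++ -pthread`) prints the two addresses; a very large distance suggests the thread's allocation came from its own mmap'ed arena rather than from the main heap.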

RAM as measured by the VIRT column of top

This isn't a good measurement. A program can have a huge VIRT size but use very little actual memory. Just try to mmap 1 GB of memory without touching it: your program's VIRT will go through the roof, but the system's available memory will not be reduced at all.
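The mmap experiment can be sketched roughly as follows, assuming Linux. `reserve_untouched` and `resident_pages` are hypothetical helper names; `resident_pages` uses `mincore` to ask the kernel how many pages of the mapping are currently in RAM. On a typical Linux system the count stays at or near zero until the memory is written to, though some sandboxes report residency differently.

```cpp
#include <cstddef>
#include <vector>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: map `bytes` of anonymous memory without touching it.
// Returns nullptr on failure.
void *reserve_untouched(std::size_t bytes) {
    void *p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Hypothetical helper: count how many pages of [addr, addr + bytes) are
// resident in RAM according to mincore(). `addr` must be page-aligned.
// Returns -1 on failure.
long resident_pages(void *addr, std::size_t bytes) {
    const long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    const std::size_t page_size = static_cast<std::size_t>(page);
    const std::size_t n = (bytes + page_size - 1) / page_size;
    std::vector<unsigned char> vec(n);
    if (mincore(addr, bytes, vec.data()) != 0) return -1;
    long count = 0;
    for (unsigned char v : vec) count += v & 1;  // low bit set means resident
    return count;
}
```

For the experiment in the text, call `reserve_untouched(1UL << 30)`, sleep, and watch top: VIRT grows by about 1 GB while RES barely moves. Writing to a byte of the mapping and calling `resident_pages` again shows the count going up only for the pages that were touched.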

The code will later target an embedded system with very limited resources so understanding these allocations is quite important.

In that case you want better tools than top. Accurate accounting for actual memory usage on Linux is somewhat tricky. You could start by looking at /proc/$pid/smaps.
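As a starting point for reading smaps, the sketch below totals the `Size:` and `Rss:` fields across all mappings, which makes the gap between reserved address space and resident memory explicit. `SmapsTotals` and `sum_smaps` are hypothetical names introduced here; the parser only assumes the usual `Key: value kB` layout of smaps and ignores every other line.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical container for the two totals, both in kB.
struct SmapsTotals {
    long size_kb = 0;  // sum of "Size:" fields (mapped address space)
    long rss_kb = 0;   // sum of "Rss:" fields (pages actually resident in RAM)
};

// Hypothetical helper: sum the Size and Rss fields of an smaps-formatted stream.
SmapsTotals sum_smaps(std::istream &in) {
    SmapsTotals totals;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        long kb = 0;
        // Mapping header lines and VmFlags lines have no numeric second
        // field, so the extraction fails and they are skipped.
        if (!(fields >> key >> kb)) continue;
        if (key == "Size:") {
            totals.size_kb += kb;
        } else if (key == "Rss:") {
            totals.rss_kb += kb;
        }
    }
    return totals;
}
```

To inspect the running process, open `/proc/self/smaps` with a `std::ifstream` (from `<fstream>`) and pass it to `sum_smaps`. A 65536 kB mapping with an Rss of 0 then adds to `size_kb` but not to `rss_kb`.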

I looked at the smaps file for the process as you suggest. There is a 65536 kB entry with RSS of zero so, as you say, the process isn't using the memory. However, it is available to use - I can create a 64 MB (-1kB) array without the VIRT entry increasing. So doesn't this mean that 64 MB of system resource is allocated to the process? and, as such, that's 64 MB that can't be allocated to other processes? It's clear that different threads should be allocated different arenas but why does the pthread instance get 64 MB when the main thread only gets 8 MB? — Chris Fryer, May 20 '17 at 12:50
