Our multi-threaded server has hundreds of connection threads that are responsible for I/O handling and for replying to incoming requests.
There is another asynchronous thread that runs relatively heavy tasks with many allocations from time to time (say every few seconds).
Once I converted that thread to a small thread pool (i.e. those tasks now run on a different thread each time), our server usually has the same CPU usage, but it can suddenly reach a state where allocations across all operations take much longer and the overall CPU usage of the server almost doubles, from 2 cores to 3.7 cores.
My main theory so far is that I somehow changed the access pattern for the tcmalloc library and that this causes the random CPU jumps. What should I look at in the tcmalloc stats to confirm this theory? Could it be that the same code, now running on different threads (but not simultaneously), causes tcmalloc to allocate from the central cache more often than from the thread cache?
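
For reference, here is a minimal sketch of how I could sample the cache-related counters before and during a slow period. It assumes the gperftools flavor of tcmalloc and its `MallocExtension::GetNumericProperty` call. The `CacheSnapshot` struct, the `ThreadCacheShare` and `ReadSnapshot` helpers, and the `USE_GPERFTOOLS` build macro are my own hypothetical names, not part of the library. The "thread cache share" ratio is only a rough indicator I made up for comparing snapshots, not an official tcmalloc metric.

```cpp
#include <cstddef>

#ifdef USE_GPERFTOOLS  // hypothetical opt-in macro; also link with -ltcmalloc
#include <gperftools/malloc_extension.h>
#endif

// Hypothetical container for the counters of interest (all in bytes).
struct CacheSnapshot {
  std::size_t thread_cache_free_bytes = 0;           // free bytes held in per-thread caches
  std::size_t central_cache_free_bytes = 0;          // free bytes held in the central free lists
  std::size_t transfer_cache_free_bytes = 0;         // free bytes held in the transfer caches
  std::size_t current_total_thread_cache_bytes = 0;  // bytes currently used by all thread caches
  std::size_t max_total_thread_cache_bytes = 0;      // upper bound shared by all thread caches
};

// Fraction of cached free memory that sits in the thread caches.
// Assumption: a share that drops sharply during the slow periods would hint
// that memory is moving between the thread caches and the central lists.
double ThreadCacheShare(const CacheSnapshot& s) {
  const double total = static_cast<double>(s.thread_cache_free_bytes) +
                       static_cast<double>(s.central_cache_free_bytes) +
                       static_cast<double>(s.transfer_cache_free_bytes);
  return total == 0.0 ? 0.0 : static_cast<double>(s.thread_cache_free_bytes) / total;
}

// Fills `out` from tcmalloc. Returns false if any property is unavailable
// or if the program was built without USE_GPERFTOOLS.
bool ReadSnapshot(CacheSnapshot* out) {
#ifdef USE_GPERFTOOLS
  MallocExtension* ext = MallocExtension::instance();
  return ext->GetNumericProperty("tcmalloc.thread_cache_free_bytes",
                                 &out->thread_cache_free_bytes) &&
         ext->GetNumericProperty("tcmalloc.central_cache_free_bytes",
                                 &out->central_cache_free_bytes) &&
         ext->GetNumericProperty("tcmalloc.transfer_cache_free_bytes",
                                 &out->transfer_cache_free_bytes) &&
         ext->GetNumericProperty("tcmalloc.current_total_thread_cache_bytes",
                                 &out->current_total_thread_cache_bytes) &&
         ext->GetNumericProperty("tcmalloc.max_total_thread_cache_bytes",
                                 &out->max_total_thread_cache_bytes);
#else
  (void)out;  // nothing to read without the library
  return false;
#endif
}
```

The idea would be to call `ReadSnapshot` every few seconds from a monitoring thread and log the values next to CPU usage. If `current_total_thread_cache_bytes` sits close to `max_total_thread_cache_bytes` while the share falls, that would be consistent with my theory, though I am not sure it is sufficient to confirm it.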