How does `dtrace` probe memory allocations (Mac OS)

Question

Does anyone know what function or mechanism dtrace uses to track mallocs? I'm trying to profile a piece of code. I can do that with the aid of a debugger and some command-line scripting, for example:

sudo dtrace -n "pid`pgrep Mail | head -n 1`::malloc:entry { @sizes=quantize(arg0); }"

Gives me something like:

dtrace: description 'pid31411::malloc:entry ' matched 4 probes
^C

       value  ------------- Distribution ------------- count    
          -1 |                                         0        
           0 |                                         214      
           1 |                                         7        
           2 |                                         191      
           4 |                                         1054     
           8 |@@@@                                     15992    
          16 |@@@@@@@@@@@@                             44569    
          32 |@@@@@@@@@@                               37003    
          64 |@@@@                                     15426    
         128 |@@@@                                     15695    
         256 |@                                        2616     
         512 |@                                        1967     
        1024 |@                                        1891     
        2048 |@@                                       6010     
        4096 |                                         523      
        8192 |                                         43       
       16384 |                                         110      
       32768 |                                         19       
       65536 |                                         0        
      131072 |                                         69       
      262144 |                                         0

But this is really tedious for me. I was wondering how to do this programmatically, from within the code.

Robert Harris · Accepted Answer · 2018-08-10T12:28:21.987

I think you're viewing the problem the wrong way around. Your example shows a fairly sophisticated interpretation of an arbitrary argument in an arbitrary combination of process and function — being able to do that in a single line and without modifying your own program is extraordinarily powerful. Attempting to have your own code perform the same analysis makes no sense: what would you do if, e.g., you wanted a linear scale instead of a logarithmic one? Reimplement lquantize(), too?

Focus on writing the code you want and let DTrace do the profiling.

EDIT in response to the first comment.

The execution path for the example you give is extremely circuitous. Very broadly, dtrace(1) requests that the kernel modify malloc's prologue so that, on entry, a calling thread traps to the DTrace kernel module. There, the datum is aggregated within a per-CPU buffer before control is returned to the instrumented thread. At periodic intervals, the dtrace process uses libdtrace to request a snapshot of the kernel's per-CPU buffers via ioctl(2). Coalescing these buffers and then rendering the graph that you see are also functions performed by libdtrace. On macOS, the libdtrace API, which includes the format of the records exchanged with the kernel, is private. Thus, reusing any of this infrastructure for even your simple example would be "using a sledgehammer to crack a nut".
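To make the per-CPU aggregation step in the paragraph above a little more concrete, here is a toy model in C. It is only a sketch: `NCPUS`, `struct agg`, `probe_fire` and `snapshot` are made-up names, the model is single-threaded, and it does not reflect DTrace's actual data structures or record format. It only shows the idea of each CPU updating its own slot on the hot path and a consumer coalescing the slots later.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS 4 /* assumption: a fixed CPU count for the toy model */

/* A count()/sum()-style aggregation; hypothetical, not a DTrace type. */
struct agg {
    uint64_t count;
    uint64_t sum;
};

/* One slot per CPU, so the hot path never touches another CPU's state. */
static struct agg per_cpu[NCPUS];

/* Hot path: stands in for the work done when a probe fires on `cpu`. */
void probe_fire(unsigned cpu, uint64_t arg0)
{
    assert(cpu < NCPUS);
    per_cpu[cpu].count += 1;
    per_cpu[cpu].sum += arg0;
}

/* Consumer side: coalesce the per-CPU slots into a single result. */
struct agg snapshot(void)
{
    struct agg total = {0, 0};
    for (unsigned cpu = 0; cpu < NCPUS; cpu++) {
        total.count += per_cpu[cpu].count;
        total.sum += per_cpu[cpu].sum;
    }
    return total;
}
```

In this model, calling `probe_fire(cpu, size)` stands in for a thread entering malloc, and `snapshot()` stands in for the periodic collection and coalescing step.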

A further consideration is that you'll be adding code that will itself need to be debugged and maintained. If your code is sufficiently complex that it warrants its own instrumentation then it seems plausible that, one day, you will want to consider calloc(), realloc() and mmap(). Perhaps you will also want to explicitly include or exclude calls to these functions from not just your own code but other libraries against which it is linked.

Finally, it will almost always be preferable to separate the code that implements your actual task from the code used to debug it. One example approach would be to write your own, instrumented wrapper for malloc() and put it in a shared object that you can interpose between your executable and, presumably, libc.
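As a rough illustration of the instrumented-wrapper idea, here is a minimal sketch in C that records allocation sizes in power-of-two buckets, in the spirit of quantize(). All names (`traced_malloc`, `bucket_for`, `size_histogram`, `dump_histogram`) are hypothetical, the code is not thread-safe, and it is an explicit wrapper that your code calls directly. The interposition step that would also route calls made by other libraries through it is platform-specific and is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Bucket 0 holds size 0; bucket k (k >= 1) holds sizes in [2^(k-1), 2^k). */
#define NBUCKETS 65

static uint64_t size_histogram[NBUCKETS];

/* Map a size to its bucket by counting significant bits. */
unsigned bucket_for(size_t size)
{
    unsigned bucket = 0;
    while (size != 0) {
        size >>= 1;
        bucket++;
    }
    return bucket;
}

/* Record the requested size, then forward the request to the real malloc(). */
void *traced_malloc(size_t size)
{
    unsigned bucket = bucket_for(size);
    assert(bucket < NBUCKETS);
    size_histogram[bucket]++;
    return malloc(size);
}

/* Print one row per non-empty bucket: the bucket's lower bound and its count. */
void dump_histogram(FILE *out)
{
    for (unsigned k = 0; k < NBUCKETS; k++) {
        if (size_histogram[k] == 0)
            continue;
        unsigned long long lower = (k == 0) ? 0ULL : 1ULL << (k - 1);
        fprintf(out, "%20llu %llu\n", lower,
                (unsigned long long)size_histogram[k]);
    }
}
```

A program would call `traced_malloc()` in place of `malloc()` and call `dump_histogram(stderr)` before exiting. To cover allocations made inside dynamic libraries as well, the wrapper would have to be interposed by the dynamic linker (on macOS, dyld can insert a library through the `DYLD_INSERT_LIBRARIES` environment variable), and it would then need to guard against recursion if its own reporting code allocates.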

That does not answer the question at all. I do not need any of that complexity. I only need to track a single function, in a single way. And it is much more painful for me to attack this simple problem with a versatile sophisticated tool that `dtrace` is. I'm sure there is a saying exactly for this kind of situation ... something like "killing fleas with thermonuclear weapons". — the swine, Aug 10 '18 at 02:03
@theswine I've edited my original answer to provide some context. — Robert Harris, Aug 10 '18 at 12:46
Thanks. Ok, that _is_ complicated. Did not know it was so low-level. I don't do so many allocations so speed is not such a big deal. I'll try going the way of linking to my own `malloc` and friends. I originally did not want to do that because there are dynamic libraries involved and I was not sure if `malloc` in those libraries would link to my implementation as well. — the swine, Aug 13 '18 at 16:12

dmakarov · Answer 2 · 2018-08-10T07:21:44.787

The pid provider uses a mechanism similar to debugger breakpoints. DTrace attaches to the process, as a debugger would. In your case, it finds the address of the first instruction of the malloc function and instruments it by inserting a trap instruction at the entry point. Whenever malloc is called, the trap instruction transfers control to the dtrace process, which saves the value of malloc's first argument in its data structures for later aggregation. It finds that value according to the ABI, most likely in a specific register of the controlled process's state. DTrace then restores the original opcode of the instruction that was replaced by the trap, makes the controlled process (your application) single-step over that instruction, puts the trap back, and lets the controlled process continue running.

As for your follow-up question, "how to do this programmatically": this is not related to dtrace, but you might have a look at the BDW garbage collector for C and C++ and use it as a leak detector, or simply as a means of gathering information about the memory allocations your application makes (see http://www.hboehm.info/gc/leak.html). Ultimately you could implement a similar, simplified approach in your own code, but that may turn out to be more tedious and complicated than using an existing library.
