Can I use dtrace on OS X 10.5 to determine which of my perl subs is causing the most memory allocation?

Question

We have a pretty big perl codebase.

Some processes that run for multiple hours (ETL jobs) suddenly started consuming a lot more RAM than usual. Analysis of the changes in the relevant release is a slow and frustrating process. I am hoping to identify the culprit using more automated analysis.

Our live environment is perl 5.14 on Debian squeeze.

I have access to lots of OS X 10.5 machines, though. Dtrace and perl seem to play together nicely on this platform. It seems that using dtrace on linux requires a bit more work. I am hoping that memory allocation patterns will be similar between our live system and a dev OS X system, or at least similar enough to help me find the origin of this new memory use.

This slide deck:

https://dgl.cx/2011/01/dtrace-and-perl

shows how to use dtrace to show the number of calls to malloc by perl sub. I am interested in tracking the total amount of memory that perl allocates while executing each sub over the lifetime of a process.

Any ideas on how this can be done?

This >might< be what I'm looking for. Could use advice from perl/dtrace heavies on suitability for purpose. - https://github.com/astletron/perl-dtrace-malloc/blob/master/perl-malloc-total-bytes-by-sub.d – astletron, Jan 03 '12 at 18:45
Hrm. That's doing >something< but it seems to be very slow. Maybe the overhead of intercepting all these sub entries/exits and calls to malloc is too heavy to be practical? – astletron, Jan 03 '12 at 19:36
I think I may have this working. There was a problem in the way I was running dtrace, not in my D program. I think the output I now have is the count of bytes requested via malloc, broken down by the name of the sub and the file the sub is in, where that sub was the last perl sub entered before the malloc. This is not entirely straightforward, but looks to be directionally useful. Could still use input from any dtrace ninjas who happen to walk by to validate that my work is right here. – astletron, Jan 04 '12 at 23:47

score 4 · Answer 1 · edited Jan 09 '12 at 09:54

There's no single way to do this, and doing it on a sub-by-sub basis isn't always the best way to examine memory usage. I'm going to recommend a set of tools that you can use: some work on the program as a whole, while others let you examine a single section of your code or a single variable.

You might want to consider using Valgrind. There's even a Perl module called Test::Valgrind that will help set up a suppression file for your Perl build, and then check for memory leaks or errors in your script.

There's also Devel::Size which does exactly what you asked for, but on a variable-by-variable basis rather than a sub-by-sub basis.

You can use Devel::Cycle to search for inadvertent circular memory references in complex data structures. While a circular reference doesn't mean that you're wasting memory as you use the object, circular references prevent anything in the chain from being freed until the cycle is broken.
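To illustrate why a cycle keeps memory alive, here is a minimal sketch that uses only the core Scalar::Util module rather than Devel::Cycle itself. The Node class and the build_pair helper are hypothetical names made up for this example; the DESTROY counter simply shows whether the objects were actually freed when they went out of scope.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $destroyed = 0;    # incremented each time a Node is actually freed

# Hypothetical minimal class, used only for this illustration.
sub Node::new     { my ($class) = @_; return bless {}, $class }
sub Node::DESTROY { $destroyed++ }

# Build a two-node cycle; optionally weaken the back-reference.
sub build_pair {
    my ($use_weaken) = @_;
    my $parent = Node->new;
    my $child  = Node->new;
    $parent->{child} = $child;
    $child->{parent} = $parent;
    weaken($child->{parent}) if $use_weaken;
    return;    # both lexicals go out of scope here
}

build_pair(0);
my $freed_with_cycle = $destroyed;     # strong cycle: nothing is freed

$destroyed = 0;
build_pair(1);
my $freed_with_weaken = $destroyed;    # weak back-reference: both are freed

print "freed with strong cycle:            $freed_with_cycle\n";
print "freed with weakened back-reference: $freed_with_weaken\n";
```

With the strong cycle, neither object is released until the interpreter shuts down; weakening one link lets ordinary reference counting reclaim both.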

Devel::Leak is a little bit more arcane than the rest, but it basically will allow you to get full information on any SVs that are created and not destroyed between two points in your program's execution. If you check this across a sub call, you'll know any new memory that that subroutine allocated.

You may also want to read the perldebguts section of the Perl manual.

I can't really help more because every codebase is going to wind up being different. Test::Valgrind will work great for some codebases and terribly on others. If you are going to try it, I recommend you use the latest version of Valgrind available and Perl >= 5.10, as Perl 5.8 and Valgrind historically didn't get along too well.

Hi @Dan. Thanks for the quick response. I don't think I have a memory leak or cyclic references. I think it's likely that someone has declared a hashref at package scope and it is getting filled up slowly as the process ticks on (possibly as a poorly-implemented 'cache' with no max size). Looking for an approach that will let me determine what code of mine is causing perl to ask for more RAM. My ideal output would be a list of fully-qualified sub names with the amount of RAM allocated as a side-effect of their execution. I think I may be getting close to this output with dtrace. – astletron, Jan 03 '12 at 18:02

Ranguard · Answer 2 · 2012-01-04T13:06:54.610

You might want to look at Memory::Usage and Devel::Size

To check the whole process or sub:

use Memory::Usage;
my $mu = Memory::Usage->new();

# Record amount of memory used by current process
$mu->record('starting work');

# Do the thing you want to measure
$object->something_memory_intensive();

# Record amount in use afterwards
$mu->record('after something_memory_intensive()');

# Spit out a report
$mu->dump();

Or to check specific variables:

use Devel::Size qw(size total_size);

my $size = size("A string");

my @foo = (1, 2, 3, 4, 5);
my $other_size = size(\@foo);

my $foo = {
     a => [1, 2, 3],
     b => {a => [1, 3, 4]}
};
my $total_size = total_size($foo);

edited Jan 04 '12 at 13:06

answered Jan 04 '12 at 12:46

Ranguard


Heya @Ranguard. I was hoping to avoid a solution which requires me to add more instrumentation to the code. Easy to get caught in a code/run/analyze/code loop that way. Maybe I'm fighting windmills, but I am hoping to use dtrace as the NYTProf of perl memory-consumption - something that I can run to analyze my code w/o having to alter the code. – astletron Jan 04 '12 at 23:52

score 2 · Accepted Answer · answered Feb 29 '12 at 09:33

The answer to the question is 'yes'. Dtrace can be used to analyze memory usage in a perl process.

This snippet of code:

https://github.com/astletron/perl-dtrace-malloc/blob/master/perl-malloc-total-bytes-by-sub.d

tracks how memory use increases between the call and return of every sub in a program. As an added bonus, dtrace seems to sort the output for you (at least on OS X). Cool.

Thanks to all that chimed in. I answered this one myself as the question is really specific to dtrace/perl.

answered Feb 29 '12 at 09:33

astletron


Hi, could you please explain what's happening in your snippet? Where is your main perl code? – user13107 Nov 03 '12 at 17:28
That snippet is written in D, the dtrace scripting language. It records the perl sub call stack in an array. Every time malloc is called, the amount of memory requested is added to the total amount associated with the current perl sub. When the program being dtrace'd ends, the total amount of memory requested via malloc during the execution of each sub is printed out. I think this will work with any perl code (or any code that uses the appropriate dtrace traps). – astletron Nov 20 '12 at 17:06
This is actually a great answer. Don't forget to accept it (even though you answered it yourself)! – Dan Jun 20 '13 at 18:55
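The bookkeeping described in the Nov 20 '12 comment above can be modelled in plain Perl. This is only a sketch of the attribution logic, not the D script itself: the event list is invented for illustration and stands in for the sub-entry, sub-return and malloc probes that dtrace would fire, and the sub names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical event stream standing in for the dtrace probes:
#   ['enter', sub name], ['leave', sub name], ['malloc', bytes requested]
my @events = (
    [ 'enter',  'main::load'  ],
    [ 'malloc', 4096          ],
    [ 'enter',  'main::parse' ],
    [ 'malloc', 1024          ],
    [ 'malloc', 512           ],
    [ 'leave',  'main::parse' ],
    [ 'malloc', 256           ],
    [ 'leave',  'main::load'  ],
);

my @stack;    # current Perl sub call stack, innermost sub last
my %bytes;    # sub name => bytes requested while that sub was innermost

for my $event (@events) {
    my ($kind, $value) = @$event;
    if    ($kind eq 'enter')            { push @stack, $value }
    elsif ($kind eq 'leave')            { pop @stack }
    elsif ($kind eq 'malloc' && @stack) { $bytes{ $stack[-1] } += $value }
}

# Report, largest total first.
for my $sub (sort { $bytes{$b} <=> $bytes{$a} } keys %bytes) {
    printf "%-12s %6d\n", $sub, $bytes{$sub};
}
```

Note that this model counts bytes requested, not net growth: memory that is later freed is not subtracted, so the totals are best read as a guide to where allocation pressure comes from.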

score 1 · Answer 4 · answered Jan 04 '12 at 13:39

You could write a simple debug module based on Devel::CallTrace that prints the sub entered as well as the current memory size of the current process. (Using /proc or whatever.)
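A rough sketch of that idea, without Devel::CallTrace: wrap a named sub so that the process's resident set size is sampled before and after each call. It assumes a Linux-style /proc/self/status with a VmRSS line (so it would apply to the Debian box, not OS X), and rss_kb, instrument and fill_cache are hypothetical names invented for this example.

```perl
use strict;
use warnings;

# Resident set size of this process in kB, or undef if /proc is unavailable.
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

my %growth;    # sub name => accumulated RSS change in kB across its calls

# Replace a named sub with a wrapper that records RSS change per call.
sub instrument {
    my ($name) = @_;    # fully-qualified name, e.g. "main::fill_cache"
    no strict 'refs';
    no warnings 'redefine';
    my $orig = \&{$name};
    *{$name} = sub {
        my $before = rss_kb();
        my @ret    = wantarray ? $orig->(@_) : scalar $orig->(@_);
        my $after  = rss_kb();
        $growth{$name} += $after - $before
            if defined $before && defined $after;
        return wantarray ? @ret : $ret[0];
    };
    return;
}

# Hypothetical unbounded package-level cache, as suspected in the question.
my %cache;
sub fill_cache { $cache{$_} = 'x' x 1024 for 1 .. 20_000 }

instrument('main::fill_cache');
fill_cache();

printf "%-20s %8d kB\n", $_, $growth{$_}
    for sort { $growth{$b} <=> $growth{$a} } keys %growth;
```

The figures are inclusive (a caller is charged for whatever its callees allocate) and RSS only moves in page-sized steps, so small allocations may not register at all.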

answered Jan 04 '12 at 13:39

Mithaldu


Hi @Mithaldu. For this approach, I think I would want to record the change in process size from the time a sub is entered to the time the sub is left. I can definitely see how I could get 'profiler-like' output (with memory increase replacing execution time) this way. Thanks for the suggestion. Will give it a go if the dtrace-based approach does not prove fruitful. – astletron Jan 04 '12 at 23:50
