
I've slurped in a big file using File::Slurp, but given the size of the file I can see that I must have it in memory twice, or perhaps it's getting inflated by being turned into 16-bit Unicode. How can I best diagnose that sort of problem in Perl?

The file I pulled in is 800MB in size and my Perl process that's analysing that data has roughly 1.6GB allocated at runtime.

I realise that I may be wrong about the reason for the problem, but I'm not sure of the most efficient way to prove/disprove my theory.

Update:

I have eliminated dodgy character encoding from the list of suspects. It looks like I'm copying the variable at some point, I just can't figure out where.

Update 2:

I have now done some more investigation and discovered that it's actually just getting the data from File::Slurp that's causing the problem. I had a look through the documentation and discovered that I can get it to return a scalar_ref, i.e.

my $data = read_file($file, binmode => ':raw', scalar_ref => 1);

Then I don't get the memory inflation, which makes sense and is the most logical thing to do when getting the data in my situation.
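Here's a minimal sketch of how I'm reading it now (the file name is just a placeholder); the key seems to be to keep working through the reference so a second copy never gets made:

    use strict;
    use warnings;
    use File::Slurp qw(read_file);

    my $file = 'big_binary.dat';    # placeholder path

    # With scalar_ref => 1, read_file returns a reference to the buffer
    # rather than copying the contents into a new scalar on return.
    my $data_ref = read_file($file, binmode => ':raw', scalar_ref => 1);

    # Keep working through the reference; assigning $$data_ref to a plain
    # scalar would bring the second copy back.
    printf "read %d bytes\n", length $$data_ref;
    my $header = substr $$data_ref, 0, 16;    # e.g. peek at a 16-byte header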

The information about looking at what variables exist etc. has been generally helpful though, thanks.

Colin Newell
  • This SO post may be helpful: [How can I programmatically determine my Perl program's memory usage under Windows?](http://stackoverflow.com/questions/1115743/how-can-i-programmatically-determine-my-perl-programs-memory-usage-under-windows). – Zaid Jun 09 '10 at 14:49
  • That is generally interesting although it's data at that level that has made me realise I have this bug. – Colin Newell Jun 09 '10 at 14:51
  • Is slurping the entire file necessary to your process? Is a line-by-line analysis not possible? – Mark Canlas Jun 09 '10 at 15:03
  • It's a binary file. I could go through it in a different way but it's a bug in my program that's causing the problem and I'd rather understand that so that I don't make the same mistake again in a more subtle place where it's harder to spot. – Colin Newell Jun 09 '10 at 15:08
  • @Colin => litter a bunch of print statements around your code, and then watch your memory usage, when you see the spike check which print statement you are at. Then, if you can't spot the bug, post that portion of the code up here. Alternatively, you can step through with the debugger. – Eric Strom Jun 09 '10 at 18:48
  • Please reduce your program to a small test case and post that code. – daxim Jun 10 '10 at 13:51

2 Answers


Maybe Devel::DumpSizes and/or Devel::Size can help out? I think the former would be more useful in your case.

Devel::DumpSizes - Dump the name and size in bytes (in increasing order) of variables that are available at a given point in a script.

Devel::Size - Perl extension for finding the memory usage of Perl variables
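For example, a quick check with Devel::Size might look something like this (the 10MB test string here just stands in for whatever variable you want to measure):

    use strict;
    use warnings;
    use Devel::Size qw(size total_size);

    # A 10MB string standing in for the slurped file contents.
    my $data = 'x' x (10 * 1024 * 1024);

    # size() reports the bytes used by the variable itself; total_size()
    # also follows any references it contains (identical for a plain string).
    printf "size:       %d bytes\n", size($data);
    printf "total_size: %d bytes\n", total_size($data);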

Htbaa

Here are some generic resources on memory issues in Perl:

As for your own suggestion, the simplest way to prove or disprove it would be to write a simple Perl program that:

  1. Creates a big (100MB) file of plain text, probably by just writing the same string to a file in a loop, or, for binary files, by running the dd command via a system() call

  2. Reads the file in using standard Perl open()/@a = <>;

  3. Measures memory consumption.

Then repeat steps 2-3 with your 800MB file.

That will tell you whether the issue is File::Slurp, some weird logic in your program, or some specific content in the file (e.g. non-ASCII data, although I'd be surprised if that ends up being the reason).
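A rough sketch of that experiment could look like the following (assumes a Unix-like system where ps -o rss is available; the file name and sizes are just placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file = 'test_100m.txt';    # throwaway test file

    # 1. Create a ~100MB file of plain text (1,000,000 lines of 100 bytes).
    open my $out, '>', $file or die "open $file: $!";
    print {$out} 'x' x 99, "\n" for 1 .. 1_000_000;
    close $out;

    # 2. Read the file back in with a plain open()/@a = <>.
    open my $in, '<', $file or die "open $file: $!";
    my @a = <$in>;
    close $in;

    # 3. Ask the OS how big the process is (RSS in KB via ps; Unix-only
    #    and only a rough number, but good enough for a comparison).
    chomp(my $rss_kb = `ps -o rss= -p $$`);
    printf "%d lines read, process RSS ~%.0f MB\n", scalar @a, $rss_kb / 1024;

    unlink $file;    # clean up the test file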

DVK
  • I do appear to have eliminated dodgy character encoding. A closer look reveals the process starts out with roughly the same memory footprint as the file; then, after doing some stuff checking things in the header, it doubles up. I just can't see what's causing that. – Colin Newell Jun 09 '10 at 14:59