
I have a PHP script that uses Doctrine2 and Zend to calculate some things from a database and send emails to 30,000 users.

My script is leaking memory, and I want to know which objects are consuming that memory and, if possible, who is keeping references to them (thus preventing them from being released).

I'm using PHP 5.3.x, so plain circular references shouldn't be the problem (5.3 added a cycle-collecting garbage collector).

I've tried using Xdebug's trace capabilities to get mem_delta, with no success (too much data).

I've tried manually adding memory_get_usage() before and after the important functions. But the only conclusion I reached was that I lose around 400k per user, and 3,000 users times that gives the 1 GB I have available.
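Roughly, the measurement looks like this (a simplified sketch; processUser() here is just a stand-in for the real load/calculate/mail function):

```php
<?php
// Simplified sketch of the per-user measurement.
// processUser() is a placeholder for the real "load, calculate, mail" work.
function processUser($userId)
{
    // simulate some allocation that may or may not be released
    return str_repeat('x', 1024);
}

$totalStart = memory_get_usage();
for ($userId = 1; $userId <= 10; $userId++) {
    $start = memory_get_usage();
    processUser($userId);
    $delta = memory_get_usage() - $start;
    echo "user $userId: delta $delta bytes\n";
}
echo 'total growth: ' . (memory_get_usage() - $totalStart) . " bytes\n";
```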

Are there any other ways to find out where and why memory is leaking? Thanks

Lightness Races in Orbit
Hernan Rajchert
  • Well, the users should be processed one after another; only about 400k of memory should be needed! If every cycle increases the memory usage, something in your design is seriously wrong! – markus Oct 05 '11 at 23:37
  • Well, I have a loop that calls a function that does the following: get the info for the user, calculate (with storing included), send mail, release resources. Each user is independent of the others, so apparently the resources are not being released – Hernan Rajchert Oct 05 '11 at 23:51
  • Did you have a look at Doctrine's EntityManager? I'm not very familiar with Doctrine, but it could possibly store references to entities/proxies/... for all 30k users. – Fge Oct 06 '11 at 00:14
  • Did you try the xdebug profiler? It should give you a good idea of what method is using up the most memory. http://xdebug.org/docs/profiler – Joey Rivera Oct 06 '11 at 00:54
  • @Fge I do a clear of the entity manager after each user is calculated, so as far as I can see, it should be removed – Hernan Rajchert Oct 06 '11 at 01:16
  • @JoeyRivera Yes, I've tried; I mentioned that in the post. The problem is that there is too much information about something that doesn't really help. I need info about the objects, not the methods, and you couldn't believe the amount of methods a single query requires :P – Hernan Rajchert Oct 06 '11 at 01:18
  • Have you just simply stepped through with the debugger? I mean that should show you after one user, what's wrong, shouldn't it? – markus Oct 06 '11 at 06:01

2 Answers


You could try sending, say, 10 emails and then calling

get_defined_vars();

http://nz.php.net/manual/en/function.get-defined-vars.php

at the end of the script, or after each email is sent (depending on how your code is set up).

This should tell you what is still loaded, and what you can unset() or turn into a reference.

Also, if too many things are loaded, you can call this near the start and end of your code and work out the difference.
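A minimal sketch of that diff idea ($user and $report are made-up stand-ins for whatever the mail loop really creates):

```php
<?php
// Snapshot the variable names in scope, do some work, snapshot again,
// and diff the two lists to see what is still hanging around.
$baseline = array_keys(get_defined_vars());

// simulated per-user work; these stand in for real leftovers
$user   = array('id' => 1, 'email' => 'user@example.com');
$report = str_repeat('x', 1024);

$leftover = array_diff(array_keys(get_defined_vars()), $baseline, array('baseline'));

print_r($leftover); // variable names still in scope -> candidates for unset()
```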

markus
P4ul
  • Thanks, that seems helpful. I'll try that in my loop. From the documentation, the only thing that worries me is that it only gives me info about the objects in scope, and I'm guessing the memory problem exists because something is out of scope. – Hernan Rajchert Oct 05 '11 at 23:45

30,000 objects to hydrate is quite a lot. Doctrine 2 is stable, but there are some bugs, so I am not too surprised by your memory-leak problems.

That said, with smaller data sets I have had good success using Doctrine's batch processing capabilities and creating an iterable result.

You can use the code from the examples and add a gc_collect_cycles() at the end of each batch. You have to test it, but for me batch sizes around 100 worked quite well – that number gave a good balance between performance and memory usage.

It's quite important that the script records which entities were already processed, so that it can be restarted without problems and resume normal operation without sending emails twice.

$batchSize = 20;
$i = 1; // start at 1 so we don't flush/clear on the very first entity
$q = $em->createQuery('select u from MyProject\Model\User u');
$iterableResult = $q->iterate();
while (($row = $iterableResult->next()) !== false) {
    $entity = $row[0];

    // do stuff with $entity here
    // mark entity as processed

    if (($i % $batchSize) === 0) {
        $em->flush(); // persist any pending changes
        $em->clear(); // detach all entities so their memory can be freed

        gc_collect_cycles(); // reclaim circular references
    }
    ++$i;
}
$em->flush(); // flush the last, partial batch

Anyhow, maybe you should rethink the architecture of that script a bit, as an ORM is not well suited to processing large chunks of data. Maybe you can get away with working on the raw SQL rows?
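The raw-SQL route could look roughly like this, going through Doctrine's DBAL connection instead of the ORM (the table/column names and sendMailTo() are assumptions for illustration, not your actual schema):

```php
<?php
// Hypothetical sketch: bypass entity hydration and stream plain rows via DBAL.
// Table/column names (users, id, email) and sendMailTo() are made up here.
function mailAllUsers($conn)
{
    // $conn would be the Doctrine\DBAL\Connection from $em->getConnection()
    $stmt = $conn->executeQuery('SELECT id, email FROM users');

    while (($row = $stmt->fetch()) !== false) {
        // $row is a plain array: no entities, no UnitOfWork bookkeeping
        sendMailTo($row['email']);
    }
}
```

Each row is just an array that goes out of scope after the iteration, so nothing accumulates in the EntityManager.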

Max