
I have a long-running PHP script that seems to have a memory leak and that got me diving into how PHP garbage collection works. I had some questions about it and maybe there are some people on here who know enough about the innards to answer them.

First off, I'm wondering which specific variables end up in the root buffer. Is it only top-level items, or, in an array of arrays (a query result, for example), will each element of the top-level array end up in there too?

To put it in code:

$a = [  ['b'=>1234, 'c'=>2345] ];

Is only "a" in the root buffer or does $a[0] end up in there too?

What happens if you end up with more than 10K roots? Does it stop collecting at some point?

Finally, should I be concerned about 0% efficiency collections in the xdebug garbage collection report?

rbalik
  • Just FYI I have a long running (months to years) process written in PHP running as a daemon, and I have yet to see PHP leak memory. If you wish to figure out why your script is leaking please publish it, or at the very least give an overview of what it is doing. – Geoffrey Mar 18 '19 at 21:39
  • Oh I'm sure it's my fault and not PHP. It's pretty complex and proprietary so unfortunately I can't do that. I'm not really asking people to debug my script anyway, was more just curious about this root buffer stuff in case it's important to my investigation. – rbalik Mar 18 '19 at 21:47
  • Reaching the 10K limit triggers a garbage collection run -- it will not stop collection (unless you disabled GC). – NikiC Mar 22 '19 at 16:48
  • What happens if it is still above 10K after the collection? How often does it keep retrying? – rbalik Mar 22 '19 at 20:08
  • @NikiC The 10k limit is no longer a thing -- PHP sets this limit on-the-fly using some heuristics. – JS_Riddler Apr 01 '20 at 15:57

1 Answer


As noted in the garbage collection docs, which are terribly out of date and hard to trust:

The whole reason for implementing the garbage collection mechanism is to reduce memory usage by cleaning up circular-referenced variables as soon as the prerequisites are fulfilled. In PHP's implementation, this happens as soon as the root-buffer is full, or when the function gc_collect_cycles is called.

Even if your code has no memory leaks, you can still run out of memory. It's ultimately a question of what happens first:

  • GC is enabled (the default) and the threshold* of roots is met, or gc_collect_cycles() is called. PHP will collect unreachable reference cycles and free their memory. Your script will happily run along.
  • Your script runs out of memory.

[*] I am uncertain as to how the threshold is determined -- it appears to be set internally by PHP and can change as your script executes. On PHP 7.3+, you can call gc_status() to see the current threshold and how many roots there are.
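As a rough illustration of the footnote above, this sketch prints the counters that gc_status() reports. It assumes PHP 7.3 or later; the helper name describeGc is made up for this example and is not part of PHP.

```php
<?php
// Hypothetical helper (not part of PHP): formats the cycle collector's counters.
function describeGc(): string
{
    $s = gc_status(); // available since PHP 7.3
    return sprintf(
        'runs=%d collected=%d threshold=%d roots=%d',
        $s['runs'],      // number of collection runs so far
        $s['collected'], // total number of items collected so far
        $s['threshold'], // root count at which the next automatic run triggers
        $s['roots']      // possible roots currently in the buffer
    );
}

echo describeGc(), PHP_EOL;
```

Calling it at a few points in a long-running loop should show whether the threshold moves while the script executes.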

If you are encountering out of memory problems because memory runs out before the threshold is met, I do not know how to fix this besides manually calling gc_collect_cycles(). Hopefully somebody can shed some light on this!
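For that manual route, here is a minimal sketch of calling gc_collect_cycles() after each unit of work in a long-running loop. The function processBatch and the batch sizes are hypothetical stand-ins for whatever your script actually does; the only real API relied on is gc_collect_cycles(), which returns the number of collected cycles.

```php
<?php
// Hypothetical unit of work that leaves reference cycles behind.
function processBatch(int $size): void
{
    for ($i = 0; $i < $size; $i++) {
        $parent = new stdClass();
        $child = new stdClass();
        $parent->child = $child;  // parent -> child
        $child->parent = $parent; // child -> parent: a cycle
    }
    // Each pair still references itself after the variables are reassigned
    // or go out of scope, so reference counting alone cannot free it.
}

gc_enable();
$freedTotal = 0;
for ($batch = 0; $batch < 5; $batch++) {
    processBatch(500);
    // Force a collection now instead of waiting for the root threshold.
    $freedTotal += gc_collect_cycles();
}

echo "freed by manual collections: $freedTotal", PHP_EOL;
```

Whether this helps depends on the memory actually being tied up in cycles; if the objects are still reachable, a manual collection will not free them.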


Now, your questions:


Is only "a" in the root buffer or does $a[0] end up in there too?

From the reference counting basics docs (search for Compound Types), only $a will be added to the root buffer. This is because it is a compound type (array or object). In the diagram in the docs, only what is within the dotted line is in the root buffer -- in this case, a single entry for the array held in $a.

See below:

Example #5 Creating an array zval

<?php
$a = array( 'meaning' => 'life', 'number' => 42 );
xdebug_debug_zval( 'a' );
?>

The above example will output something similar to:

a: (refcount=1, is_ref=0)=array (
   'meaning' => (refcount=1, is_ref=0)='life',
   'number' => (refcount=1, is_ref=0)=42
)

Diagram from PHP docs


What happens if you end up with more than 10K roots? Does it stop collecting at some point?

As soon as it reaches the threshold, PHP should do garbage collection.

Although these docs do mention the 10K roots limit, I do not believe that is accurate for PHP 7.3+. As noted before, you can run gc_status() to view the current threshold at any time. Even the docs for gc_status show an example threshold of 50000, and not 10000.

I was able to find a Reddit post that confirms PHP uses heuristics to determine the threshold; it is no longer fixed at 10000.

There is also a bug report that suggests changing the docs.
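To watch the threshold trigger a run, here is a small sketch that creates self-referencing objects until the collector has to run on its own. It assumes PHP 7.3+ for gc_status(); the variable names are illustrative only.

```php
<?php
gc_enable();
gc_collect_cycles(); // start from a drained root buffer

$before = gc_status();
// Create a few times more garbage than the current threshold allows.
$iterations = 3 * $before['threshold'];

for ($i = 0; $i < $iterations; $i++) {
    $node = new stdClass();
    $node->self = $node; // the self-reference keeps the refcount above zero
    unset($node);        // refcount drops to 1: the object becomes a possible root
}

$after = gc_status();
printf(
    "automatic runs: %d, threshold before: %d, threshold after: %d\n",
    $after['runs'] - $before['runs'],
    $before['threshold'],
    $after['threshold']
);
```

No call to gc_collect_cycles() happens inside the loop, so any increase in the run counter comes from PHP collecting by itself once the root count reaches the threshold.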


Finally, should I be concerned about 0% efficiency collections in the xdebug garbage collection report?

From xdebug docs:

Efficiency% — Is the number of cleared roots divided by 10000 - a magic number of "roots" when reached triggers PHPs internal garbage collector to run automatically.

I don't use xdebug myself, so I don't know whether it has since been changed to divide by the current threshold. (If it still uses 10000, it would be possible to get >100% efficiency.)

But, in short, 0% efficiency would mean that all of the objects/arrays you've created are still referenced by something that is ultimately referenced from the root-level scope. As far as I know, PHP can resolve circular references and clean them up, so this very likely means your code is holding on to references it no longer needs, i.e. a memory leak.
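Here is a small sketch of what a collection that frees nothing looks like from the script's side. As long as something reachable holds the objects, gc_collect_cycles() has nothing to reclaim; only after that reference is dropped can the cycles be freed. The $registry array is an assumption for illustration, standing in for whatever long-lived structure your script might be accumulating.

```php
<?php
gc_enable();
gc_collect_cycles(); // drain any garbage that existed before this point

// Hypothetical long-lived structure that keeps every object reachable.
$registry = [];
for ($i = 0; $i < 1000; $i++) {
    $node = new stdClass();
    $node->self = $node; // each object is also part of a cycle
    $registry[] = $node;
}
unset($node);

// Everything is still reachable through $registry, so the collector
// should find nothing it is allowed to free.
$freedWhileReferenced = gc_collect_cycles();

// Drop the long-lived reference; only the self-references remain.
$registry = [];
$freedAfterRelease = gc_collect_cycles();

echo "freed while referenced: $freedWhileReferenced", PHP_EOL;
echo "freed after release:    $freedAfterRelease", PHP_EOL;
```

If your real script behaves like the first collection here, the thing to hunt for is the equivalent of $registry: a static property, global array, cache, or listener list that keeps growing.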

JS_Riddler