This is a simple programming question, coming from my lack of knowledge of how PHP handles array copying and unsetting during a foreach loop. I have an array that comes to me from an outside source, formatted in a way I want to change. A simple example would be:

$myData = array('Key1' => array('value1', 'value2'));

But what I want would be something like:

$myData = array(0 => array('MyKey' => array('Key1' => array('value1', 'value2'))));

So I take the first $myData and format it like the second $myData. I'm totally fine with my formatting algorithm. My question lies in finding a way to conserve memory since these arrays might get a little unwieldy. So, during my foreach loop I copy the current array value(s) into the new format, then I unset the value I'm working with from the original array. E.g.:

$formattedData = array();
foreach ($myData as $key => $val) {
    // do some formatting here, copy to $reformattedVal

    $formattedData[] = $reformattedVal;

    unset($myData[$key]);
}

Is the call to unset() a good idea here? I.e., does it conserve memory since I have copied the data and no longer need the original value? Or, does PHP automatically garbage collect the data since I don't reference it in any subsequent code?

The code runs fine, and so far my datasets have been too negligible in size to test for performance differences. I just don't know if I'm setting myself up for some weird bugs or CPU hits later on.

Thanks for any insights.
-sR

Andy Lester
Soulriser
  • Unless your data structures are absolutely huge (a large fraction of your RAM), you are worrying about nothing. If PHP runs out of memory it will tell you, and you can increase the limit in php.ini. – Ian Jan 12 '11 at 21:24
  • It's a *silly idea*. You have just introduced a side-effect that might be forgotten later for some *micro-optimization* :-/ And no, neither PHP nor any other standard GC language I know of is able to make the data *contained* in a data-structure available for reclamation while a reference to the *container* exists (this excludes notions like soft/weak references). The `unset` can/will cause the PHP GC to kick in, but the actual performance gained -- if any -- due to released memory pressure is not trivial to generalize. If this *becomes* a problem, *then* address it. –  Jan 12 '11 at 21:27
  • what is the size of this array? – Your Common Sense Jan 12 '11 at 21:38
  • Thanks for the responses. I was wondering if I was micro-optimizing for no good reason, so I appreciate being called out on silliness. – Soulriser Jan 12 '11 at 21:40

5 Answers

I was running out of memory while processing lines of a text (xml) file within a loop. For anyone with a similar situation, this worked for me:

while($data = array_pop($xml_data)){
     //process $data
}
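
One caveat with the loop above: the condition tests the popped value itself, so it would stop early on a falsy element such as '0' or an empty string. A minimal sketch that tests the array instead is below. The sample lines are made up for illustration, since the answer does not show its data, and `$processed` is only a stand-in for the real processing step.

```php
<?php
// Hypothetical sample input standing in for $xml_data.
$xml_data = array('<a/>', '0', '', '<b/>');

$processed = array();
// Test the array, not the popped value, so falsy elements
// such as '0' or '' do not end the loop early.
while ($xml_data) {
    // array_pop() removes the last element, so the source array
    // shrinks as each line is handed off for processing.
    $data = array_pop($xml_data);
    $processed[] = $data; // stand-in for the real processing step
}

echo count($processed) . PHP_EOL; // number of elements handled
echo count($xml_data) . PHP_EOL;  // elements left in the source array
```

Note that array_pop() works from the end of the array, so the lines are handled in reverse order. If order matters, array_shift() is the front-of-array equivalent, at the cost of reindexing numeric keys on every call.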
Amos

Use a reference to the variable in the foreach loop using the & operator. This avoids making a copy of the array in memory for foreach to iterate over.

edit: as pointed out by Artefacto, unsetting the variable only decreases the number of references to the original value, so the memory saved is only that of the pointers rather than the value itself. Bizarrely, using a reference actually increases the total memory usage, presumably because the value is copied to a new memory location instead of being shared.

Unless the array is referenced, foreach operates on a copy of the specified array and not the array itself. foreach has some side effects on the array pointer. Don't rely on the array pointer during or after the foreach without resetting it.
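
A small sketch of what that manual passage means in practice for the question's pattern, as far as I understand by-value foreach: unsetting elements of the original array inside the loop does not disturb the iteration, and the array really is empty afterwards. The array contents here are made up for illustration.

```php
<?php
$items = array('a' => 1, 'b' => 2, 'c' => 3);

$seen = array();
foreach ($items as $key => $value) {
    // Removes the element from $items; the by-value loop keeps
    // iterating over the array as it was when the loop started.
    unset($items[$key]);
    $seen[] = $key;
}

var_dump($seen);          // keys visited by the loop
var_dump(count($items));  // elements left in the original array
```

This only shows that the loop is safe to run; it says nothing about how much memory is released, which is what memory_get_usage() is for.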

Use memory_get_usage() to identify how much memory you are using.

There is a good write up on memory usage and allocation here.

This is useful test code to see memory allocation - try uncommenting the commented lines to see total memory usage in different scenarios.

echo memory_get_usage() . PHP_EOL;
$test = $testCopy = array();
$i = 0;
while ($i++ < 100000) {
    $test[] = $i;
}
echo memory_get_usage() . PHP_EOL;
foreach ($test as $k => $v) {
//foreach ($test as $k => &$v) {
    $testCopy[$k] = $v;
    //unset($test[$k]);
}
echo memory_get_usage() . PHP_EOL;
Andy
  • Thanks for the reply and useful information. Using your code example, I'm seeing about a 5MB diff in memory usage when using `unset()`. Also, memory usage goes *up* when referencing the array in the foreach (while not using `unset()`). Interesting...though enough time spent on it. – Soulriser Jan 13 '11 at 19:41

Please remember the rules of Optimization Club:

  1. The first rule of Optimization Club is, you do not Optimize.
  2. The second rule of Optimization Club is, you do not Optimize without measuring.
  3. If your app is running faster than the underlying transport protocol, the optimization is over.
  4. One factor at a time.
  5. No marketroids, no marketroid schedules.
  6. Testing will go on as long as it has to.
  7. If this is your first night at Optimization Club, you have to write a test case.

Rules #1 and #2 are especially relevant here. Unless you know that you need to optimize, and unless you have measured that need, don't do it. Adding the unset will add a run-time hit and will make future programmers wonder why you are doing it.

Leave it alone.

Andy Lester
  • "Marketroid" means someone from the Marketing department. In the larger sense, don't let someone non-technical dictate terms to you about what your program should be able to do. – Andy Lester Jan 13 '11 at 15:10

If at any point in the "formatting" you do something like:

$reformattedVal['a']['b'] = $myData[$key];

Then doing unset($myData[$key]); is irrelevant memory-wise because you are only decreasing the reference count of the variable, which now exists in two places (inside $myData[$key] and $reformattedVal['a']['b']). Actually, you save the memory of indexing the variable inside the original array, but that's almost nothing.
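
A rough way to see this with memory_get_usage(), assuming a payload of about 1 MB built with str_repeat() (the size and the 'Key1' key are only for illustration). The exact byte counts depend on the PHP version, so treat the numbers as indicative rather than exact.

```php
<?php
$myData = array('Key1' => str_repeat('x', 1000000)); // roughly 1 MB payload
$start = memory_get_usage();

// Assignment shares the underlying value (copy-on-write);
// only the small wrapper arrays are newly allocated.
$reformattedVal = array();
$reformattedVal['a']['b'] = $myData['Key1'];
$afterCopy = memory_get_usage();

// Drops one reference; the payload stays alive inside $reformattedVal.
unset($myData['Key1']);
$afterUnset = memory_get_usage();

echo 'copy cost:   ' . ($afterCopy - $start) . ' bytes' . PHP_EOL;
echo 'unset saved: ' . ($afterCopy - $afterUnset) . ' bytes' . PHP_EOL;
```

Both figures should come out far below the size of the payload, which is the point: the unset releases the array slot, not the value.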

Artefacto
  • This isn't correct - by default variables are not passed by reference, only objects are – Andy Jan 12 '11 at 21:26
  • @Andy First, no one's passing anything (do you see any function?). Second, in an assignment of `$a = $b` in normal situations no memory is copied between the two variables (PHP implements copy-on-write), even if it behaves as if memory had been copied. – Artefacto Jan 12 '11 at 21:28
  • My mistake, I intended assignment rather than passing of parameters. I have added test code to my answer to demonstrate the memory saved by using `unset()`. – Andy Jan 12 '11 at 21:39
  • @Andy I don't claim you don't save memory. But the memory you save if the original is copied to a subarray of the final data is not the memory taken by the (possibly huge) variable, it's the memory taken to index it in the original array (only a few bytes). – Artefacto Jan 12 '11 at 21:43
  • I see your point, and (as an afterthought!) with Apache not releasing memory while the script is running, unsetting becomes moot. In that case only using a reference is beneficial; I'll update my answer. – Andy Jan 12 '11 at 21:48

Unless you're accessing the element by reference, unsetting will do nothing whatsoever, as you can't alter the array from within the iterator.

That said, it's generally considered bad practice to modify the collection you're iterating over - a better approach would be to break down the source array into smaller chunks (by only loading a portion of the source data at a time) and process these, unsetting each entire array "chunk" as you go.
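
A minimal sketch of the chunked approach, assuming PHP 5.5 or later for generators. Both `fakeSource()` and `readChunks()` are hypothetical names invented for this example; a real source would read from a file, a database cursor, or whatever the outside feed is. The 'MyKey' wrapping mirrors the target format in the question.

```php
<?php
// Hypothetical stand-in for the outside source: yields one row at a
// time instead of building the whole array up front.
function fakeSource()
{
    for ($i = 1; $i <= 5; $i++) {
        yield 'Key' . $i => array('value1', 'value2');
    }
}

// Hypothetical helper: groups rows from any iterable into chunks of
// at most $size rows, preserving keys.
function readChunks($rows, $size)
{
    $chunk = array();
    foreach ($rows as $key => $row) {
        $chunk[$key] = $row;
        if (count($chunk) === $size) {
            yield $chunk;
            $chunk = array();
        }
    }
    if ($chunk) {
        yield $chunk; // final, possibly shorter chunk
    }
}

$formattedData = array();
foreach (readChunks(fakeSource(), 2) as $chunk) {
    foreach ($chunk as $key => $val) {
        $formattedData[] = array('MyKey' => array($key => $val));
    }
    unset($chunk); // release the whole chunk, not individual elements
}

echo count($formattedData) . PHP_EOL; // number of reformatted rows
```

The saving here comes from never holding the full source array at once. The reformatted rows still occupy memory in `$formattedData`, so this helps most when the results are also written out or consumed as they are produced.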

John Parker
  • "unsetting will do nothing whatsoever" - this isn't correct, his code will unset the variable from the original array – Andy Jan 12 '11 at 21:23
  • @Andy I clearly stated it won't do anything if it's **not accessed by reference**. From the PHP manual - "Unless the array is referenced, foreach operates on a copy of the specified array and not the array itself." – John Parker Jan 12 '11 at 21:25
  • Correct, but you'll notice his code is unsetting the variable from the original array, not the copy. – Andy Jan 12 '11 at 21:27
  • @Andy What you're not getting is that the original and the copy share memory; unsetting from the original will not free the memory because the copy still holds a reference. – Artefacto Jan 12 '11 at 21:29