Today, while working on a text-analysis tool for blogs, I ran into PHP behavior that seemed very strange to me, and I just couldn't wrap my head around it. While normalizing text, I was trying to remove words shorter than a minimum length, so I wrote this in my normalization method:

if ($this->minimumLength > 1) {
    foreach ($string as &$word)
    {
        if (strlen($word) < $this->minimumLength) {
            unset($word);
        }
    }
}

Strangely, this left words below the allowed length in my array. After searching my whole class for mistakes, I gave this a shot:

if ($this->minimumLength > 1) {
    foreach ($string as $key => $word)
    {
        if (strlen($word) < $this->minimumLength) {
            unset($string[$key]);
        }
    }
}

And voilà! This worked perfectly. Now, why does this happen? I checked the PHP documentation, and it states:

If a variable that is PASSED BY REFERENCE is unset() inside of a function, only the local variable is destroyed. The variable in the calling environment will retain the same value as before unset() was called.

Does foreach act as a calling environment here because it has its own scope?

  • Never modify something you're iterating over, because of this sort of unexpected behavior. – Waleed Khan Jan 12 '13 at 19:02
  • Not answering your question about references, but the easiest and cleanest way to do what you want is to use the array_filter() function - http://www.php.net/manual/en/function.array-filter.php – Mark Baker Jan 12 '13 at 19:09
  • Thanks for this. I have used array_filter before, but for different purposes. I wouldn't have thought of it for such a "simple" action as removing elements from an array, but it seems like setting them to `false` and running array_filter over the array really is the clearest way to go –  Jan 12 '13 at 20:55
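
As a rough sketch of the array_filter() suggestion from the comments: passing a callback lets you filter by length directly, without first setting short words to `false`. The `$words` array and `$minimumLength` value here are made-up stand-ins for the question's `$string` and `$this->minimumLength`.

```php
<?php
// Hypothetical sample data standing in for the question's $string array.
$words = ['a', 'quick', 'ox', 'jumped', 'by'];
$minimumLength = 3; // assumed value, for illustration only

// Keep only the words that meet the minimum length.
$filtered = array_filter($words, function ($word) use ($minimumLength) {
    return strlen($word) >= $minimumLength;
});

// array_filter() preserves the original keys, so reindex if you need 0..n-1.
$filtered = array_values($filtered);

print_r($filtered);
```

Note that array_filter() returns a new array rather than modifying the one you pass in, so the question of what unset() does to a loop variable never comes up.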

2 Answers

No, there is no function call here and no variable is being passed by reference (you are simply capturing by reference during the iteration).

When you iterate by reference, the iteration variable is an alias of the original array element. When you modify the value through this alias, the change remains visible in the array being iterated.

However, when you unset the alias the original variable is not "destroyed"; the alias is simply removed from the symbol table.

foreach ($string as $key => &$word)
{
    // This does not mean that the word is removed from $string
    unset($word);

    // It simply means that you cannot refer to the iteration variable using
    // $word from this point on. If you have captured the key then you can
    // still refer to it with $string[$key]; otherwise, you have lost all handles
    // to it for the remainder of the loop body
}
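
To see both behaviors side by side, here is a small runnable sketch (the sample `$words` array is invented for the demo): assigning through the alias changes the array, while unset() on the alias leaves the array untouched.

```php
<?php
$words = ['a', 'quick', 'ox']; // hypothetical sample data

foreach ($words as $key => &$word) {
    if ($key === 1) {
        // Assignment goes through the alias and changes $words[1].
        $word = strtoupper($word);
    }
    if (strlen($word) < 3) {
        // Only removes the name $word; $words[$key] is still there.
        unset($word);
    }
}
// Drop any reference still held after the loop (a no-op if already unset).
unset($word);

print_r($words); // still three elements; only 'quick' was changed
```

foreach re-binds `$word` to the next element at the start of every iteration, which is why unsetting it inside the body does not break the loop.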
Jon
  • Oh... somehow I always thought that every action on a value's reference would just be mirrored onto the value itself. Guess I was wrong, thanks for clearing that up. –  Jan 12 '13 at 20:51
  • @igorpan: A related tip is that when you iterate by reference, you may want to `unset` the loop variable immediately after the loop; otherwise you "risk" assigning to that variable later, which will overwrite the last element of the array. – Jon Jan 12 '13 at 20:55
  • I knew about that, loop variable stays defined even after the loop itself is over. Nevertheless, can be useful if somebody stumbles upon this question :) –  Jan 12 '13 at 22:44
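
For anyone who does stumble on this, a minimal sketch of the tip above, using invented sample arrays: without the unset(), a later assignment to the loop variable writes into the last array element.

```php
<?php
$words = ['one', 'two', 'three']; // hypothetical sample data

foreach ($words as &$word) {
    // Nothing to do here; iterating by reference is enough for the demo.
}
// $word is still an alias of $words[2] at this point.
$word = 'oops';
print_r($words); // last element has been overwritten

$safe = ['one', 'two', 'three'];
foreach ($safe as &$item) {
    // Same loop as above.
}
unset($item);   // break the alias right after the loop
$item = 'oops'; // now just an ordinary variable
print_r($safe); // unchanged
```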

When you were calling unset($word) inside your if statement, you were removing the $word variable itself, without making any changes to the array $string.

Joe Day
  • But why is that so when I passed it by reference? My foreach states `foreach ($string as &$word)` –  Jan 12 '13 at 19:01
  • Because it is a *reference* to the original variable, not the actual original variable; you're unsetting the reference, but the original remains – Mark Baker Jan 12 '13 at 19:02