
When ref-assigning an array's element, the contents of the array are modified:

$arr = array(100, 200);
var_dump($arr);
/* shows:
array(2) {
  [0]=>
  int(100)  // ← ← ← int(100)
  [1]=>
  int(200)
}
*/

$r = &$arr[0];
var_dump($arr);
/* shows:
array(2) {
  [0]=>
  &int(100)  // ← ← ← &int(100)
  [1]=>
  int(200)
}
*/

In a live run, the Zend Engine handles this fine, while HHVM shows "Process exited with code 153".

Why is the element modified?

Why do we see &int(100) instead of int(100)?

This seems totally bizarre. What's the explanation for this oddity?

Pacerier

1 Answer


I have answered this a while back, but cannot find the answer right now. I believe it went like this:

References are simply "additional" entries in the symbol table for the same value. The symbol table can only point to whole values, not to values nested inside other values, so it cannot point to an index in an array. When you make a reference to an array index, the value at that index is therefore taken out of the array, a symbol is created for it, and the slot in the array gets a reference to that value:

$foo = array('bar');

symbol | value
-------+----------------
foo    | array(0 => bar)

$baz =& $foo[0];

symbol | value
-------+----------------
foo    | array(0 => $1)
baz    | $1
$1     | bar              <-- pseudo entry for value that can be referenced

Because this is not possible:

symbol | value
-------+----------------
foo    | array(0 => bar)
baz    | &foo[0]          <-- not supported by symbol table

The $1 above is just an arbitrarily chosen "pseudo" name, it has nothing to do with actual PHP syntax or with how the value is actually referenced internally.
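
One visible consequence of the array slot holding a reference is that a plain copy of the array still shares that slot. The following is only a small sketch (the name `$copy` is made up for illustration), but the behavior matches what the PHP manual describes for references inside arrays:

```php
<?php
$foo = array('bar');
$baz =& $foo[0];      // slot 0 of $foo and $baz now share one value

$copy = $foo;         // plain assignment, no & involved
$copy[0] = 'changed'; // the copied slot is still a reference, so this writes through

var_dump($foo[0]);    // string(7) "changed"
var_dump($baz);       // string(7) "changed"
```

The array itself is copied, but slot 0 in both arrays refers to the same shared value, which is why the write through `$copy` is visible via `$foo` and `$baz`.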

As requested in the comments, here is how the symbol table usually behaves with references:

$a = 1;

symbol | value
-------+----------------
a      | 1


$b = 1;

symbol | value
-------+----------------
a      | 1
b      | 1


$c =& $a;

symbol | value
-------+----------------
a, c   | 1
b      | 1
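
The last table can be checked with a few lines of PHP. This is just a sketch of the same three variables, showing that `$a` and `$c` are two names for one value while `$b` is independent:

```php
<?php
$a = 1;
$b = 1;
$c =& $a;     // $a and $c are two symbols for the same value

$c = 5;       // writing through either name changes the shared value

var_dump($a); // int(5)
var_dump($b); // int(1)
var_dump($c); // int(5)
```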
deceze
  • @deceze, the *characters* you use in the symbol-value table are confusing... How would you draw the symbol-value table after this line `$a = 1; $b = 1; $c =& $a;`? (simply need it as a reference to properly understand what you mean here) – Pacerier Aug 07 '13 at 19:09
  • @deceze. Ic, so after the line `$baz =& $foo[0];`, instead of `baz | $1` `$1 | 'bar'`, you actually meant that we get `baz, $1 | 'bar'` right? – Pacerier Aug 07 '13 at 23:55
  • @Pacerier That's another way to look at it. I don't know whether it's more correct to say that `$baz` refers to the value and `$foo[0]` is a pseudo link which also refers to the same value; or whether both `$baz` and `$foo[0]` refer to a pseudo link which refers to the value. But yeah, you get the idea. – deceze Aug 08 '13 at 05:50
  • @deceze, hmm, I don't quite get what you mean by a *"pseudo link"*.. Isn't it just `$baz` and `$foo[0]` being two different symbols pointing to the same zval container [type='string', value='bar', refcount=2, is_ref=true]? – Pacerier Aug 08 '13 at 06:13
  • @Pacerier Almost. `$foo[0]` cannot be a *symbol*. `foo` is a symbol which holds an array. Index `[0]` of that array is a reference to an entry in the symbol table which holds your mentioned zval, with `baz` also referring to that zval (whether directly or indirectly). – deceze Aug 08 '13 at 06:30
  • @deceze, I think you're wrong here because arrays themselves have their own symbol tables separate from the global symbol table http://derickrethans.nl/talks/phparch-php-variables-article.pdf , so `[0]` is a symbol in the array's symbol table pointing to the zval container 'bar', and `baz` being the symbol in the global symbol table pointing to that same zval container. – Pacerier Aug 08 '13 at 07:09
  • @Pacerier That's basically what I said, just adding that arrays use a symbol table internally. That's sort of an irrelevant detail though. The point is that `$foo[0]` cannot be an entry in the "global" symbol table as is, so there's an intermediate value being introduced that `$foo[0]` points to instead. – deceze Aug 08 '13 at 07:12