EDIT: I figured this out. See answer at the bottom of this post.
I'm using PHP's array_udiff to compare two arrays of objects. My goal is to get a list of objects which exist only in the first array but not the second. array_udiff seems like a good choice for the task based on this description:
array_udiff(array $array, array ...$arrays, callable $value_compare_func): array
Returns an array containing all the values of $array that are not present in any of the other arguments.
In the process of working on this, I've stumbled on what seems to be a weird behavior of array_udiff. The documentation states that:
The callback comparison function must return an integer less than, equal to, or greater than zero if the first argument is considered to be respectively less than, equal to, or greater than the second.
Nowhere in the documentation does it say anything to suggest that it behaves differently based on whether the callback's return value is positive vs. negative. However, that appears to be exactly what is happening.
It appears that a negative value causes it to behave as documented, returning only items which exist in the first array but not the second. On the other hand, a positive value apparently causes it to return extra items: in the test case below, it returns both items from the first array, including one that also exists in the second.
The following is a test case. Save it as arraytest.php and run it with php -q ./arraytest.php to see for yourself.
Is this a bug in PHP, expected behavior, or something that is missing from the documentation? Can I depend on it continuing to behave this way, or is that risky? Or, is it actually documented somewhere and I am just missing it?
FWIW I'm using PHP 7.4.3.
arraytest.php
<?php
$old = [
    (object) [
        'id' => 1
    ],
    (object) [
        'id' => 3
    ]
];
$new = [
    (object) [
        'id' => 1
    ],
    (object) [
        'id' => 5
    ]
];
$result1 = array_udiff($old, $new,
    function ($obj_a, $obj_b) {
        if (serialize($obj_a) === serialize($obj_b)) {
            return 0;
        } else {
            return -1;
        }
    }
);
print_r($result1);
/*
Outputs:
Array
(
[1] => stdClass Object
(
[id] => 3
)
)
*/
$result2 = array_udiff($old, $new,
    function ($obj_a, $obj_b) {
        if (serialize($obj_a) === serialize($obj_b)) {
            return 0;
        } else {
            return 1;
        }
    }
);
print_r($result2);
/*
Outputs:
Array
(
[0] => stdClass Object
(
[id] => 1
)
[1] => stdClass Object
(
[id] => 3
)
)
*/
?>
EDIT: I figured this out.
Naturally I found the answer to this after spending 30 minutes writing this post :-) From a different Stack Overflow question:
The problem is that array_udiff is not performing the comparison between all values, and this seems to be caused by your compare function.
array_udiff() expects that the callable function is a real compare function, but you are returning always 0 and -1, but never 1.
Before doing its job, array_udiff() tries to order both arrays and remove duplicates too. If it can't rely on your comparison function, it can't perform all the needed comparison and some values are "jumped".
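To make the quoted point concrete, here is a small sketch that checks whether a comparator is consistent, meaning that swapping its arguments flips the sign of the result. The helper name is_antisymmetric and the sample objects are my own, purely for illustration; PHP does not provide such a function.

```php
<?php
// Hypothetical helper (not part of PHP): returns true if swapping the
// arguments of $cmp flips the sign of its result for every pair in $values.
function is_antisymmetric(callable $cmp, array $values): bool
{
    foreach ($values as $a) {
        foreach ($values as $b) {
            // "<=> 0" normalizes any integer to -1, 0, or 1.
            $forward  = $cmp($a, $b) <=> 0;
            $backward = $cmp($b, $a) <=> 0;
            if ($forward !== -$backward) {
                return false;
            }
        }
    }
    return true;
}

// The comparator from my first test: it only ever returns 0 or -1.
$broken = function ($obj_a, $obj_b) {
    return serialize($obj_a) === serialize($obj_b) ? 0 : -1;
};

// A real three-way comparison on the serialized form.
$proper = function ($obj_a, $obj_b) {
    return serialize($obj_a) <=> serialize($obj_b);
};

$samples = [(object) ['id' => 1], (object) ['id' => 3], (object) ['id' => 5]];

var_dump(is_antisymmetric($broken, $samples)); // bool(false)
var_dump(is_antisymmetric($proper, $samples)); // bool(true)
```

The broken comparator claims both "a is less than b" and "b is less than a" for any two different objects, so no sort order can satisfy it.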
There is more detail in that question. I changed my code to the following, and now array_udiff behaves as documented. All is right in the world.
$result = array_udiff($old, $new,
    function ($obj_a, $obj_b) {
        return serialize($obj_a) <=> serialize($obj_b);
    }
);
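For anyone who wants to check this themselves, here is a self-contained sketch using the same test data as above. It also tries a second comparator that compares on id alone; that variant is an assumption on my part and only makes sense if every object has a comparable id property. The variable names $by_serialized and $by_id are just for this example.

```php
<?php
$old = [(object) ['id' => 1], (object) ['id' => 3]];
$new = [(object) ['id' => 1], (object) ['id' => 5]];

// Three-way comparison on the serialized form, as in the fix above.
$by_serialized = function ($obj_a, $obj_b) {
    return serialize($obj_a) <=> serialize($obj_b);
};

// Alternative sketch: compare on the id property only.
// Assumes every object has an id; objects with equal ids count as equal.
$by_id = function ($obj_a, $obj_b) {
    return $obj_a->id <=> $obj_b->id;
};

$result_serialized = array_udiff($old, $new, $by_serialized);
$result_id = array_udiff($old, $new, $by_id);

// Both results should contain only the object with id 3,
// still under its original key 1 (array_udiff preserves keys).
print_r($result_serialized);
print_r($result_id);
```

One caveat with the spaceship operator on strings: PHP compares two numeric strings as numbers, so strcmp() may be the safer choice if the compared strings could ever look numeric. Serialized objects never do, so it makes no difference here.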