11

I always thought that in_array strict mode will be faster or at least the same speed as non-strict mode. But after some benchmarks I noticed there is a huge difference in execution time between them while searching for integers. The string and array tests indicate that strict mode is faster. Why?

Test code - (PHP 7.2.1):

<?php

$array = array_fill(0, 10000, 12345);

for ($i=0; $i<100000; $i++) {
    in_array($i, $array, true);
}

time php test.php

php -c test.php 12.98s user 0.04s system 98% cpu 13.234 total


<?php

$array = array_fill(0, 10000, 12345);

for ($i=0; $i<100000; $i++) {
    in_array($i, $array, false);
}

time php test.php

php -c test.php 6.44s user 0.04s system 99% cpu 6.522 total

Tom
  • 369
  • 2
  • 14
  • 4
    According to the docs: `If the third parameter strict is set to TRUE then the in_array() function will also **check the types** of the needle in the haystack.` - So there will be adittional operations (type comparison) – B001ᛦ Jul 03 '19 at 12:53
  • 6
    @B001ᛦ Not sure about the actual implementation in C under the hood but one would imagine that strict checking simply enables `===` instead of `==` and `===` is objectively faster in PHP code. – MonkeyZeus Jul 03 '19 at 12:56
  • Yes, but I always thought that for example if it's set to false and for example we have bool (false) argument it should be checked against int (0), float(0) string ('', '0'), bool(false), array([]), etc. So the strict version should have less work to do. – Tom Jul 03 '19 at 12:57
  • 1
    In your example, all of them match type. What if none of the types match? Try searching for a string instead of an integer? Perhaps in_array checks type first, then if it matches, checks for equality. So if the types match, it takes slightly longer, and if they don't it goes slightly faster. – Stevish Jul 03 '19 at 13:02
  • @MonkeyZeus PHP 7.2.1 NTS x64 (MacBook Pro 2015 15") – Tom Jul 03 '19 at 13:05
  • 3
    For anyone good with C programming, see https://github.com/php/php-src/blob/master/ext/standard/array.c#L1616 and https://github.com/php/php-src/blob/master/ext/standard/array.c#L1529 – MonkeyZeus Jul 03 '19 at 13:06
  • @Stevish I have tested many cases (pasted only two in question)... in all my tests strict mode is more than twice slower than non strict mode – Tom Jul 03 '19 at 13:07
  • Albeit interesting and potentially worthy as a bug report, I would argue that if you need to perform `in_array()` such a large number of times then it would be immensely faster to just do an `array_flip()` and `isset()` assuming all of your array elements are scalar. – MonkeyZeus Jul 03 '19 at 13:13
  • I have reviewed one more time my tests and noticed that I had some logic errors there in conditions making that there was not equal checks between strict and non strict mode tests. After another check the result is that the slowdown is only while searching for integers. I have updated the question. – Tom Jul 03 '19 at 13:41
  • 4
    Fixed in https://github.com/php/php-src/commit/ea86a9209c0cf6e39f364d6a5cf0ed07addbf3f9. – NikiC Jul 05 '19 at 09:22

2 Answers2

7

I can offer some small insight from tracing through the C source for in_array.

It turns out, when comparing integers, the path to reach the actual equality check for non-strict mode involves fewer operations than strict mode.

Strict mode

In the case where the strict flag to in_array is true, the following occurs:

  1. We call fast_is_identical_function for each element in the array

  2. The fast_is_identical_function first tests that the types of each operand are different (Z_TYPE_P(op1) != Z_TYPE_P(op2)) in hopes of being able to return false early; this is comparison #1.

  3. If the types are the same (they are, in your test case), we then test (Z_TYPE_P(op1) <= IS_TRUE; I've no idea what this does, but it's comparison #2.

  4. After both comparisons have evaluated to false, we jump into zend_is_identical, our first function invocation.

  5. zend_is_identical starts out by again testing Z_TYPE_P(op1) != Z_TYPE_P(op2), another attempt to fail early. This is comparison #3.

  6. If the types are the same, we can descend through the switch (Z_TYPE_P(op1)) statement, comparison #4

  7. Finally we reach the comparison Z_LVAL_P(op1) == Z_LVAL_P(op2) which actually tests the equality of the two values, comparison #5.

In total, to test whether each element of the array is equal to the value we're searching for, there are 5 comparisons and 1 function invocation.

Non-strict mode

By comparison, the non-strict flow for integers specifically (really LONGs) is much simpler, as follows:

  1. Instead of fast_is_identical_function, we instead use fast_equal_check_function for each element in the array.

  2. The method fast_equal_check_function starts a much more complicated process of comparing the two values with all kinds of type-casting logic. However, the very first test it does happens to be optimized for integers, as follows:

    if (EXPECTED(Z_TYPE_P(op1) == IS_LONG)) {
        if (EXPECTED(Z_TYPE_P(op2) == IS_LONG)) {
            return Z_LVAL_P(op1) == Z_LVAL_P(op2);
    

    We can see that it...

    1. immediately tests whether type of op1 is LONG, which it is, then
    2. immediately tests whether the type of op2 is LONG, which it is, then
    3. immediately returns the result of Z_LVAL_P(op1) == Z_LVAL_P(op2)

A total of 3 simple equality comparisons and 0 function invocations for the non-strict case, vs at least 5 comparisons and 1 jump for the strict case.

This appears to be a case where attempted early optimization makes the strict check slower (by repeatedly testing the types of the operands in hopes that we can find inequality sooner) than the specific non-strict case of comparing two integers.

user229044
  • 232,980
  • 40
  • 330
  • 338
  • So this might be the place for the optimization in next PHP releases, right? – Tom Jul 04 '19 at 09:32
  • @tom I don't think any of this is really a problem. The code is already optimized pretty heavily, and you would risk introducing some other kind of performance problem for some other type of input by changing it. I think it's better to think of the current strict performance as normal, and the non-strict long-long case as a nice bonus. – user229044 Jul 04 '19 at 11:48
  • yup. You're right. I thought about focusing on the strict mode at the expense of the non-strict mode (if it is needed to improve speed of the strict mode) – Tom Jul 04 '19 at 12:35
  • 1
    @tom this has been "fixed" in https://github.com/php/php-src/commit/ea86a9209c0cf6e39f364d6a5cf0ed07addbf3f9 :) – azjezz Jul 07 '19 at 16:42
1

It seems to have something to do with the type of element in the needle and/or haystack, observe:

PHP 7.3.5 from http://sandbox.onlinephpfunctions.com/

$iterations = 10000000;
$needle = false;
$haystack = [ true ];

$start = microtime( true );
for( $i = 0; $i < $iterations; ++$i )
{
    in_array( $needle, $haystack, true );
}
echo ( microtime( true ) - $start ).' strict'.PHP_EOL;

$start = microtime( true );
for( $i = 0; $i < $iterations; ++$i )
{
    in_array( $needle, $haystack, false );
}
echo ( microtime( true ) - $start ).' not strict';

produces:

0.29996585845947 strict
0.40397191047668 not strict

but if we use:

$needle = 1;
$haystack = [ 2 ];

then we get:

0.34480714797974 strict
0.28275084495544 not strict

However, PHP 5.6.29 produces a negligible discrepancy and running the same test multiple times can put strict ahead of non-strict and vice versa.

MonkeyZeus
  • 20,375
  • 4
  • 36
  • 77
  • I have noticed the same results after reviewing my tests. The "int" is the only affected one I found. – Tom Jul 03 '19 at 13:45