
Scenario:

- A small number of PHP projects (e.g. websites) use APCu. Each is identified by a unique id / hash, which could be e.g. 20 characters long. We call this $site_hash below.
- Each project stores a large number of small values in APCu, identified by keys.

Usually one would distinguish the entries by using cache keys like this:

$value = apcu_fetch($site_hash . '|' . $key);

But one could do this instead:

$value = apcu_fetch($key . '|' . $site_hash);

One could think that the second one is faster, because this way a hash table lookup often only needs to look at the first few characters of the key.

Can someone confirm this hypothesis?

(I am sure I could run this experiment myself. If I do, I will share it here.)

donquixote
  • Well, you're not wrong. On the other hand you're literally talking about milliseconds here. Unless you're doing something *really* precise, honestly, whichever will do. – Andrei Sep 26 '16 at 11:46
  • It is something that will be called maybe 300 times per request. A cache for a class loader. So if I can save 2 milliseconds, I will. – donquixote Sep 26 '16 at 11:55
  • Fair enough. @LeCintas does make a good point tho, average look-up time will still be O(n) no matter how you place those keys or what values they have. – Andrei Sep 26 '16 at 12:45
  • Why O(n)? Assuming n = number of items, it would be O(n) if it had to go through all elements until it finds a match. It is ~O(log n) if the elements are stored in a lookup tree, which is how such hash tables should be implemented. But the length of the string also plays a role. Let's say m is the length of the string. Then it could be O(m + log n), or O(m * log n), depending how it is implemented. And maybe m is replaced with k, with k <= m, which would be the substring that is being looked at. So it depends a lot on implementation. – donquixote Sep 26 '16 at 13:38
  • So the only complete answer would be a benchmark. – donquixote Sep 26 '16 at 13:39
  • Or someone who is familiar with the implementation in C. – donquixote Sep 26 '16 at 13:39
  • That's...a fair point actually. It's not O(n), my bad. – Andrei Sep 26 '16 at 13:41

2 Answers


Even if the function uses a hash table, either of your methods can be faster than the other. Let me explain:

If $site_hash comes before $key, then the speed depends on the ASCII value of the first characters (if the string starts with 'z', it will be slower than if it starts with 'a').

The problem is the same if the string starts with $key.

Victor Castro
  • The question then would be the average lookup time. It does not help if I look up 300 keys and only one of them is super fast. – donquixote Sep 26 '16 at 12:40

I did run a benchmark.

<?php

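function apcutest($prepend = FALSE) {

  // Start each run from an empty cache.
  apcu_clear_cache();

  // Long part shared by every key (three times the script path).
  $prefix = $suffix = __FILE__ . __FILE__ . __FILE__;

  // Store 100000 entries. The keys differ only in the number $i, which
  // sits at the end ($prepend = TRUE) or at the start ($prepend = FALSE).
  $keys = [];
  for ($i = 0; $i < 100000; ++$i) {
    apcu_store(
      $keys[] = $prepend
        ? $prefix . $i
        : $i . $suffix,
      md5("($i)"));
  }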

  $t0 = microtime(TRUE);

  foreach ($keys as $key) {

    apcu_fetch($key);
    apcu_fetch($key);
    apcu_fetch($key);
    apcu_fetch($key);
    apcu_fetch($key);

    apcu_fetch($key);
    apcu_fetch($key);
    apcu_fetch($key);
    apcu_fetch($key);
    apcu_fetch($key);
  }

  $t1 = microtime(TRUE);

  return ($t1 - $t0) * 1000;
}

$dts = [];
$dts[] = apcutest(FALSE);
$dts[] = apcutest(TRUE);
$dts[] = apcutest(FALSE);
$dts[] = apcutest(TRUE);
$dts[] = apcutest(FALSE);
$dts[] = apcutest(TRUE);

print_r($dts);

Result on my machine:

Array
(
    [0] => 415.98796844482
    [1] => 413.39302062988
    [2] => 414.03603553772
    [3] => 415.08793830872
    [4] => 413.25092315674
    [5] => 414.61896896362
)

Observation: For a few runs there seemed to be a very small but consistent advantage for the version with suffix. However, subsequent runs did not confirm this. For this experiment, there is no statistically significant measurable difference between the two.
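To put a rough number on "no measurable difference", here is a minimal sketch that summarises the six timings above as mean and sample standard deviation per variant. The helper name mean_and_stddev() is made up for this illustration, and three runs per variant is too few for a real significance test, so treat it as a sanity check only.

```php
<?php

/** Hypothetical helper: returns [mean, sample standard deviation]. */
function mean_and_stddev(array $xs): array {
  $n = count($xs);
  $mean = array_sum($xs) / $n;
  $sq = 0.0;
  foreach ($xs as $x) {
    $sq += ($x - $mean) ** 2;
  }
  // Sample standard deviation (n - 1 in the denominator).
  return [$mean, $n > 1 ? sqrt($sq / ($n - 1)) : 0.0];
}

// Timings in milliseconds, copied from the result above.
$suffix_runs = [415.98796844482, 414.03603553772, 413.25092315674]; // apcutest(FALSE)
$prefix_runs = [413.39302062988, 415.08793830872, 414.61896896362]; // apcutest(TRUE)

[$suffix_mean, $suffix_sd] = mean_and_stddev($suffix_runs);
[$prefix_mean, $prefix_sd] = mean_and_stddev($prefix_runs);

printf("suffix: mean %.3f ms, sd %.3f ms\n", $suffix_mean, $suffix_sd);
printf("prefix: mean %.3f ms, sd %.3f ms\n", $prefix_mean, $prefix_sd);
printf("difference of means: %.3f ms\n", $suffix_mean - $prefix_mean);
```

If the difference of the means is smaller than the spread within each variant, as it is for these numbers, the runs give no reason to prefer one key order.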

Conclusion: Based on this experiment, it does not matter whether one uses a prefix or a suffix. This may not be the end-all answer, but it is the answer I can give at this time. I wonder how this lookup is implemented.
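One possible explanation, offered as an assumption and not checked against the APCu source here: if the cache first hashes the complete key to pick a slot, then every byte of the key is read no matter where the varying part sits. The sketch below uses a simplified 32-bit "times 33" string hash to illustrate that idea. The function names, the slot count 4099 and the sample $site_hash are all made up for this example, and it assumes a 64-bit PHP build.

```php
<?php

/**
 * Simplified 32-bit "times 33" string hash.
 * Illustrative only; not copied from APCu or PHP source.
 */
function sketch_hash(string $key): int {
  $h = 5381;
  $len = strlen($key);
  for ($i = 0; $i < $len; ++$i) {
    // Every byte is mixed in, wherever the varying part of the key sits.
    $h = (($h * 33) + ord($key[$i])) & 0xFFFFFFFF;
  }
  return $h;
}

/** Hypothetical helper: maps a key to one of $nslots buckets. */
function sketch_slot(string $key, int $nslots): int {
  return sketch_hash($key) % $nslots;
}

$site_hash = 'a1b2c3d4e5f6a7b8c9d0'; // made-up 20-character id
$key = 'Some\\Class\\Name';           // made-up cache key

$prefixed = $site_hash . '|' . $key;
$suffixed = $key . '|' . $site_hash;

// Both variants have the same length, so the same number of bytes is hashed.
printf("%d bytes, slot %d\n", strlen($prefixed), sketch_slot($prefixed, 4099));
printf("%d bytes, slot %d\n", strlen($suffixed), sketch_slot($suffixed, 4099));
```

Under that assumption, the cost of finding the slot depends on the key length and not on the order of its parts, which would be consistent with the benchmark.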

donquixote