
I need a very fast string hashing function that fits well with a web application written in PHP.

The problem I am trying to overcome is assigning IDs to permissions in an access control system. I am thinking about using hashed strings to represent the IDs of permissions. This way I will be able to check permissions like this:

if ($Auth->isAllowed($user, "blog.comment")) {
    // Do some operation
}
...

if ($Auth->isAllowed($user, "profile.avatar.change")) {
    // Do some other operation
}

The DB table will map permission hashes to users' roles. To check that the user is allowed to do "profile.avatar.change", the corresponding string will be hashed and checked against the DB table.
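A minimal sketch of the lookup described above, with an in-memory array standing in for the DB table. The helper names permissionId() and isAllowed(), the role names, and the choice of crc32b are assumptions made for illustration only; they are not part of any existing library.

```php
<?php
// Hypothetical helper: derive a fixed-length ID from a permission name.
// crc32b is only one possible choice of algorithm.
function permissionId(string $permission): string
{
    return hash('crc32b', $permission);
}

// Stand-in for the DB table: permission hash => roles allowed to use it.
$rolePermissions = [
    permissionId('blog.comment')          => ['member', 'admin'],
    permissionId('profile.avatar.change') => ['admin'],
];

// Hypothetical check: is any of the user's roles mapped to the permission?
function isAllowed(array $userRoles, string $permission, array $table): bool
{
    $allowedRoles = $table[permissionId($permission)] ?? [];
    return count(array_intersect($userRoles, $allowedRoles)) > 0;
}

var_dump(isAllowed(['member'], 'blog.comment', $rolePermissions));
var_dump(isAllowed(['member'], 'profile.avatar.change', $rolePermissions));
```

With this data the first check should succeed and the second should fail, because only the 'admin' role is mapped to "profile.avatar.change".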

This is very handy and there will be no need to worry about maintaining unique permission IDs among different modules. But the hashing function should be very efficient.

ezpresso
  • Hashing is a one-way street, so there is nothing you could check in a hash, other than its existence, for something like this. – Jay Blanchard Feb 06 '17 at 17:04
  • The most common way is to follow the Linux approach (using 0-7 to represent permissions). Assign IDs to permissions and compute 2^(ID number) to create an integer, then unroll it in the same way to figure out which permissions you have... Or just pass objects/tokens with a bunch of variables and check $user->can_change_stuff or $user->has_apples – Dimi Feb 06 '17 at 17:06
  • @apokryfos, it is not a duplicate. All of these questions are mine. This question is more specific about string hashing. – ezpresso Feb 06 '17 at 17:24
  • @JayBlanchard, that is exactly what I want to check - the existence of some particular permission in a database table. – ezpresso Feb 06 '17 at 17:29
  • The fastest way to hash an 8-16 byte string to a unique string is to not do anything at all to it. Just store it as is. It's short as it is. – apokryfos Feb 06 '17 at 17:30
  • @apokryfos, I have just checked the source code of YII. It seems like you are right! – ezpresso Feb 06 '17 at 17:41
  • @ezpresso check the answer, please. Otherwise half of your reputation points will disappear. – shukshin.ivan Feb 25 '17 at 10:09
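
The bit-flag approach suggested in the comments above could look roughly like the sketch below. The constant names and the hasPermission() helper are hypothetical; the only idea taken from the comment is that each permission ID maps to 2^(ID number) and the flags are combined into one integer.

```php
<?php
// Hypothetical permission IDs; each permission owns one bit: 2^id.
const PERM_BLOG_COMMENT  = 1 << 0; // 1
const PERM_AVATAR_CHANGE = 1 << 1; // 2
const PERM_USER_DELETE   = 1 << 2; // 4

// Combine permissions into a single integer with bitwise OR.
$userMask = PERM_BLOG_COMMENT | PERM_AVATAR_CHANGE; // 3

// "Unroll" the integer with bitwise AND to test one permission.
function hasPermission(int $mask, int $permission): bool
{
    return ($mask & $permission) === $permission;
}

var_dump(hasPermission($userMask, PERM_AVATAR_CHANGE)); // bool(true)
var_dump(hasPermission($userMask, PERM_USER_DELETE));   // bool(false)
```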

3 Answers


My first thought was: why not just use a simple md5 function?

Trying to write a hash myself

One of the most frequently cited hash functions is Bernstein's simple hash, also referred to as Times 33 with Addition. The Zend Engine uses it in PHP to hash the keys of associative arrays. In PHP it could be implemented as follows:

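function djb2($s){
    $word = str_split($s);
    $length = count($word);

    $hashAddress = 5381;
    for ($counter = 0; $counter < $length; $counter++){
        // hash * 33 + byte value; ord() converts the character to its integer code,
        // and the mask keeps the result within 32 bits (assumes 64-bit PHP integers)
        $hashAddress = ((($hashAddress << 5) + $hashAddress) + ord($word[$counter])) & 0xFFFFFFFF;
    }
    return $hashAddress;
}
echo djb2("stackoverflow");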
The problem is that when it is implemented this way, it is rather slow. Tests show that it is ~3 times slower than md5. So we have to find the fastest built-in implementation of a hash function.

Finding the best internal hash

Just take all the available algorithms and measure how long each one needs to hash a string a million times.

function testing($algo, $str) {
    $start = microtime(true);
    for($ax = 0; $ax < 1000000; $ax++){
        hash($algo, $str);
    }

    $end = microtime(true);
    return ($end - $start);
}


$algos = hash_algos();
$times = [];

foreach($algos as $algo){
    $times[$algo] = testing($algo, "stackoverflow");
}

// sort by time ASC
asort($times);

foreach($times as $algo => $time){
    echo "$algo -> " . round($time, 2)."sec\n";
}

My results were:

fnv1a32 -> 0.29sec
fnv132 -> 0.3sec
crc32b -> 0.3sec
adler32 -> 0.3sec
crc32 -> 0.31sec
joaat -> 0.31sec
fnv1a64 -> 0.31sec
fnv164 -> 0.31sec
md4 -> 0.46sec
md5 -> 0.54sec
...
md2 -> 6.32sec

The results change slightly from run to run: the first 8 algorithms swap places because their speeds are so close and the timings depend on the server load.
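
One way to dampen that run-to-run noise is to repeat each measurement and keep the best time. The sketch below is only an illustration: timeOnce() and bestOf() are hypothetical helper names, the run and iteration counts are arbitrary, and hrtime() requires PHP 7.3 or later.

```php
<?php
// Time $iterations calls of hash() and return the elapsed seconds.
function timeOnce(string $algo, string $str, int $iterations): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        hash($algo, $str);
    }
    return (hrtime(true) - $start) / 1e9;
}

// Repeat the measurement and keep the fastest run to reduce the effect of load spikes.
function bestOf(string $algo, string $str, int $runs = 5, int $iterations = 100000): float
{
    $best = INF;
    for ($r = 0; $r < $runs; $r++) {
        $best = min($best, timeOnce($algo, $str, $iterations));
    }
    return $best;
}

foreach (['crc32b', 'fnv1a32', 'md5'] as $algo) {
    printf("%s -> %.4f sec\n", $algo, bestOf($algo, 'stackoverflow'));
}
```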

What should be chosen?

You can take any of the top 8 functions above, for example $hash = hash('crc32', $string);. In fact, the widely used md5 function is only about 1.7 times slower than the leaders.
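
Since the fastest algorithms in the list produce only 32-bit values, it may be worth checking a known set of permission names for collisions before relying on the hashes as unique IDs. This is a sketch under that assumption; findCollisions() and the sample permission names are hypothetical.

```php
<?php
// Return groups of names whose hashes collide (an empty array means no collisions).
function findCollisions(array $names, string $algo = 'crc32b'): array
{
    $byHash = [];
    foreach (array_unique($names) as $name) {
        $byHash[hash($algo, $name)][] = $name;
    }
    // Keep only the hashes shared by more than one distinct name.
    return array_filter($byHash, function (array $group): bool {
        return count($group) > 1;
    });
}

$permissions = ['blog.comment', 'profile.avatar.change', 'blog.post.delete'];
$collisions = findCollisions($permissions);
echo $collisions ? "Collisions found\n" : "No collisions\n";
```

Such a check could run once at deployment time, so the per-request cost stays at a single hash() call.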

Bonus

There are other functions, such as SuperFastHash, that are not implemented in PHP, but they are 4x faster than crc32.

shukshin.ivan

The processing time of a hashing function can be considered negligible in most cases. If you need a short hash (8 characters), you can simply use the crc32 algorithm.

<?php
$hash = hash('crc32', 'WhatDoYouWant');
?>
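
A short sketch of how the CRC32 options relate, in case the distinction matters: hash() offers both a 'crc32' and a 'crc32b' algorithm, which are different variants and generally give different values, while the separate crc32() function returns an integer that corresponds to 'crc32b'.

```php
<?php
$s = 'WhatDoYouWant';

// Both variants return 8 lowercase hex characters.
echo hash('crc32', $s), "\n";
echo hash('crc32b', $s), "\n";

// The integer returned by crc32() matches the 'crc32b' variant
// once it is formatted as zero-padded hex.
var_dump(hash('crc32b', $s) === sprintf('%08x', crc32($s))); // bool(true)
```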

You can also combine hash with uniqid to create a random hash.

<?php
$hash = hash('crc32', uniqid());
?>
arnolem

Use xxHash. It is also used by PrestoDB. A PHP implementation is available on GitHub.
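
On newer PHP versions a separate library may not be needed: the xxh32, xxh64, xxh3 and xxh128 algorithms were added to the built-in hash() function in PHP 8.1. The sketch below picks one if it is available and otherwise falls back to crc32b; pickFastAlgo() is a hypothetical helper name and the fallback choice is an assumption.

```php
<?php
// Prefer a built-in xxHash variant when this PHP build offers one,
// otherwise fall back to crc32b.
function pickFastAlgo(): string
{
    foreach (['xxh3', 'xxh64', 'xxh32'] as $candidate) {
        if (in_array($candidate, hash_algos(), true)) {
            return $candidate;
        }
    }
    return 'crc32b';
}

$algo = pickFastAlgo();
echo $algo, ' -> ', hash($algo, 'profile.avatar.change'), "\n";
```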

SACn