0

I need to check whether the number of characters from a specific set in a string is higher than some number. What is the fastest way to do that?

For example, I have a long string "some text & some text & some text + a lot more + a lot more ... etc." and I need to check whether it contains more than 3 of the following symbols: [&,.,+]. So when I encounter the 4th occurrence of one of these chars, I just need to return false and stop the loop. I could write a simple function for that, but I wonder whether PHP has a native method for it. I need something that will not waste time parsing the string to the end, because the string may be pretty long. So I think regexp and functions like count_chars are not suited for this kind of job...

Any suggestions?

Maxime
Markus_13

3 Answers

2

I don't know of a native method; count_chars is probably as close as you're going to get. However, rolling a custom solution is relatively simple:

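$str = 'your text here';
$chars = ['&', '.', '+'];
// Start every counter at 0 so the first increment does not hit an undefined key.
$count = array_fill_keys($chars, 0);
$length = strlen($str);
$limit = 3;
for ($i = 0; $i < $length; $i++) {
    if (in_array($str[$i], $chars, true)) {
        // Counts are kept per character; use a single integer for a combined total.
        $count[$str[$i]] += 1;
        // Stop scanning as soon as any one character exceeds the limit.
        if ($count[$str[$i]] > $limit) {
            break;
        }
    }
}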

Where the data is actually coming from might also make a difference. For example, if it's from a file then you could take advantage of fread's 2nd parameter to only read x number of bytes at a time within a while loop.
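If the data does come from a file, the chunked read mentioned above could look roughly like the sketch below. The function name fileExceedsLimit, its parameters, and the 8192-byte default chunk size are illustrative assumptions, not something from the question. It counts the combined total of all listed characters and assumes they are single-byte (as &, . and + are), so a chunk boundary cannot split one of them.

```php
<?php
// Hypothetical helper: returns true as soon as more than $limit characters
// from $chars have been seen, reading the file $chunkSize bytes at a time.
function fileExceedsLimit(string $path, string $chars, int $limit, int $chunkSize = 8192): bool
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $count = 0;
    try {
        while (!feof($handle)) {
            // fread's second parameter caps how many bytes are read per call.
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false || $chunk === '') {
                break;
            }
            $chunkLength = strlen($chunk);
            for ($i = 0; $i < $chunkLength; $i++) {
                // strpos() on the short character list acts as a set lookup.
                if (strpos($chars, $chunk[$i]) !== false && ++$count > $limit) {
                    return true; // the rest of the file is never read
                }
            }
        }
    } finally {
        fclose($handle);
    }

    return false;
}

// Usage (path is a placeholder): fileExceedsLimit('/path/to/file.txt', '&.+', 3);
```

Because the function returns from inside the read loop, a match early in a large file means only the first chunk or two are ever loaded.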

Finding the fastest way might be too broad of a question as PHP has a lot of string related functions; other solutions might use strstr, strpos, etc...
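One native function not named above that fits the "stop early" requirement is strcspn, which returns how many bytes can be skipped before the next character from a given set. The sketch below is only one possible approach and has not been benchmarked here; the function name exceedsLimit and its parameters are made up for illustration. It counts the combined total of all listed characters.

```php
<?php
// Hypothetical helper: true if $str contains more than $limit characters
// from $chars (a plain string such as '&.+'), scanning left to right.
function exceedsLimit(string $str, string $chars, int $limit): bool
{
    $length = strlen($str);
    $offset = 0;
    $count = 0;

    while ($offset < $length) {
        // Skip ahead to the next character that belongs to $chars.
        $offset += strcspn($str, $chars, $offset);
        if ($offset >= $length) {
            break; // no further occurrences
        }
        if (++$count > $limit) {
            return true; // stop without looking at the rest of the string
        }
        $offset++; // step past the character just counted
    }

    return false;
}

// Usage: exceedsLimit('some text & some text & some text + a lot more.', '&.+', 3);
```

The per-character work happens inside strcspn in native code, and the PHP-level loop runs at most $limit + 1 times before returning true.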

mister martin
0

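I have not benchmarked the other solutions, but str_replace (http://php.net/manual/en/function.str-replace.php) with an array of search characters should be fast. Its optional fourth parameter receives the number of replacements performed, so you can compare that number against your limit. Note that str_replace always processes the whole string.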

 str_replace(['&', '.', '+'], '', $subject, $count);

 if ($count > $number) {
     // more than $number of the listed characters were found
 }
exussum
-1

Well, all my assumptions were wrong and my expectations were crushed by real tests. The RegExp turns out to be 2 to 7 times faster (depending on the string) than a self-made function with a simple character-checking loop.

The code:

// self-made function:
function chk_occurs($str,$chrs,$limit){
    $r=false;
    $count = 0;
    $length = strlen($str);
    for($i=0; $i<$length; $i++){
        if(in_array($str[$i], $chrs)){
            $count++;
            if($count>$limit){
                $r=true;
                break;
            }
        }
    }
    return $r;
}

// RegExp i've used for tests:
preg_match('/([&\\.\\+]|[&\\.\\+][^&\\.\\+]+?){3,}?/',$str);

Of course it is faster partly because it is a single call to a native function, but even the same RegExp wrapped in a function is 2 to ~4.8 times faster.

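// RegExp wrapped into a function ($chrs is a string here, e.g. '&.+'):
function chk_occurs_preg($str, $chrs, $limit) {
    // Escape the characters for use in a character class with the '/' delimiter.
    $chrs = preg_quote($chrs, '/');
    // Require $limit + 1 matches so the result means "more than $limit", like chk_occurs().
    $min = $limit + 1;
    return preg_match('/([' . $chrs . ']|[' . $chrs . '][^' . $chrs . ']+?){' . $min . ',}?/', $str) === 1;
}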

P.S. I didn't bother to check CPU time; I only measured wall time via microtime(true) over a 200k-iteration loop, but that's enough for me.
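For anyone who wants to repeat this kind of measurement, a wall-time harness along these lines should do. This is a rough sketch, not the exact test code: the helper name time_calls, the sample string, and the reduced iteration count are all assumptions, and the results will vary with the input.

```php
<?php
// Hypothetical helper: wall time in seconds for $iterations calls of $fn.
function time_calls(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Sample input is an assumption; the strings used in the real tests are not shown.
$str = str_repeat('some text ', 1000) . '& a + b . c & d';

// Candidate 1: simple character-checking loop.
$loop = function () use ($str): bool {
    $count = 0;
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        if (in_array($str[$i], ['&', '.', '+'], true) && ++$count > 3) {
            return true;
        }
    }
    return false;
};

// Candidate 2: the RegExp quoted above, unchanged.
$regex = function () use ($str): bool {
    return preg_match('/([&\\.\\+]|[&\\.\\+][^&\\.\\+]+?){3,}?/', $str) === 1;
};

// 200 iterations keeps the sketch quick; raise it (e.g. to 200000) for real runs.
$loopTime = time_calls($loop, 200);
$regexTime = time_calls($regex, 200);

printf("loop: %.4fs, regex: %.4fs\n", $loopTime, $regexTime);
```

Running each candidate several times and on strings of different lengths gives a fairer picture than a single pass.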

Markus_13