PHP's preg_replace() versus ord()

Question

Which is quicker for converting camelCase to underscores: preg_replace() or ord()?

My guess is that the ord() method will be quicker, since preg_replace() can do much more than is needed.

<?php
function __autoload($class_name){
    $name = strtolower(preg_replace('/([a-z])([A-Z])/', '$1_$2', $class_name));
    require_once("some_dir/".$name.".php");
}
?>

OR

<?php
function __autoload($class_name){
    // lowercase first letter
    $class_name[0] = strtolower($class_name[0]);

    $len = strlen($class_name);
    for ($i = 0; $i < $len; ++$i) {
        // see if we have an uppercase character (A..Z inclusive) and replace
        if (ord($class_name[$i]) >= ord('A') && ord($class_name[$i]) <= ord('Z')) {
            // NOTE: assigning to a string offset keeps only the first character
            // of the right-hand side, so this writes '_' and drops the letter
            $class_name[$i] = '_' . strtolower($class_name[$i]);
            // increase length of class and position
            ++$len;
            ++$i;
        }
    }

    return $class_name;
}
?>

Disclaimer: code examples taken from Stack Overflow question 1589468.

Edit: after jensgram's array suggestion and finding array_splice(), I have come up with the following:
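Both snippets above use __autoload(), which was deprecated in PHP 7.2 and removed in PHP 8.0. A minimal sketch of the first (preg_replace) version registered through spl_autoload_register() instead: the helper name camel_to_underscore() is hypothetical, "some_dir" is the question's placeholder path, and the is_file() guard is an added design choice so that a missing file lets other registered autoloaders try instead of failing inside require_once.

```php
<?php
// Hypothetical helper; same regex as the first snippet in the question.
function camel_to_underscore(string $name): string
{
    return strtolower(preg_replace('/([a-z])([A-Z])/', '$1_$2', $name));
}

// spl_autoload_register() is the replacement for __autoload().
// "some_dir" is the placeholder directory from the question.
spl_autoload_register(function (string $class_name): void {
    $file = 'some_dir/' . camel_to_underscore($class_name) . '.php';
    if (is_file($file)) {
        require_once $file;
    }
});
```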

<?php
function __autoload ($string) // actually, function camel2underscore
{
    $string  =  str_split($string);
    $pos     =  count( $string );
    while ( --$pos > 0 )
    {
        $lower  =  strtolower( $string[ $pos ] );
        if ( $string[ $pos ] === $lower )
        {
            // assuming most letters will be lowercase, this should be an improvement
            continue;
        }
        unset( $string[ $pos ] );
        array_splice( $string , $pos , 0 , array( '_' , $lower ) );
    }
    $string  =  implode( '' , $string );
    return $string;
}
// $pos could be avoided by using the array key, something I might look into later on.
?>

I will include this one when I test these methods, but feel free to share your results at any time ;p
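The unset() plus array_splice() pair in the edited snippet can be collapsed into one call, because array_splice() can remove and insert in the same step. A sketch under that assumption, keeping the original's behaviour of walking backwards and never touching index 0 (so "SomeClass" becomes "Some_class"); the name camel2underscore() is taken from the snippet's comment:

```php
<?php
function camel2underscore(string $string): string
{
    $chars = str_split($string);
    // Walk backwards so splicing does not shift positions still to be visited.
    // Index 0 is skipped, as in the original while (--$pos > 0) loop.
    for ($pos = count($chars) - 1; $pos > 0; --$pos) {
        $lower = strtolower($chars[$pos]);
        if ($chars[$pos] === $lower) {
            continue; // already lowercase (or not a letter)
        }
        // Replace 1 element at $pos with '_' followed by the lowercased letter.
        array_splice($chars, $pos, 1, ['_', $lower]);
    }
    return implode('', $chars);
}
```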

The best way to tell what is quicker is to try it. Have you? – Tomalak, Oct 22 '10 at 09:09
Maybe interesting for people reading this: http://www.paulferrett.com/2009/php-camel-case-functions/ http://www.webdevblog.info/php/convert-strings-between-camelcasepascalcase-and-underscored-notation/ – imme, Oct 22 '10 at 09:10
Bwah, I'll put that on my to-do list; this is just a question that could be answered by someone who has already done this and/or knows profiling/benchmarking well. – imme, Oct 22 '10 at 09:12

score 2 · Accepted Answer · answered Oct 22 '10 at 09:11


I think (and I'm pretty sure) that the preg_replace() method will be faster, but if you want to know, why don't you write a little benchmark that calls both functions 100,000 times and measures the time?
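A minimal timing harness along those lines, as a sketch: via_preg() and via_ord() are hypothetical stand-ins for the question's two approaches (the ord() one builds a new string, since writing a two-character string into a string offset does not work), and microtime(true) returns the current time in seconds as a float.

```php
<?php
function via_preg(string $s): string
{
    return strtolower(preg_replace('/([a-z])([A-Z])/', '$1_$2', $s));
}

function via_ord(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; ++$i) {
        $o = ord($s[$i]);
        if ($o >= 65 && $o <= 90) {      // 'A'..'Z'
            if ($i > 0) {
                $out .= '_';
            }
            $out .= chr($o + 32);        // ASCII uppercase -> lowercase
        } else {
            $out .= $s[$i];
        }
    }
    return $out;
}

// Returns elapsed seconds for $iterations calls of $fn($input).
function bench(callable $fn, string $input, int $iterations = 100000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; ++$i) {
        $fn($input);
    }
    return microtime(true) - $start;
}

foreach (['via_preg', 'via_ord'] as $fn) {
    printf("%s: %.4f s\n", $fn, bench($fn, 'getExternalPrefix'));
}
```

Run it a few times on the PHP version you deploy on; single runs are noisy.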

answered Oct 22 '10 at 09:11

oezi


score 1 · Answer 2 · answered Oct 22 '10 at 09:18


(Not an answer, but too long for a comment; will make it CW.)

If you're going to compare, you should at least optimize the ord() version a little.

$len = strlen($class_name);
$ordCurr = null;
$ordA = ord('A');
$ordZ = ord('Z');
for ($i = 0; $i < $len; ++$i) {
    $ordCurr = ord($class_name[$i]);
    // see if we have an uppercase character and replace
    if ($ordCurr >= $ordA && $ordCurr <= $ordZ) {
        $class_name[$i] = '_' . strtolower($class_name[$i]);
        // increase length of class and position
        ++$len;
        ++$i;
    }
}

Also, pushing the name onto a stack (an array) and joining at the end might prove more efficient than string concatenation.
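A sketch of that stack idea, assuming ASCII class names: pieces are pushed onto an array and joined once with implode(). The function name is hypothetical, and whether it beats concatenation has to be measured; one later answer reports that plain concatenation was slightly faster in its test.

```php
<?php
function camel_to_underscore_stack(string $name): string
{
    $ordA = ord('A');
    $ordZ = ord('Z');
    $parts = [];
    $len = strlen($name);
    for ($i = 0; $i < $len; ++$i) {
        $ordCurr = ord($name[$i]);
        if ($ordCurr >= $ordA && $ordCurr <= $ordZ) {
            if ($i > 0) {
                $parts[] = '_';   // separator before every non-leading capital
            }
            $parts[] = strtolower($name[$i]);
        } else {
            $parts[] = $name[$i];
        }
    }
    return implode('', $parts);  // join once at the end
}
```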

But is this worth optimizing/profiling in the first place?

answered Oct 22 '10 at 09:18

jensgram


More useful as well: `$class_name[$i] = '_' . strtolower($class_name[$i]);` does not work as expected... – imme Oct 22 '10 at 09:35
@immeëmosol No; at the very least you should keep a separate string for the result. I didn't really dig into the code, I just wanted to illustrate some simple techniques :) – jensgram Oct 22 '10 at 10:13
Your array idea got me working on a different implementation. At some point I will test them for speed (by the way, I don't like comments without line breaks :s). I'll put the code in a comment because of that. – imme Oct 22 '10 at 10:53

score 1 · Answer 3 · answered Dec 30 '10 at 20:44

My use case was slightly different from the OP's, but I think it still illustrates the difference between preg_replace() and manual string manipulation.

$a = "16 East, 95 Street";

echo "preg: ".test_preg_replace($a)."\n";
echo "ord:  ".test_ord($a)."\n";

$t = microtime(true);
for ($i = 0; $i < 100000; $i++) test_preg_replace($a);
echo (microtime(true) - $t)."\n";
$t = microtime(true);
for ($i = 0; $i < 100000; $i++) test_ord($a);
echo (microtime(true) - $t)."\n";

function test_preg_replace($s) {
    return preg_replace('/[^a-z0-9_-]/', '-', strtolower($s));
}
function test_ord($s) {
    $a = ord('a');
    $z = ord('z');
    $aa = ord('A');
    $zz = ord('Z');
    $zero = ord('0');
    $nine = ord('9');
    $us = ord('_');
    $ds = ord('-');
    $toret = ''; 
    for ($i = 0, $len = strlen($s); $i < $len; $i++) {
        $c = ord($s[$i]);
        if (($c >= $a && $c <= $z)
            || ($c >= $zero && $c <= $nine)
            || $c == $us 
            || $c == $ds)
        {   
            $toret .= $s[$i];
        }   
        elseif ($c >= $aa && $c <= $zz)
        {   
            $toret .= chr($c + $a - $aa); // strtolower
        }   
        else
        {   
            $toret .= '-';
        }   
    }   
    return $toret;
}

The results (elapsed seconds for 100,000 calls each) are:

preg_replace: 0.42064881324768
ord:          2.4904868602753

So the preg_replace() method is vastly superior, roughly six times faster here. Also, string concatenation is slightly faster than inserting into an array and imploding it.

score 0 · Answer 4 · answered Oct 22 '10 at 09:38


If all you want to do is convert camel case to underscores, you can probably write a more efficient function to do so than either ord or preg_replace in less time than it takes to profile them.
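As one illustration of a hand-written converter (a sketch, not code from this answer, and not benchmarked), strcspn() with its offset argument can find each run of non-uppercase bytes so that runs are copied with substr() instead of being inspected one character at a time in PHP code. The function name is hypothetical and ASCII input is assumed.

```php
<?php
function camel_to_underscore_runs(string $s): string
{
    $upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $len = strlen($s);
    $out = '';
    $i = 0;
    while ($i < $len) {
        // Length of the run starting at $i that contains no uppercase letter.
        $run = strcspn($s, $upper, $i);
        $out .= substr($s, $i, $run);
        $i += $run;
        if ($i < $len) {
            // $s[$i] is an uppercase letter.
            if ($i > 0) {
                $out .= '_';
            }
            $out .= strtolower($s[$i]);
            ++$i;
        }
    }
    return $out;
}
```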

answered Oct 22 '10 at 09:38

Sam Dufel


I hopefully did with the method I added last; I will still profile them on some terribly dreadful sunny day, though. – imme Oct 22 '10 at 11:07

score 0 · Answer 5 · answered Sep 02 '11 at 13:32

I've written a benchmark using the following four functions and found that the one implemented in Magento (Test4) is the fastest:

Test1:

/**
 * @see: http://www.paulferrett.com/2009/php-camel-case-functions/
 */
function fromCamelCase_1($str)
{
    $str[0] = strtolower($str[0]);
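    // The /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0,
    // so this line only runs on older PHP versions.
    return preg_replace('/([A-Z])/e', "'_' . strtolower('\\1')", $str);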
}

Test2:

/**
 * @see: http://stackoverflow.com/questions/3995338/phps-preg-replace-versusvs-ord#answer-3995435
 */
function fromCamelCase_2($str)
{
    // lowercase first letter
    $str[0] = strtolower($str[0]);

    $newFieldName = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; ++$i) {
        $ord = ord($str[$i]);
        // see if we have an uppercase character and replace
        if ($ord > 64 && $ord < 91) {
            $newFieldName .= '_';
        }
        $newFieldName .= strtolower($str[$i]);
    }
    return $newFieldName;
}

Test3:

/**
 * @see: http://www.paulferrett.com/2009/php-camel-case-functions/#div-comment-133
 */
function fromCamelCase_3($str) {
    $str[0] = strtolower($str[0]);
    // create_function() was deprecated in PHP 7.2 and removed in PHP 8.0.
    $func = create_function('$c', 'return "_" . strtolower($c[1]);');
    return preg_replace_callback('/([A-Z])/', $func, $str);
}

Test4:

/**
 * @see: http://svn.magentocommerce.com/source/branches/1.6-trunk/lib/Varien/Object.php :: function _underscore($name)
 */
function fromCamelCase_4($name) {
    return strtolower(preg_replace('/(.)([A-Z])/', "$1_$2", $name));
}
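Because create_function() no longer exists in PHP 8, Test3 can be written with an anonymous function instead. A sketch with a hypothetical name; lcfirst() stands in for the `$str[0] = strtolower($str[0])` line, and timings for this variant are not part of the results below.

```php
<?php
function fromCamelCase_3_closure(string $str): string
{
    $str = lcfirst($str);
    return preg_replace_callback(
        '/([A-Z])/',
        function (array $m): string {
            // $m[1] is the captured uppercase letter.
            return '_' . strtolower($m[1]);
        },
        $str
    );
}
```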

Result using the string "getExternalPrefix" 1000 times:

fromCamelCase_1: 0.48158717155457
fromCamelCase_2: 2.3211658000946
fromCamelCase_3: 0.63665509223938
fromCamelCase_4: 0.18188905715942

Result using random strings like "WAytGLPqZltMfHBQXClrjpTYWaEEkyyu" 1000 times:

fromCamelCase_1: 2.3300149440765
fromCamelCase_2: 4.0111720561981
fromCamelCase_3: 2.2800230979919
fromCamelCase_4: 0.18472790718079

With the random test strings, the last test produced different output from the others, but strings like these should not occur in your system:

original:
MmrcgUmNfCCTOMwwgaPuGegEGHPzvUim

last test:
mmrcg_um_nf_cc_to_mwwga_pu_geg_eg_hpzv_uim

other tests:
mmrcg_um_nf_c_c_t_o_mwwga_pu_geg_e_g_h_pzv_uim

As you can see from the timings, the last test takes the same time in both runs :)
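The different output comes from Test4's `(.)`: it consumes the character before each capital, so in a run such as "CCTOM" the next match can only start after the consumed pair, and capitals end up joined two at a time. A lookbehind such as `(?<!^)` matches without consuming anything, so every non-leading capital gets its own underscore. A sketch with a hypothetical function name, not benchmarked against Test4:

```php
<?php
// Insert "_" before every uppercase letter that is not at the start of the
// string ($0 is the whole match), then lowercase everything.
function fromCamelCase_lookbehind(string $name): string
{
    return strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $name));
}

echo fromCamelCase_lookbehind('MmrcgUmNfCCTOMwwgaPuGegEGHPzvUim'), "\n";
// prints mmrcg_um_nf_c_c_t_o_mwwga_pu_geg_e_g_h_pzv_uim
```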
