PHP - preg_replace_callback for camelCasing

Question

I have the following content

"aa_bb" : "foo"
"pp_Qq" : "bar"
"Xx_yY_zz" : "foobar"

And I want to convert the content on the left side to camelCase

"aaBb" : "foo"
"ppQq" : "bar"
"xxYyZz" : "foobar"

And the code:

// selects the left part
$newString = preg_replace_callback("/\"(.*?)\"(.*?):/", function($matches) {        
    // selects the characters following underscores
    $matches[1] = preg_replace_callback("/_(.?)/", function($matches) {
        //removes the underscore and uppercases the character
        return strtoupper($matches[1]);
    }, $matches[1]);

    // lowercases the first character before returning
    return "\"".lcfirst($matches[1])."\" : ".$matches[2];
}, $string);

Can this code be simplified?

Note: The content will always be a single string.

You can simplify your regex a bit, i.e [`^"([^"]+)"\s*:`](https://regex101.com/r/xYjDHv/1/) — Code Maniac, Oct 03 '19 at 01:27
@user3783243 anywhere outside the quotes. No whitespace between the quotes for sure. — , Oct 03 '19 at 01:45
@user3783243 Sorry, misread your code. Also, the content will not be available as an array. Should have clarified it earlier. The content will be provided as string. — , Oct 03 '19 at 02:02
@user3783243 That seems doable, but after exploding I will still need to recreate the string. Also, your method only works for a single underscore. I should update my question to reflect that. — , Oct 03 '19 at 02:23
@user3783243 BTW, thanks for your time. My requirements must be starting to annoy you haha. — , Oct 03 '19 at 02:30
https://stackoverflow.com/q/2791998/2943403 , https://stackoverflow.com/q/31274782/2943403 , https://codereview.stackexchange.com/q/48593/141885 — mickmackusa, Oct 08 '19 at 04:02
Kindly explain where this data is headed (why you are bothering to do this). Are you planning on calling `extract()` on these new keys? Do tell more -- there may be a more direct way of getting where you are going. — mickmackusa, Oct 08 '19 at 07:58
@Daol please dignify my request for clarification regarding your data and task. — mickmackusa, Oct 11 '19 at 11:06
@mickmackusa sorry I didn’t really check the comments. As for your question, I don’t have control over where the data is coming from and where it will go, only thing that I have to deal with is to convert the contents to the required format — , Oct 11 '19 at 11:09
You don't know how it is going to be used? Was this an interview or homework assignment? Does the real data always consist of simple, one-line string values, or are there fringe cases to consider? — mickmackusa, Oct 11 '19 at 11:10
@mickmackusa neither, also the real data is string value. As far as I know there should be no other cases — , Oct 11 '19 at 11:16

Casimir et Hippolyte · Accepted Answer · 2019-10-08T09:39:29.423

First, since you already have working code that you want to improve, consider posting your question on Code Review instead of Stack Overflow next time.

Let's start by improving your original approach:

$result = preg_replace_callback('~"[^"]*"\s*:~', function ($m) {
    return preg_replace_callback('~_+(.?)~', function ($n) {
        return strtoupper($n[1]);
    }, strtolower($m[0]));
}, $str);

pro: the patterns are relatively simple and the idea is easy to understand.
cons: nested preg_replace_callback calls may hurt the eyes.
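
To see this first version run end to end, here is a minimal sketch applied to the sample data from the question. The function name camelCaseKeysNested and the $input variable are only illustrative; they are not part of the original code.

```php
<?php
// Hypothetical wrapper name (an assumption), around the nested-callback approach.
function camelCaseKeysNested(string $str): string
{
    // Outer pattern: a quoted key followed by optional whitespace and a colon.
    return preg_replace_callback('~"[^"]*"\s*:~', function ($m) {
        // Inner pattern: drop each run of underscores and uppercase the next character.
        return preg_replace_callback('~_+(.?)~', function ($n) {
            return strtoupper($n[1]);
        }, strtolower($m[0])); // lowercase the whole key part first
    }, $str);
}

// Sample data from the question, joined into a single string.
$input = implode("\n", [
    '"aa_bb" : "foo"',
    '"pp_Qq" : "bar"',
    '"Xx_yY_zz" : "foobar"',
]);

echo camelCaseKeysNested($input), "\n";
// "aaBb" : "foo"
// "ppQq" : "bar"
// "xxYyZz" : "foobar"
```

Lowercasing the key before the inner replacement is what turns "Xx_yY_zz" into "xxYyZz" rather than "xxYYZz".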

After this eye warm-up exercise, we can try a \G-based pattern approach:

$pattern = '~(?|\G(?!^)_([^_"]*)|("(?=[^"]*"\s*:)[^_"]*))~';
$result = preg_replace_callback($pattern, function ($m) {
    return ucfirst(strtolower($m[1]));
}, $str);

pro: the code is shorter, with no need for two preg_replace_callback calls.
cons: the pattern is far more complicated.

notice: When you write a long pattern, nothing stops you from using free-spacing mode with the x modifier and adding comments:

$pattern = '~
(?| # branch reset group: in which capture groups have the same number
    \G # contiguous to the last successful match
    (?!^) # but not at the start of the string
    _
    ( [^_"]* ) # capture group 1
  |
    ( # capture group 1
        "
        (?=[^"]*"\s*:) # lookahead to check if it is the "key part"
        [^_"]*
    )
)
~x';
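
To make the chaining effect of \G more visible, the following sketch lists what the pattern matches on a single line, without replacing anything. The variable names $line and $matches are illustrative, and the pattern is a condensed copy of the free-spacing version above.

```php
<?php
// Free-spacing (x modifier) version of the \G-based pattern.
$pattern = '~
(?|                               # branch reset: both branches fill group 1
    \G (?!^) _ ( [^_"]* )         # continue right after the previous match
  |
    ( " (?=[^"]*"\s*:) [^_"]* )   # opening quote of a key and its first word
)
~x';

$line = '"Xx_yY_zz" : "foobar"';

preg_match_all($pattern, $line, $matches);

// Whole matches: the underscores are part of the match, so they get replaced.
echo implode(' | ', $matches[0]), "\n"; // "Xx | _yY | _zz

// Group 1: what the callback receives and passes to ucfirst(strtolower(...)).
echo implode(' | ', $matches[1]), "\n"; // "Xx | yY | zz
```

Nothing in the value part ("foobar") is matched: \G only succeeds directly after a previous match, and the second branch needs a closing quote followed by a colon.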

Are there compromises between these two extremes, and which one is good? Two suggestions:

$result = preg_replace_callback('~"[^"]+"\s*:~', function ($m) {
    return array_reduce(explode('_', strtolower($m[0])), function ($c, $i) {
        return $c . ucfirst($i);
    });
}, $str);

pro: minimal use of regex.
cons: still needs two callback functions, although this time the second one is called by array_reduce rather than by preg_replace_callback.
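
As a rough illustration of what the array_reduce step does to one matched key part, here is the inner logic in isolation. The variable names are illustrative, and the explicit '' initial value is an addition for clarity (the version above relies on the default null, which concatenates the same way).

```php
<?php
// One matched key part, as the outer preg_replace_callback would pass it in $m[0].
$key = '"Xx_yY_zz" :';

// Lowercase everything, then split on underscores.
$parts = explode('_', strtolower($key));
// $parts is ['"xx', 'yy', 'zz" :']

// Glue the parts back together, uppercasing the first character of each part.
$camel = array_reduce($parts, function ($carry, $part) {
    return $carry . ucfirst($part);
}, '');

echo $camel, "\n"; // "xxYyZz" :
```

The first part starts with a double quote, so ucfirst leaves it unchanged, which is why the first word of the key stays lowercase.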

$result = preg_replace_callback('~["_][^"_]*(?=[^"]*"\s*:)~', function ($m) {
    return ucfirst(strtolower(ltrim($m[0], '_')));
}, $str);

pro: the pattern is relatively simple and the callback function stays simple too. It looks like a good compromise.
cons: the pattern isn't very restrictive (but should suffice for your use case).

pattern description: the pattern looks for a _ or a " and matches the following characters that aren't a _ or a ". A lookahead assertion then checks that these characters are inside the key part by looking for a closing quote followed by a colon. The match result is always like _aBc or "aBc (underscores are trimmed on the left in the callback function, and " stays the same after applying ucfirst).

pattern details:

["_] # one " or _
[^"_]* # zero or more characters that aren't " or _
(?= # open a lookahead assertion (followed with)
    [^"]* # all that isn't a "
    " # a literal "
    \s* # optional whitespace
    : # a literal :
) # close the lookahead assertion
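
A small sketch of the pieces this pattern picks out of one line, and what the callback turns each piece into. The variables $line, $matches and $converted are illustrative names only.

```php
<?php
$pattern = '~["_][^"_]*(?=[^"]*"\s*:)~';
$line = '"Xx_yY_zz" : "foobar"';

// Collect every piece the pattern matches in the line.
preg_match_all($pattern, $line, $matches);
echo implode(' | ', $matches[0]), "\n"; // "Xx | _yY | _zz

// Apply the same transformation as the callback to each piece.
$converted = array_map(function ($piece) {
    return ucfirst(strtolower(ltrim($piece, '_')));
}, $matches[0]);
echo implode(' | ', $converted), "\n"; // "xx | Yy | Zz
```

The quotes around the value never match, because nothing after them satisfies the lookahead (a closing quote followed by optional whitespace and a colon).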

There's no good answer and what looks simple or complicated really depends on the reader.

Indeed. A simpler regex should be more maintainable for my use case. And the final method adds a decent balance between the two. If possible, could you add an explanation of the last regex to the answer? — , Oct 08 '19 at 04:17

The fourth bird · Answer 2 · 2019-10-03T12:34:06.233

You might make use of preg_replace_callback in combination with the \G anchor and capturing groups.

(?:"\K([^_\r\n]+)|\G(?!^))(?=[^":\r\n]*")(?=[^:\r\n]*:)_?([a-zA-Z])([^"_\r\n]*)

In parts

(?: Non capturing group
- "\K([^_\r\n]+) Match ", capture group 1 match 1+ times any char except _ or newline
- | Or
- \G(?!^) Assert position at the previous match, not at the start
) Close group
(?=[^":\r\n]*") Positive lookahead, assert "
(?=[^:\r\n]*:) Positive lookahead, assert :
_? Match optional _
([a-zA-Z]) Capture group 2 match a-zA-Z
([^"_\r\n]*) Capture group 3 match 0+ times any char except ", _ or newline

In the replacement concatenate a combination of strtolower and strtoupper using the 3 capturing groups.

Regex demo

For example

$re = '/(?:"\K([^_\r\n]+)|\G(?!^))(?=[^":\r\n]*")(?=[^:\r\n]*:)_?([a-zA-Z])([^"_\r\n]*)/';
$str = '"aa_bb" : "foo"

"pp_Qq" : "bar"

"Xx_yY_zz" : "foobar"
"Xx_yYyyyyyYyY_zz_a" : "foobar"';

$result =  preg_replace_callback($re, function($matches) {
    return strtolower($matches[1]) . strtoupper($matches[2]) . strtolower($matches[3]);
}, $str);

echo $result;

Output

"aaBb" : "foo"

"ppQq" : "bar"

"xxYyZz" : "foobar"
"xxYyyyyyyyyyZzA" : "foobar"

Php demo

Wow, that's quite the expression. I'll reply back once I check it out later — , Oct 03 '19 at 12:31
Thanks for your effort. But I decided to go with the simpler regex solution in the end for the sake of maintainability. — , Oct 08 '19 at 04:20
