So I am stuck - I have looked at tons of answers on here, but none seems to resolve my last problem.

Through a JSON API, I receive an equipment list in camel case. I cannot change that.

I need this camel case to be translated into normal language.

So far I have gotten most words separated with:

$string = "SomeEquipmentHere";

$spaced = preg_replace('/([A-Z])/', ' $1', $string);
var_dump($spaced);

string ' Some Equipment Here' (length=20)

$trimmed = trim($spaced);
var_dump($trimmed);
string 'Some Equipment Here' (length=19)

This works fine, but some of the equipment names contain abbreviations:

"ABSBrakes" - this needs ABS kept together and separated from Brakes

I can't simply check for several uppercase letters next to each other, since that would keep ABS and Brakes together. There are more like these, e.g. "CDRadio"

So what I want is the output to be:

"ABS Brakes"

Is there a way to format it so that, if there are uppercase letters next to each other, a space is only added before the last uppercase letter of that sequence?

I am not strong in regex.

EDIT

Both contributions are awesome - people coming here later should read both answers

The last remaining problems are the following patterns:

"ServiceOK" becomes "Service O K"

"ESP" becomes "ES P"

A string consisting only of an uppercase abbreviation is handled by a function that counts lowercase letters: if there are none, it skips the preg_replace() call.
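
The guard described above could look roughly like the sketch below. The function name `spaceCamelCase` is hypothetical and this is not the asker's actual code; it assumes the simple "space before every capital" replacement from earlier in the question, and it only tests for the presence of a lowercase letter rather than counting them.

```php
<?php
// Hypothetical helper: the name and the exact check are assumptions for illustration.
function spaceCamelCase(string $label): string
{
    // No lowercase letter at all: treat the label as a pure abbreviation and return it untouched.
    if (!preg_match('/[a-z]/', $label)) {
        return $label;
    }
    // Otherwise apply the simple replacement from the question and trim the leading space.
    return trim(preg_replace('/([A-Z])/', ' $1', $label));
}

echo spaceCamelCase('ESP'), "\n";               // ESP
echo spaceCamelCase('SomeEquipmentHere'), "\n"; // Some Equipment Here
```

The guard is independent of the replacement pattern, so the same check could sit in front of any of the regexes discussed on this page.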

But as Flying wrote in the comments on his answer, there could potentially be many cases not covered by his regex, and a complete answer could be impossible - I don't know whether this is too much of a challenge for a regex.

Possibly by adding a rule like "if there is no lowercase letter after the uppercase letter, no space should be inserted".

Stender
  • Any serious API should send an identifier (numeric or camelcase object name) and a "display name". Besides your workaround, I would contact the API owner and ask him to put in the missing information. – Daniel W. Nov 22 '17 at 12:32
  • @DanFromGermany This is true... but it took almost a week to get access from them, so I doubt that it will be fixed anytime soon – Stender Nov 22 '17 at 13:41

2 Answers

Here is a single-call pattern that doesn't use any anchors, capture groups, or references in the replacement string: /(?:[a-z]|[A-Z]+)\K(?=[A-Z]|\d+)/

Code:

$tests = [
    'SomeEquipmentHere',
    'ABSBrakes',
    'CDRadio',
    'Valve14',
];
foreach ($tests as $test) {
    echo preg_replace('/(?:[a-z]|[A-Z]+)\K(?=[A-Z]|\d+)/',' ',$test),"\n";
}

Output:

Some Equipment Here
ABS Brakes
CD Radio
Valve 14

This is a better method because there is nothing to mop up: no leading space to trim and no doubled whitespace to collapse afterwards. If there are new strings to consider (that break my method), please leave them in a comment so that I can update my pattern.

Pattern Explanation:

/         #start the pattern
(?:[a-z]  #match 1 lowercase letter
|         #or
[A-Z]+)   #1 or more uppercase letters
\K        #reset the match start (forget everything matched so far)
(?=[A-Z]  #look-ahead for 1 uppercase letter
|         #or
\d+)      #1 or more digits
/         #end the pattern
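
As a rough illustration of the breakdown above, the same pattern can be written with PCRE's `x` (extended) flag so the comments live inside the pattern itself. The variable names are only for this sketch. It also uses `preg_match_all()` with `PREG_OFFSET_CAPTURE` to show that, thanks to `\K`, each match is an empty string located at the point where the space goes.

```php
<?php
// Same pattern as in the answer; the x flag makes PCRE ignore whitespace and #-comments.
$pattern = '/
    (?: [a-z] | [A-Z]+ )   # one lowercase letter, or a run of uppercase letters
    \K                     # reset the match start, so nothing matched so far is replaced
    (?= [A-Z] | \d+ )      # the next character must be an uppercase letter or a digit
/x';

echo preg_replace($pattern, ' ', 'ABSBrakes'), "\n"; // ABS Brakes
echo preg_replace($pattern, ' ', 'Valve14'), "\n";   // Valve 14

// Inspect the single match in "ABSBrakes": empty text, found at offset 3 (between S and B).
preg_match_all($pattern, 'ABSBrakes', $matches, PREG_OFFSET_CAPTURE);
[$text, $offset] = $matches[0][0];
echo strlen($text), ' characters matched at offset ', $offset, "\n"; // 0 characters matched at offset 3
```

Because the match itself is empty, the replacement string only has to be a single space; no capture groups or back-references are needed.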

Edit:

There are some other patterns that may provide better accuracy including:

/(?:[a-z]|\B[A-Z]+)\K(?=[A-Z]\B|\d+)/

Granted, the above pattern will not properly handle ServiceOK

or this pattern with an anchor:

/(?!^)(?=[A-Z][a-z]+|(?<=\D)\d)/

The above pattern will accurately split: SomeEquipmentHere, ABSBrakes, CDRadio, Valve14, ServiceOK, ESP as requested by the OP.
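
A quick way to check the anchored pattern against the six strings listed above is sketched here; the variable names are only for illustration.

```php
<?php
// The anchored pattern from the answer: never match at the start of the string,
// and match only before "capital + lowercase letters" or before the first digit of a number.
$pattern = '/(?!^)(?=[A-Z][a-z]+|(?<=\D)\d)/';

$inputs = ['SomeEquipmentHere', 'ABSBrakes', 'CDRadio', 'Valve14', 'ServiceOK', 'ESP'];

$results = [];
foreach ($inputs as $input) {
    // Every zero-width match position receives a single space.
    $results[$input] = preg_replace($pattern, ' ', $input);
    echo $input, ' => ', $results[$input], "\n";
}
```

With these inputs the first four become "Some Equipment Here", "ABS Brakes", "CD Radio" and "Valve 14". "ServiceOK" and "ESP" come back unchanged, because no capital in "OK" or "ESP" is followed by a lowercase letter. That follows the rule stated in the question's edit; if "Service OK" were the desired output, this pattern alone would not produce it.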

*Note: Pattern accuracy can be improved as more sample strings are provided.

mickmackusa
  • @Stender This was a fun one. I have provided you with a cleaner method that works on all provided inputs. If you have any new inputs that my pattern doesn't handle properly, please add them to your question and leave me a comment. If you would like me to explain anything further, just ask. – mickmackusa Nov 22 '17 at 12:30
  • This looks very clean! I have a new pattern problem which I don't know can be fixed - basically: if there is not a lowercase letter after the uppercase letter, no space should be inserted - I have updated the question – Stender Nov 22 '17 at 13:37
  • Like this: https://regex101.com/r/TNZNC0/2 ? The more strings you give me, the better I can refine the pattern. – mickmackusa Nov 22 '17 at 13:40
  • @mickmacusa I have tested all of the strings that I have gotten from the API so far against your updated regex, and nothing has broken it - this is very helpful. It even saved me from using the count-lowercase function that I cooked up! – Stender Nov 22 '17 at 13:50
  • There you go - Flying's answer was good, but this fixed every issue that I could find. – Stender Nov 22 '17 at 13:52
  • I have updated my answer to include a couple of patterns. The last one may be the best one for your project. If you want to write a condition that can skip the `preg_replace()` call, you can use [ctype_upper()](http://php.net/manual/en/function.ctype-upper.php). – mickmackusa Nov 22 '17 at 20:47
  • Some topical discussion in the PHP Internals regarding the "preferred" casing with respect to acronyms... https://marc.info/?l=php-internals&m=169339572707896&w=2 – mickmackusa Aug 30 '23 at 23:11

Here is how it can be solved:

$tests = [
    'SomeEquipmentHere',
    'ABSBrakes',
    'CDRadio',
    'Valve14',
];
foreach ($tests as $test) {
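    // Surround every token (capitalised word, acronym prefix, digit run) with spaces...
    $spaced = preg_replace('/([A-Z][a-z]+)|([A-Z]+(?=[A-Z]))|(\d+)/', '$1 $2 $3', $test);
    // ...then collapse repeated whitespace and trim both ends.
    echo trim(preg_replace('/\s+/', ' ', $spaced)), "\n";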
}

Related test on regex101.
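
To see why the extra whitespace clean-up is needed with this approach, the sketch below runs the three steps separately on one input. The step variables are named only for this illustration.

```php
<?php
$tokenPattern = '/([A-Z][a-z]+)|([A-Z]+(?=[A-Z]))|(\d+)/';

// Step 1: each match is rewritten as "$1 $2 $3"; the groups that did not take part are empty,
// so every token ends up with stray spaces around it.
$step1 = preg_replace($tokenPattern, '$1 $2 $3', 'ABSBrakes');

// Step 2: collapse every run of whitespace into a single space.
$step2 = preg_replace('/\s+/', ' ', $step1);

// Step 3: remove the leading and trailing space.
$step3 = trim($step2);

var_dump($step1); // string(13) " ABS Brakes  "
var_dump($step2); // string(12) " ABS Brakes "
var_dump($step3); // string(10) "ABS Brakes"
```

The capture-group approach pays for its readability with these two clean-up passes, since the replacement always emits the two literal spaces whichever group matched.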

UPDATE: Added example for additional question

Flying
  • This is exactly what I have been looking for! You Sir/Ma'am are awesome - I know that this is not part of the question, but could you add something to the regex like a space before the first number in the string, so that something like Valves14 could be spaced as well? – Stender Nov 22 '17 at 09:34
  • @Stender it is a bit different approach, but I've updated answer to provide solution for such strings too – Flying Nov 22 '17 at 09:39
  • Am i reading it wrong, or is test2 now returning "AB SB rakes" in your regex test? – Stender Nov 22 '17 at 09:43
  • @MartinLyder Of course there can be a lot of different scenarios that are out of scope of this question; for example, it may be necessary to keep part of a word together with digits, or something else. But without a complete list of such scenarios it is unlikely that a solution can be provided. That's why I've provided the answer as a list of tests and a link to regex101. – Flying Nov 22 '17 at 10:11
  • I can confirm that it indeed does this, but I will condition the function to only run if there is one or more lowercase Letters. – Stender Nov 22 '17 at 10:17