11

I have this code to split CamelCase by regular expression:

Regex.Replace(input, "(?<=[a-z])([A-Z])", " $1", RegexOptions.Compiled).Trim();

However, it doesn't split this correctly: ShowXYZColours

It produces Show XYZColours instead of Show XYZ Colours

How do I get the desired result?

Sean
  • 14,359
  • 13
  • 74
  • 124

4 Answers4

18

Unicode-aware

(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})

Breakdown:

(?=               # look-ahead: a position followed by...
  \p{Lu}\p{Ll}    #   an uppercase and a lowercase
)                 #
|                 # or
(?<=              # look-behind: a position after...
  \p{Ll}          #   an uppercase
)                 #
(?=               # look-ahead: a position followed by...
  \p{Lu}          #   a lowercase
)                 #

Use with your regex split function.


EDIT: Of course you can replace \p{Lu} with [A-Z] and \p{Ll} with [a-z] if that's what you need or your regex engine does not understand Unicode categories.

Tomalak
  • 332,285
  • 67
  • 532
  • 628
12

.NET DEMO

You can use something like this :

(?<=[a-z])([A-Z])|(?<=[A-Z])([A-Z][a-z])

CODE :

string strRegex = @"(?<=[a-z])([A-Z])|(?<=[A-Z])([A-Z][a-z])";
Regex myRegex = new Regex(strRegex, RegexOptions.None);
string strTargetString = @"ShowXYZColours";
string strReplace = @" $1$2";

return myRegex.Replace(strTargetString, strReplace);

OUTPUT :

Show XYZ Colours

Demo and Explanation

Sujith PS
  • 4,776
  • 3
  • 34
  • 61
3

using Tomalak's regex with .NET System.Text.RegularExpressions creates an empty entry in position 0 of the resulting array:

Regex.Split("ShowXYZColors", @"(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})")

{string[4]}
    [0]: ""
    [1]: "Show"
    [2]: "XYZ"
    [3]: "Colors"

It works for caMelCase though (as opposed to PascalCase):

Regex.Split("showXYZColors", @"(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})")

{string[3]}
    [0]: "show"
    [1]: "XYZ"
    [2]: "Colors"
dr. rAI
  • 1,773
  • 1
  • 12
  • 15
1

You can try this :

Regex.Replace(input, "((?<!^)([A-Z][a-z]|(?<=[a-z])[A-Z]))", " $1").Trim();

Example :

Regex.Replace("TheCapitalOfTheUAEIsAbuDhabi", "((?<!^)([A-Z][a-z]|(?<=[a-z])[A-Z]))", " $1").Trim();

Output : The Capital Of The UAE Is Abu Dhabi

Husam Ebish
  • 4,893
  • 2
  • 22
  • 38