I am trying to come up with a regex for both upper and lower camel case.

Here is what I tried:

(([A-Z][a-z0-9]*){2,}|([a-z][A-Z0-9]*){2,})

I am trying to match upper camel case with ([A-Z][a-z0-9]*){2,}, but it matches other combinations as well. The same happens with the second part, ([a-z][A-Z0-9]*){2,}.

Ankit

3 Answers

These patterns match upper and lower camel case words that contain at least one uppercase letter or digit after the leading character.

Upper Camel Case

[A-Z][a-z0-9]*[A-Z0-9][a-z0-9]+[A-Za-z0-9]*

Example: HelloWorld, AQuickBrownFox

Lower Camel Case

[a-z]+[A-Z0-9][a-z0-9]+[A-Za-z0-9]*

Example: helloWorld, aQuickBrownFox
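
A quick way to try the two patterns is a short Python script. This is only a sketch under a few assumptions: the patterns carry no anchors, so `re.fullmatch` is used to require that the whole word matches, and the names `UPPER_CAMEL`, `LOWER_CAMEL`, `is_upper_camel` and `is_lower_camel` are made up for illustration.

```python
import re

# Patterns copied from the answer above; the constant names are hypothetical.
UPPER_CAMEL = re.compile(r"[A-Z][a-z0-9]*[A-Z0-9][a-z0-9]+[A-Za-z0-9]*")
LOWER_CAMEL = re.compile(r"[a-z]+[A-Z0-9][a-z0-9]+[A-Za-z0-9]*")


def is_upper_camel(word: str) -> bool:
    # fullmatch requires the pattern to cover the entire string
    return UPPER_CAMEL.fullmatch(word) is not None


def is_lower_camel(word: str) -> bool:
    return LOWER_CAMEL.fullmatch(word) is not None


for word in ["HelloWorld", "AQuickBrownFox", "helloWorld", "aQuickBrownFox", "hello", "Hello"]:
    print(word, is_upper_camel(word), is_lower_camel(word))

# Without anchoring, the upper camel pattern still finds a match
# inside a lower camel word (it starts matching at "Is...").
print(UPPER_CAMEL.search("thisIsNotUpperCamelCase") is not None)
```

The last line shows why unanchored use needs word boundaries or `fullmatch`: a plain `search` reports a hit inside a word that is not upper camel case as a whole.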

JackDev
  • So, in `UpperCamelCase`, only `UpperCamel` will be matched. Is that intentional? Then, wouldn't you want to use [word boundary anchors](http://www.regular-expressions.info/wordboundaries.html) in order not to match (a substring of) `thisIsNotUpperCamelCase` with your first regex? Also, it should be `[A-Z]`, not `[A-Z0-9]`. Finally, (and the OP didn't mention this) what about non-ASCII variable names? Many languages allow Unicode characters for variable names. – Tim Pietzcker Sep 26 '13 at 08:12
  • But the lowerCamelCase RegEx also matches strings like `A` or even `ABC`! I think it should rather be `[a-z]+[A-Z0-9][a-z0-9]*`. Note the `+` instead of the `*`. – gehho Sep 26 '13 at 08:21
  • also updated my regular expression to handle more complicated matches. – JackDev Sep 26 '13 at 23:45

Lower Camel Case - no digits allowed


    ^[a-z][a-z]*(([A-Z][a-z]+)*[A-Z]?|([a-z]+[A-Z])*|[A-Z])$
    

Test Cases: https://regex101.com/library/4h7A1I

Lower Camel Case - digits allowed


    ^[a-z][a-z0-9]*(([A-Z][a-z0-9]+)*[A-Z]?|([a-z0-9]+[A-Z])*|[A-Z])$

Test Cases: https://regex101.com/library/8nQras

Lower Camel Case - digits allowed - Up to 3 consecutive upper case letters

To match more than one consecutive upper case letter (e.g. deviceID, serialNO, awsVPC, deviceSN), it gets slightly more involved:

 
    ^[a-z][a-z0-9]*(([A-Z]{1,3}[a-z0-9]+)*[A-Z]{0,3}|([a-z0-9]+[A-Z]{1,3})*|[A-Z]{1,3})$

Test Cases: https://regex101.com/library/C2eHyc
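
To see how the single-letter and up-to-3-letter variants differ, here is a small Python sketch. The two patterns are copied verbatim from this answer; the constant names `LOWER_DIGITS` and `LOWER_DIGITS_3` and the word list are assumptions made for illustration.

```python
import re

# Patterns copied verbatim from this answer; the constant names are hypothetical.
LOWER_DIGITS = re.compile(
    r"^[a-z][a-z0-9]*(([A-Z][a-z0-9]+)*[A-Z]?|([a-z0-9]+[A-Z])*|[A-Z])$"
)
LOWER_DIGITS_3 = re.compile(
    r"^[a-z][a-z0-9]*(([A-Z]{1,3}[a-z0-9]+)*[A-Z]{0,3}|([a-z0-9]+[A-Z]{1,3})*|[A-Z]{1,3})$"
)

# Print, for each word, whether each variant accepts it.
for word in ["camelCase2", "camel", "deviceID", "awsVPC", "CamelCase"]:
    print(word, bool(LOWER_DIGITS.match(word)), bool(LOWER_DIGITS_3.match(word)))
```

Both variants accept a single lowercase word such as `camel`, and only the second one accepts runs like `ID` or `VPC`.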

Pascal Case - no digits allowed


    ^[A-Z](([a-z]+[A-Z]?)*)$

Test Cases: https://regex101.com/library/sF2jRZ

Pascal Case - digits allowed


    ^[A-Z](([a-z0-9]+[A-Z]?)*)$

Test Cases: https://regex101.com/library/csrkQw

Pascal Case - digits allowed - Up to 3 consecutive upper case letters

To match more than one consecutive upper case letter (e.g. DeviceID, SerialNo, AwsVPC, IOStream, StreamIO, DeviceSN), it gets slightly more involved:


    ^[A-Z](([A-Z]{1,2}[a-z0-9]+)+([A-Z]{1,3}[a-z0-9]+)*[A-Z]{0,3}|([a-z0-9]+[A-Z]{0,3})*|[A-Z]{1,2})$

Test Cases: https://regex101.com/library/TLTXbK
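
The same kind of check works for the Pascal case patterns. This is a sketch, not a test suite: the patterns are copied verbatim from above, while the names `PASCAL_DIGITS` and `PASCAL_DIGITS_3` and the sample words are assumptions.

```python
import re

# Patterns copied verbatim from this answer; the constant names are hypothetical.
PASCAL_DIGITS = re.compile(r"^[A-Z](([a-z0-9]+[A-Z]?)*)$")
PASCAL_DIGITS_3 = re.compile(
    r"^[A-Z](([A-Z]{1,2}[a-z0-9]+)+([A-Z]{1,3}[a-z0-9]+)*[A-Z]{0,3}"
    r"|([a-z0-9]+[A-Z]{0,3})*|[A-Z]{1,2})$"
)

# Print, for each word, whether each variant accepts it.
for word in ["CamelCase", "DeviceID", "IOStream", "StreamIO", "camelCase"]:
    print(word, bool(PASCAL_DIGITS.match(word)), bool(PASCAL_DIGITS_3.match(word)))
```

Only the second pattern accepts words with acronym-style runs such as `DeviceID`, `IOStream` and `StreamIO`; neither accepts a word that starts with a lowercase letter.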

For more details on camel case and pascal case check out this repo.

rouble

For lowerCamelCase you need:

  1. A lowerCaseLetter
  2. at least one (lowerCaseLetter or UpperCaseLetter or numb3r)

So an appropriate regex would be

[a-z][a-zA-Z0-9]+

Similarly for UpperCamelCase, you'll have [A-Z][a-zA-Z0-9]+, and if you group those, you get

[a-zA-Z][a-zA-Z0-9]+

Edit: If you strictly require that a camel case word needs to have a "hump", where a hump is an uppercase letter or a number, you need:

  1. An upper or a lower case letter, followed by
  2. Other lower case letters (maybe none), followed by
  3. A hump, followed by
  4. Other lower case letters (maybe none),
  5. Maybe followed by another hump(s)

Then your regex is:

[a-zA-Z][a-z]*([A-Z0-9]+[a-z]*)+
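
The difference between the loose definition and the "hump" definition can be checked with a few lines of Python. This is a sketch only: the patterns are the two from this answer, `re.fullmatch` is used because they have no anchors, and the names `LOOSE` and `HUMPED` are made up for illustration.

```python
import re

# Patterns taken from this answer; the constant names are hypothetical.
LOOSE = re.compile(r"[a-zA-Z][a-zA-Z0-9]+")
HUMPED = re.compile(r"[a-zA-Z][a-z]*([A-Z0-9]+[a-z]*)+")

# fullmatch requires the whole word to match, since the patterns are unanchored.
for word in ["aa", "lower", "Upper", "CamelCase1", "camelCase2", "lower8", "lower8UPPER"]:
    print(word, bool(LOOSE.fullmatch(word)), bool(HUMPED.fullmatch(word)))
```

The loose pattern accepts every word in the list, including `aa`. The hump pattern rejects `aa`, `lower` and `Upper`, and accepts the rest, with a digit counting as a hump.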
trincot
SWeko
  • But this will match "aa" as well which is not a camel case, right? – Ankit Sep 26 '13 at 07:31
  • it is if it's a single word. lowerCamelCase for a variable called lower is `lower`. Try to have a rigorous definition for lower/UpperCamelCase. Then the regex will write itself. – SWeko Sep 26 '13 at 07:32
  • I want to match texts like Upper camel case - "CamelCase1" and "camelCase2" for lower camel case! – Ankit Sep 26 '13 at 07:34
  • Those are indeed matched, both by my regex, and the regex you gave. (http://refiddle.com/gvg) – SWeko Sep 26 '13 at 07:38
  • Shouldn't this regex only match the Camel case and not other words like "aa" etc? But in this case, it is matching words apart from camel cases as well. What do you think? – Ankit Sep 26 '13 at 07:41
  • Ah, sorry. So a word is only camel case when there is an actual "hump", so "lower" and "Upper" are not matches. Is a number considered a hump, i.e. should "lower8", "lower8Upper", "lower8lower", or "lower8UPPER" be matched? – SWeko Sep 26 '13 at 07:46
  • I think there can be. – Ankit Sep 26 '13 at 07:56