
I want a single regular expression that matches a string containing characters from at least 2 of these groups: lowercase letters, uppercase letters, digits, or special characters. The length must also be greater than 7.

I currently have this expression:

^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z]).{8,}$

However, it forces the string to contain a lowercase letter, an uppercase letter, and a digit or special character.

I currently have this implemented using 4 separate regular expressions that I evaluate from some C# code.

I plan to reuse the same expression in JavaScript.

This is a sample console app that shows the difference between the 2 approaches.

using System;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    // One pattern per character class; a valid string must hit at least two of them.
    private static readonly Regex[] Regexs = new[] {
        new Regex("[a-z]", RegexOptions.Compiled), // Lowercase letter
        new Regex("[A-Z]", RegexOptions.Compiled), // Uppercase letter
        new Regex(@"\d", RegexOptions.Compiled), // Numeric
        new Regex(@"[^a-zA-Z\d\s:]", RegexOptions.Compiled) // Non-alphanumeric
    };

    static void Main(string[] args)
    {
        // Option flags are combined with bitwise OR; AND-ing them yields RegexOptions.None.
        Regex expression = new Regex(@"^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z]).{8,}$", RegexOptions.ECMAScript | RegexOptions.Compiled);

        string[] testCases = new[] { "P@ssword", "Password", "P2ssword", "xpo123", "xpo123!", "xpo123!123@@", "Myxpo123!123@@", "Something_Really_Complex123!#43@2*333" };

        Console.WriteLine("{0}\t{1}\t", "Single", "C# Hack");
        Console.WriteLine("");
        foreach (var testCase in testCases)
        {
            // First column: the single expression. Second column: length check plus class count.
            Console.WriteLine("{0}\t{2}\t : {1}", expression.IsMatch(testCase), testCase,
                    (testCase.Length >= 8 && Regexs.Count(x => x.IsMatch(testCase)) >= 2));
        }

        Console.ReadKey();
    }
}

Result  Proper     Test String
------- -------    ------------

True    True     : P@ssword
False   True     : Password
True    True     : P2ssword
False   False    : xpo123
False   False    : xpo123!
False   True     : xpo123!123@@
True    True     : Myxpo123!123@@
True    True     : Something_Really_Complex123!#43@2*333
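
Since the goal is to reuse the check in JavaScript, the counting approach behind the second column can be sketched there as well. This is only a minimal sketch: the names `classPatterns` and `meetsPolicy` are made up for illustration, and it assumes the same four character classes and the minimum length of 8 used in the C# sample.

```javascript
// Hypothetical JavaScript port of the counting approach from the C# sample.
// One pattern per character class, mirroring the C# Regexs array.
const classPatterns = [
  /[a-z]/,            // lowercase letter
  /[A-Z]/,            // uppercase letter
  /\d/,               // digit
  /[^a-zA-Z\d\s:]/    // anything else except whitespace and ':'
];

// Returns true when the string has at least 8 characters
// and matches at least 2 of the 4 class patterns.
function meetsPolicy(candidate) {
  const classesPresent = classPatterns.filter((re) => re.test(candidate)).length;
  return candidate.length >= 8 && classesPresent >= 2;
}

console.log(meetsPolicy("Password"), meetsPolicy("xpo123!"));
```

The patterns deliberately omit the `g` flag, so repeated calls to `test` do not carry state between inputs.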
Eugene S.
  • Maybe you can describe what result you're trying to achieve? What do you want to match? What result are you getting instead? – jfriend00 Dec 03 '13 at 00:25
  • I added some examples. Alternately, the provided sample code can be used to see the results. – Eugene S. Dec 03 '13 at 00:31

2 Answers


You could use possessive quantifiers (emulated using atomic groups), something like this:

((?>[a-z]+)|(?>[A-Z]+)|(?>[^a-zA-Z]+)){2,}

Because possessive matching prevents backtracking, each group consumes an entire run of one character class, so the two matched groups cannot both be, for instance, consecutive runs of lowercase letters. So the full regex would be something like:

^(?=.*((?>[a-z]+)|(?>[A-Z]+)|(?>[^a-zA-Z]+)){2,}).{8,}$

Though, were it me, I'd cut the lookahead, just use the expression ((?>[a-z]+)|(?>[A-Z]+)|(?>[^a-zA-Z]+)){2,}, and check the length separately.
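
For engines that lack atomic groups, a commonly used workaround is to capture inside a lookahead and then consume the capture with a backreference, since a lookahead is not re-entered on backtracking. The JavaScript sketch below applies that idea to the expression above and checks the length separately. The names `twoClasses` and `isValid` are hypothetical, and this is an illustration under those assumptions rather than a drop-in replacement.

```javascript
// Each alternative captures a full run of one class inside a lookahead and
// then consumes exactly that run through a backreference. The run cannot be
// shortened by backtracking, so the next repetition must start with a
// character from a different class.
const twoClasses =
  /(?:(?=([a-z]+))\1|(?=([A-Z]+))\2|(?=([^a-zA-Z]+))\3){2,}/;

// Length is checked outside the regex, as suggested above.
function isValid(candidate) {
  return candidate.length >= 8 && twoClasses.test(candidate);
}

console.log(isValid("Password"), isValid("password"));
```

As in the original expression, digits and special characters share the single class `[^a-zA-Z]`, so a string made only of digits and symbols counts as one class here.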

femtoRgon
  • Do I need to add some escape characters? .NET gives me this error: `parsing "^(?=.*([a-z]++|[A-Z]++|[^a-zA-Z]++){2,}).{8,}$" - Nested quantifier +.` – Eugene S. Dec 03 '13 at 00:36
  • @EugeneS.: .net doesn't have the possessive quantifier feature. However, you can replace a possessive quantifier by an atomic group: `[A-Z]++` -> `(?>[A-Z]+)` – Casimir et Hippolyte Dec 03 '13 at 00:39
  • Right, C# doesn't support possessive quantifiers, sorry. You can [Emulate them with atomic groups](http://stackoverflow.com/questions/5537513/emulating-possessive-quantifiers), I've edited my answer to reflect that instead. – femtoRgon Dec 03 '13 at 00:41
  • That works in .NET. Looks like I'm out of luck using this in JavaScript as it doesn't support atomic groups. – Eugene S. Dec 03 '13 at 00:48

For JavaScript you can use this pattern, which looks for boundaries between different character classes:

^(?=.*(?:.\b.|(?i)(?:[a-z]\d|\d[a-z])|[a-z][A-Z]|[A-Z][a-z]))[^:\s]{8,}$

If a boundary is found, you are sure to have two different classes.

pattern details:

\b # is a zero-width assertion: a boundary between a member of
   # the \w class and another character that is not in this class.

.\b. # represents the two characters with the word boundary.

boundary between a letter and a number:

(?i) # make the subpattern case insensitive
(?:
    [a-z]\d # a letter and a digit
  |         # OR
    \d[a-z] # a digit and a letter
)

boundary between an uppercase and a lowercase letter:

[a-z][A-Z] | [A-Z][a-z]

Since every alternative contains at least two characters from two different character classes, any match guarantees the result you want.
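
A quick way to see the pattern at work is to run it against a few inputs. The JavaScript sketch below is an adaptation, not the exact pattern above: JavaScript regex syntax has not traditionally accepted a bare inline `(?i)`, so the letter/digit alternatives are spelled out as `[a-zA-Z]\d|\d[a-zA-Z]` instead. The names `boundaryPattern` and `hasTwoClasses` are hypothetical.

```javascript
// Adapted boundary pattern: the lookahead searches for two adjacent
// characters from different classes; [^:\s]{8,} enforces the length and
// rejects whitespace and ':'.
const boundaryPattern =
  /^(?=.*(?:.\b.|[a-zA-Z]\d|\d[a-zA-Z]|[a-z][A-Z]|[A-Z][a-z]))[^:\s]{8,}$/;

function hasTwoClasses(candidate) {
  return boundaryPattern.test(candidate);
}

// Print the verdict for a handful of sample strings.
const samples = ["P@ssword", "Password", "xpo123!", "password", "12345678"];
for (const sample of samples) {
  console.log(sample, hasTwoClasses(sample));
}
```

Note that `\w` includes the underscore, so `.\b.` does not see a boundary between a letter or digit and `_`.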

Casimir et Hippolyte
  • That works with both .Net and JavaScript, thanks. Can you explain the logic of what different classes represent. I want to use this as a learning experience. – Eugene S. Dec 03 '13 at 01:22