0

I did a lot of searching through regex posts but didn't find a solution for what I'm looking for.

I have the following regex   ([a-zA-Z]{6}[a-zA-Z0-9]{2}([a-zA-Z0-9]{3})?)?   to accept these cases:

  • empty string
  • (6 alpha) + (2 alphanumeric)
  • (6 alpha) + (2 alphanumeric) + (3 alpha)

Now I want to modify my regex to accept an optional extra character % anywhere, any number of times, while keeping the maximum counts of alpha and alphanumeric characters from the current regex.

Examples:

  • Empty string -> correct
  • AABB -> wrong (need exactly 6 alpha + 2 alphanumeric when there is no %)
  • AABB% -> correct
  • AA33% -> wrong (need exactly 6 alpha before numeric)
  • AA%33 -> correct (% works as a wildcard, so the exact count is not required)
  • A%3 -> correct
  • AA%33% -> correct
  • %AA33% -> correct
  • %AA3% -> correct
  • AAAAAA33 -> correct
  • AABBCCXX -> correct
  • AABBCC44XXX -> correct
  • AABBCC44XXXE -> wrong (length of alpha not respected)
  • %AABBCC44XXXE -> wrong (length of alpha not respected)
  • %AAB%BCC4%4X%XX% -> correct (because % should be ignored in length, length of alpha and alphanumeric is respected here)

Is this possible?

Samy

2 Answers

1

The regex below should be close enough.

^(?:(?=.*%)(?![A-Z]{1,5}[0-9])(?:%?[A-Z]){0,6}(?:(?:%?[A-Z0-9]){1,2})?(?:(?:%?[A-Z]){1,3})?%?)$|^(?:[A-Z]{6}(?:[A-Z0-9]{2})(?:[A-Z]{3})?)$|^$

Note how the pipes (| = OR) separate 3 regexes.
One for those with %, then those without % and then the blanks.

Also, the character classes only use the uppercase A-Z.
To also allow lowercase letters, either make the regex case-insensitive or replace those classes with A-Za-z.
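
For illustration, here is a minimal Java sketch of the ignore-case option, using the long regex from this answer unchanged. The class name `WildcardCodeRegex` and the method `isValid` are made up for this example and are not part of the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper class; only the regex itself comes from the answer.
class WildcardCodeRegex {
    // Three alternatives: inputs containing %, inputs without %, and the empty string.
    static final String REGEX =
        "^(?:(?=.*%)(?![A-Z]{1,5}[0-9])(?:%?[A-Z]){0,6}(?:(?:%?[A-Z0-9]){1,2})?(?:(?:%?[A-Z]){1,3})?%?)$"
        + "|^(?:[A-Z]{6}(?:[A-Z0-9]{2})(?:[A-Z]{3})?)$"
        + "|^$";

    // CASE_INSENSITIVE lets the A-Z classes match a-z as well.
    static final Pattern PATTERN = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);

    static boolean isValid(String input) {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        // A few inputs taken from the question, plus one lowercase variant.
        String[] samples = {"", "AABB", "AABB%", "AA33%", "AA%33", "aabbcc44xxx"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

In an annotation-based setup the same effect would come from whatever case-insensitive flag the validation framework exposes, so the regex text itself can stay uppercase-only.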

Shorter alternative:

^(?=.*%)(?![A-Z]{1,5}[0-9])(?!(?:.*?[0-9]){3})(?:%?[A-Z0-9]){1,11}%?$|^(?:[A-Z]{6}[A-Z0-9]{2}(?:[A-Z]{3})?)$|^$
LukStorms
  • Let me explain why I said that AA33% is wrong but AA%33 is correct: the format of my data is AAAAAAXX(AAA)? (with 'A' for alpha and 'X' for alphanumeric). The % is a wildcard like in SQL. So if we have AABCDE33, AA%33 will find it, but there is no data of the form AA33BCDE, for example, which is why AA33% is wrong. – Samy Apr 06 '17 at 13:06
  • Ok, the updated version seems to work for the examples. – LukStorms Apr 06 '17 at 13:23
  • I really appreciate your help. Your regex works for all examples given in the question but it does not cover all the cases. It's not working for these examples: AA%33% (AABBCC33 or AABBCC33XXX), %AA33% (BBCCAA33 or BBCCAA33XXX), %AA3% (BBCCAA3Y or BBCCAA35 or BBCCAA35XXX) – Samy Apr 06 '17 at 14:20
  • Thanks for your answer. I updated my example. I don't know how to update your regex with a negative lookahead to make it work. – Samy Apr 06 '17 at 15:16
  • Thanks for your answer. The regex now works for all examples given in the question but it doesn't respect the rule of the second example (need exactly 6 alpha + 2 alphanumeric when there is no %). The regex matches AABBCC, for example, and it shouldn't (it works fine with AABB, AABBC and AABBCCE, all rejected as expected) – Samy Apr 06 '17 at 15:56
  • Corrected. The 2nd group for those without % was optional, removing the ? fixed it. – LukStorms Apr 06 '17 at 20:35
  • Should the following strings be considered correct? If I understood correctly, they should, but they don’t match your regex. `%12`. Also `%12ghi`. – Ole V.V. Apr 06 '17 at 20:36
  • 1
    @OleV.V. Well, that's a question for Samy. But that would actually be an easy fix in the regex. Just changing the `{1,6}` to `{0,6}` and it will also match `%12x`. Btw, I do like the creativity of your answer. – LukStorms Apr 06 '17 at 20:45
  • `%2%`? `%2%a%`? @Samy – Ole V.V. Apr 06 '17 at 20:54
  • BTW, from my observations it seems your solution runs noticeably faster than mine when I run my rather extensive unit test. – Ole V.V. Apr 06 '17 at 21:15
  • OleV.V you're right: %12, %12ghi, %2%, %2%a% should match. The {1,6} to {0,6} modification and the shorter version both work for all cases. Thank you very much guys. Big thanks for your help @LukStorms. I really appreciate it. – Samy Apr 07 '17 at 08:55
  • @LukStorms I noticed that the short one (alternative) matches %AA%33AAAAA and it shouldn't. But anyway the big one works very well. This is just to inform anyone looking for the same thing as me. Thanks – Samy Apr 07 '17 at 09:07
  • @LukStorms, I found a case which is not working with the big one :-( "AABBCC77BB": this example matches and it shouldn't, because without % we can only have length = 8 or 11 and "AABBCC77BB".length == 10 – Samy Apr 07 '17 at 09:19
  • As I said, my unit test is extensive. With the latest version of the longer regex all tests pass. – Ole V.V. Apr 07 '17 at 09:30
  • @Samy I thought AABBCC77BB would be wrong? Since it's not 3 letters after the alphanumeric but only 2. The last group now expects 3, but if you replace that {3} by {2,3} or {1,3} it'll also accept that one. – LukStorms Apr 07 '17 at 09:48
  • 1
    @LukStorms Sorry, my test was maybe wrong. Yeah, "AABBCC77BB" should be wrong and I thought it was accepted in my test. That's why I said that the regex should accept just 8 or 11 and not 10. Anyway your regex is amazing. Thanks again for your help – Samy Apr 07 '17 at 11:23
1

This will surprise some. I am using regular expressions for my solution, but the other way around than in the question.

The input string with the % sign in it is my regex. The percent sign is a wildcard (as in SQL, as you say). So I am going to match the known correct strings against the string with the wildcard. Correct strings include the empty string, AAAAAA33 and AAAAAA33AAA.

Stop, you’re thinking, that won’t work for a couple of reasons. First the letters may be any letters in the English alphabet, not just capital A. And the digits are not only 3. Right you are, so we will have to substitute those. So I am going to change your input string AABB to AAAA, etc.

input.replaceAll("[a-zA-Z]", "A")

We also need to substitute the digits in the same way

replaceAll("[0-9]", "3")

We need to take a bit of care with the two alphanumeric characters in the middle. If they are alphabetic in the input, they will still not match the 3 in the correct strings I gave above. Fortunately there are just two of them, so we can handle this by using more correct model strings. To cover all three cases from the question I am using 9 strings:

static final String[] correctModels = {
    "", "AAAAAAAA", "AAAAAAA3", "AAAAAA3A", "AAAAAA33", 
    "AAAAAAAAAAA", "AAAAAAA3AAA", "AAAAAA3AAAA", "AAAAAA33AAA"
};

Now, if after replacing letters with A and digits with 3 one of these model strings matches the input, the input is correct.

Next, Java regex doesn’t recognize % as a wildcard. So change to .* (the regex pattern for any sequence including the empty sequence):

replaceAll("%", ".*")

We might have used [a-zA-Z0-9]*, but since we have full control over the model strings, we don’t need to.

That’s it, we’re set. No wait, the user can fool us by putting valid regex syntax in the input string. Solution: First thing, check that the input only contains letters, digits and percent signs. This solves it because none of these has any special meaning in regex.

public static boolean matches(String input) {
    // reject input containing anything other than letters, digits and percent signs
    if (! input.matches("[a-zA-Z0-9%]*")) {
        return false;
    }
    input = input.replaceAll("[a-zA-Z]", "A")
            .replaceAll("[0-9]", "3")
            .replaceAll("%", ".*");
    Pattern p = Pattern.compile(input);
    for (String model : correctModels) {
        if (p.matcher(model).matches()) {
            return true;
        }
    }
    return false;
}

I have tested with all the examples in the question. They work as specified. I believe the solution is correct for all possible input.
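
A check along those lines could be sketched as the following self-contained class. The class name `WildcardModelMatcher` and the `main` harness are assumptions added for illustration; the model strings and the matching logic are the ones shown above, and the expected values are taken from the examples in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical class name; models and matching logic follow the answer above.
class WildcardModelMatcher {
    static final String[] CORRECT_MODELS = {
        "", "AAAAAAAA", "AAAAAAA3", "AAAAAA3A", "AAAAAA33",
        "AAAAAAAAAAA", "AAAAAAA3AAA", "AAAAAA3AAAA", "AAAAAA33AAA"
    };

    static boolean matches(String input) {
        // Only letters, digits and % are allowed, so no regex syntax can sneak in.
        if (!input.matches("[a-zA-Z0-9%]*")) {
            return false;
        }
        // Normalise letters to A and digits to 3, then turn % into the regex wildcard.
        String regex = input.replaceAll("[a-zA-Z]", "A")
                .replaceAll("[0-9]", "3")
                .replaceAll("%", ".*");
        Pattern p = Pattern.compile(regex);
        for (String model : CORRECT_MODELS) {
            if (p.matcher(model).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Inputs and expected verdicts copied from the question's example list.
        Map<String, Boolean> expected = new LinkedHashMap<>();
        expected.put("", true);
        expected.put("AABB", false);
        expected.put("AABB%", true);
        expected.put("AA33%", false);
        expected.put("AA%33", true);
        expected.put("A%3", true);
        expected.put("AA%33%", true);
        expected.put("%AA33%", true);
        expected.put("%AA3%", true);
        expected.put("AAAAAA33", true);
        expected.put("AABBCCXX", true);
        expected.put("AABBCC44XXX", true);
        expected.put("AABBCC44XXXE", false);
        expected.put("%AABBCC44XXXE", false);
        expected.put("%AAB%BCC4%4X%XX%", true);

        for (Map.Entry<String, Boolean> e : expected.entrySet()) {
            boolean actual = matches(e.getKey());
            String verdict = (actual == e.getValue()) ? "ok" : "MISMATCH";
            System.out.println("\"" + e.getKey() + "\" -> " + actual + " (" + verdict + ")");
        }
    }
}
```

Keeping the examples in a map makes it easy to add further cases from the comments, such as `%12` or `%2%a%`.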

Ole V.V.
  • 1
    I'm using @Pattern(regexp="") javax validation in Spring and don't want to use a custom Validator with a Java method, just a regex in the annotation. That's why I was looking for a full regex solution. Thanks – Samy Apr 07 '17 at 09:09