
I need to validate a string in Java to fulfill the following requirements:

  • string must be 5-32 characters long
  • string can contain
    • letters (a-z),
    • numbers (0-9),
    • dashes (-),
    • underscores (_),
    • and periods (.).
  • string mustn't contain more than one period (.) in a row.

Would this regex be a correct solution?

^(?!([^\\.]*+\\.){2,})[\\.a-z0-9_-]{5,32}$
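A quick way to see what this pattern actually does is to run it against a few inputs. The sketch below uses the pattern exactly as written above; the class name and sample strings are illustrative only. Because the lookahead counts periods anywhere in the input rather than adjacent ones, it rejects `ab.cd.ef` even though that string never has two periods in a row:

```java
import java.util.regex.Pattern;

public class QuestionRegexCheck {
    // The pattern from the question, written as a Java string literal.
    // The lookahead fails the match as soon as the input holds two or more
    // periods in total, whether or not they are adjacent.
    static final Pattern QUESTION =
        Pattern.compile("^(?!([^\\.]*+\\.){2,})[\\.a-z0-9_-]{5,32}$");

    static boolean accepts(String s) {
        return QUESTION.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Illustrative samples: one period, a double period,
        // two separated periods, and a too-short string.
        String[] samples = {"abc.def", "abc..def", "ab.cd.ef", "abcd"};
        for (String s : samples) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```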
Pshemo
Markus
  • Is there a reason you need to do it in one regex? – slim Feb 20 '14 at 15:45
  • Yes, I need it for bean validation. – Markus Feb 20 '14 at 15:46
  • Hint: you don't have to escape any character inside a `[ ]` except for `]`. – SebastianH Feb 20 '14 at 15:50
  • You can have multiple requirements. I think it would be more readable if you used *Size for the length constraint, one *Pattern to check for acceptable chars, another *Pattern to check for "..". This also allows for different error messages for different violations. See http://stackoverflow.com/questions/16225015/multiple-regex-patterns-for-1-field (By * I mean at-sign -- SO thinks it's addressing a user!) – slim Feb 20 '14 at 15:52
  • Thanks for your hint, Sebastian! – Markus Feb 20 '14 at 15:55
  • @SebastianH Actually, sometimes `-` needs to be escaped, or placed where it can't become part of a character range. But yes, generally what you said is true :) – Pshemo Feb 20 '14 at 15:57
  • In this case you only have one real constraint, the dot. There's no need to validate the entire string with a lookahead each time; it's costly and inefficient. It's better to do it inline via an alternation. –  Feb 20 '14 at 16:04
  • @sln it's a maximum of 32 chars long. Readability trumps efficiency here. – slim Feb 20 '14 at 16:08
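The two comments about escaping inside character classes can be checked directly with `Pattern.matches`. In this small sketch (class and field names are made up for illustration), an escaped and an unescaped `.` behave the same inside `[...]`, while a `-` sitting between two characters quietly creates a range:

```java
import java.util.regex.Pattern;

public class CharClassEscapes {
    // Inside [...] a '.' is already literal, so escaping it changes nothing.
    static final boolean PLAIN_DOT = Pattern.matches("[.a-z]+", "a.b");
    static final boolean ESCAPED_DOT = Pattern.matches("[\\.a-z]+", "a.b");

    // A '-' between two characters forms a range: '_' (U+005F) through 'a' (U+0061),
    // which also includes the backtick '`' (U+0060).
    static final boolean RANGE_HITS_BACKTICK = Pattern.matches("[_-a]", "`");

    // Placed last in the class, '-' is just a literal dash and no range is formed.
    static final boolean TRAILING_DASH_HITS_BACKTICK = Pattern.matches("[_a-]", "`");
    static final boolean TRAILING_DASH_IS_LITERAL = Pattern.matches("[_a-]", "-");

    public static void main(String[] args) {
        System.out.println(PLAIN_DOT + " " + ESCAPED_DOT);        // true true
        System.out.println(RANGE_HITS_BACKTICK);                  // true
        System.out.println(TRAILING_DASH_HITS_BACKTICK + " "
            + TRAILING_DASH_IS_LITERAL);                          // false true
    }
}
```

This is why the classes in this thread keep `-` as the last character, as in `[.a-z0-9_-]`.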

3 Answers


You're pretty close. You can use this regex to block two periods anywhere in the input:

^(?!([^.]*\\.){2})[.a-z0-9_-]{5,32}$

If you only want to block two consecutive dots, use:

^(?!.*?\\.{2})[.a-z0-9_-]{5,32}$
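To see how the two patterns differ, here is a small sketch that runs both on a few sample strings (the class and field names are illustrative). The first rejects `ab.cd.ef` because it contains two periods at all, while the second accepts it and only rejects inputs such as `abc..def`:

```java
import java.util.regex.Pattern;

public class AnubhavaPatterns {
    // Rejects any input that contains two periods, adjacent or not.
    static final Pattern NO_TWO_PERIODS =
        Pattern.compile("^(?!([^.]*\\.){2})[.a-z0-9_-]{5,32}$");

    // Rejects only inputs that contain two consecutive periods.
    static final Pattern NO_DOUBLE_PERIOD =
        Pattern.compile("^(?!.*?\\.{2})[.a-z0-9_-]{5,32}$");

    public static void main(String[] args) {
        for (String s : new String[] {"ab.cd.ef", "abc..def", "a.b-c_d"}) {
            System.out.printf("%-9s anywhere=%b consecutive=%b%n", s,
                NO_TWO_PERIODS.matcher(s).matches(),
                NO_DOUBLE_PERIOD.matcher(s).matches());
        }
    }
}
```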
anubhava
  • This allows uppercase letters; not sure the OP wants that. – fge Feb 20 '14 at 15:47
  • Nice. I haven't seen the `(?! )` part of regex before this. – SebastianH Feb 20 '14 at 15:51
  • `"a.a..a".matches("^(?![^.]*\\.{2})[.a-z0-9_-]{5,32}$")` returns `true` :/ Change `(?![^.]*\\.{2})` to `(?!.*\\.{2})` so the negative lookahead can also skip over `a.a` before `..`. – Pshemo Feb 20 '14 at 15:54
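Pshemo's counterexample is easy to reproduce. In this sketch (the `narrow` and `wide` method names are invented for illustration), the first lookahead can only reach a `..` that starts at the first period, while `.*` lets the lookahead scan past earlier periods:

```java
public class LookaheadScope {
    // [^.]* stops at the first '.', so this lookahead only notices ".."
    // when the very first period in the input is part of it.
    static boolean narrow(String s) {
        return s.matches("^(?![^.]*\\.{2})[.a-z0-9_-]{5,32}$");
    }

    // .* can cross earlier periods, so ".." is found anywhere in the input.
    static boolean wide(String s) {
        return s.matches("^(?!.*\\.{2})[.a-z0-9_-]{5,32}$");
    }

    public static void main(String[] args) {
        System.out.println(narrow("a.a..a")); // true: ".." slips through
        System.out.println(wide("a.a..a"));   // false: rejected as intended
    }
}
```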

I love regular expressions, but for readability and maintainability I think they should be kept as simple as possible: use them for what they're good at, and use other features of your language or environment where appropriate.

In the comments you say this is for bean validation. You could validate your field with multiple simple annotations:

@Size(min=5,max=32)
@Pattern.List({
    @Pattern(regexp = "^[a-z0-9-_.]*$", 
             message = "Valid characters are a-z, 0-9, -, _, ."),
    @Pattern(regexp = "^((?!\\.{2}).)*$",
             message = "Must not contain a double period")
})
private String myField;
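Outside a bean-validation runtime, the same three separate rules can be sketched in plain Java. The `FieldRules` class and its `violations` helper below are hypothetical and not part of any validation API; they only show how keeping the rules separate gives each violation its own message, as suggested in the comments on the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class FieldRules {
    // Same character set as the first @Pattern, with '-' moved to the end.
    private static final Pattern VALID_CHARS = Pattern.compile("[a-z0-9_.-]*");
    // Every character must be one that does not start a "..".
    private static final Pattern NO_DOUBLE_PERIOD = Pattern.compile("((?!\\.{2}).)*");

    // Hypothetical helper: checks each rule separately (mirroring @Size and the
    // two @Pattern constraints) and collects one message per failed rule.
    // Assumes s is non-null.
    static List<String> violations(String s) {
        List<String> errors = new ArrayList<>();
        if (s.length() < 5 || s.length() > 32) {
            errors.add("Length must be between 5 and 32");
        }
        if (!VALID_CHARS.matcher(s).matches()) {
            errors.add("Valid characters are a-z, 0-9, -, _, .");
        }
        if (!NO_DOUBLE_PERIOD.matcher(s).matches()) {
            errors.add("Must not contain a double period");
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(violations("my.field")); // []
        System.out.println(violations("A..b"));     // all three messages
    }
}
```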

Also bear in mind that you can write custom constraints in Java.

... and of course in other contexts the same applies:

boolean isValid(String s) {
    return s.length() >= 5 &&
           s.length() <= 32 &&
           s.matches("^[a-z0-9-_.]*$") &&
           !s.contains("..");
}
slim
  • What _readability_ issue are you talking about? Regular expressions are code, so format them like code. **[RegexFormat4](http://www.regexformat.com)**. –  Feb 20 '14 at 16:23

This might work:

 #   "^(?:[0-9a-z_-]|\\.(?!\\.)){5,32}$"  

 ^ 
 (?:
      [0-9a-z_-] 
   |  \.
      (?! \. )
 ){5,32}
 $
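The expanded layout above can be compiled almost verbatim with the `Pattern.COMMENTS` flag, which makes Java ignore whitespace and `#` comments inside the pattern. A sketch, where the class name and the comment text are illustrative additions:

```java
import java.util.regex.Pattern;

public class FreeSpacingRegex {
    // With Pattern.COMMENTS, whitespace is ignored and '#' starts a comment
    // that runs to the end of the line, so each line needs a trailing \n.
    static final Pattern VALID = Pattern.compile(
        "^                      # start of input\n" +
        "(?:                                    \n" +
        "    [0-9a-z_-]         # an allowed non-period character\n" +
        "  | \\.(?!\\.)          # or a period not followed by another\n" +
        "){5,32}                # 5 to 32 characters in total\n" +
        "$                      # end of input",
        Pattern.COMMENTS);

    public static void main(String[] args) {
        for (String s : new String[] {"ab.cd.ef", "abc..def", "abcd"}) {
            System.out.println(s + " -> " + VALID.matcher(s).matches());
        }
    }
}
```

Each alternative consumes exactly one character, so `{5,32}` doubles as the length check.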